Simple R.A.G. App
This guide walks you through building a minimal Retrieval-Augmented Generation (RAG) application from scratch using OpenCode as your AI coding agent and NaviGator Toolkit as the AI model provider.
What You'll Build
A simple RAG application that lets you:
- Upload
.txt,.md,.html,.csv, or.jsonfiles - Ask questions about those files in a chat interface
- Get answers grounded in the documents you uploaded
The finished app runs locally on port 8000 and uses a single SQLite file for storage — no external database required.
Prerequisites
- A NaviGator Toolkit API key (get one here)
- OpenCode installed and configured with your NaviGator Toolkit key (setup guide)
- Python 3.10+
- Access to the following models on your NaviGator Toolkit team:
nomic-embed-text-v1.5— used to embed document chunks and queriesgpt-oss-120b— used to generate answers from retrieved context
If your team does not have access to these models, submit a request through the UFIT Portal. You can swap in any available chat and embedding model by updating CHAT_MODEL and EMBEDDING_MODEL in your .env file.
Step 1 — Create a Project Directory
Create an empty directory for your project and open it in your terminal:
mkdir my-rag-app && cd my-rag-app
opencode
Step 2 — Give OpenCode the Specification
Once OpenCode is running, paste the following specification into the chat. OpenCode will generate all the files, install dependencies, and set up the project for you.
Build a minimal RAG (Retrieval-Augmented Generation) demo in Python with the following specification:
## Overview
Build a minimal end-to-end RAG demo. The app ingests text files, chunks and embeds the content,
stores embeddings locally in SQLite, and answers questions using retrieved context via a chat model.
## Project Structure
minimal-rag/
├── app.py
├── rag.py
├── db.py
├── index.html # UI — at root, served by app.py
├── tests/
│ ├── conftest.py
│ └── test_rag.py
├── start.sh
├── requirements.txt
├── .env.example
└── rag.db # created at runtime
## Dependencies (requirements.txt)
fastapi
uvicorn[standard]
python-multipart
litellm
python-dotenv
sqlite-vec
pytest
## Configuration (.env)
All config loaded from .env via python-dotenv.
| Variable | Default | Description |
|-------------------|--------------------------|--------------------------------|
| OPENAI_API_KEY | — | NaviGator Toolkit API key |
| OPENAI_API_BASE | — | https://api.ai.it.ufl.edu/v1 |
| EMBEDDING_MODEL | openai/nomic-embed-text-v1.5 | Embedding model name |
| CHAT_MODEL | openai/gpt-oss-120b | Chat model name (needs openai/ prefix for LiteLLM with custom base URL) |
| RAG_DB_PATH | rag.db | SQLite database file path |
## Technical Constraints
### sqlite-vec
- Import sqlite_vec unconditionally at the top of db.py — no try/except, no fallback.
- Call sqlite_vec.load(conn) unconditionally inside get_connection() — do not wrap in try/except.
- vec_chunks must be a sqlite-vec virtual table: USING vec0(embedding float[768])
- Serialize embeddings with sqlite_vec.serialize_float32(embedding) before inserting.
- Vector search uses WHERE embedding MATCH ? AND k = ? clause.
### LiteLLM
- litellm.embedding() call: use keyword argument input= (not text= or prompt=)
- litellm.embedding() response: access as response.data[0]["embedding"]
- litellm.completion() response: access as response.choices[0].message.content (attribute access, not dict)
### Project layout
- app.py, rag.py, db.py at project root, not in a package.
- Use direct imports: import rag, from db import ...
- Read RAG_DB_PATH inside get_connection() on every call (not at module level).
## Storage (db.py)
Schema — three tables:
- docs: one row per ingested document, with created_at timestamp
- chunks: text chunks linked to parent doc, with chunk index and text
- vec_chunks: sqlite-vec virtual table, one embedding per chunk, 768 dimensions
Public interface:
- get_connection(): opens DB connection, loads sqlite-vec, creates tables
- insert_doc(): inserts doc row, returns id
- insert_chunk(): inserts chunk row and its embedding into vector table
- search_chunks(): returns k nearest chunks to query embedding
## Core RAG Library (rag.py)
Load .env at top of file. All LiteLLM calls go here.
add_doc(text) -> int
- Rejects empty/whitespace-only input
- Splits on \n\n paragraph boundaries; further splits paragraphs >700 chars with sliding window
- Embeds each chunk via LiteLLM and stores in DB
- Returns new doc id
search(query, k=5) -> list[str]
- Embeds query and returns top-k matching chunk texts
- Returns empty list if no documents ingested
answer(question, k=5) -> str
- Retrieves relevant chunks via search()
- If none found, returns human-readable "no documents" message
- Builds context-grounded prompt and returns chat model's response
## API (app.py)
GET /
- Serves index.html from project root
POST /ingest
- Accepts multipart file upload
- Supported types: .txt, .md, .html, .csv, .json — reject others with 400
- Rejects empty file content with 400
- Returns JSON: {"id": <int>, "filename": <str>, "chars": <int>, "chunks": <int>}
- chunk count must be accurate — compute by actually chunking the text
POST /chat
- Accepts JSON: {"question": <str>}
- Rejects blank question with 400
- Returns JSON: {"answer": <str>}
## Frontend (index.html)
Plain HTML/CSS/JS. No build step. No frameworks.
Full-viewport two-column flex layout. Left sidebar (fixed 280px). Right main fills remaining width.
Color palette:
- Primary: #0021A5 (blue)
- Background: #f4f4f5
- Surface: #ffffff
- Border: #e4e4e7
- Muted: #a1a1aa
- Dark: #18181b
Sidebar:
- "Documents" heading
- Drag-and-drop upload zone (also clickable); accepts .txt, .md, .html, .csv, .json
- List of ingested docs showing id, filename, chunk count, char count
Main area:
- Scrollable message history with placeholder prompt initially
- User messages right-aligned with primary color background
- Assistant messages left-aligned with white surface background
- "Thinking…" placeholder while response is in flight
- Sticky input bar at bottom with textarea and Send button
- Enter submits; Shift+Enter inserts newline
- Send button disabled while request is in flight
- Toast notifications for upload success/errors and chat errors; auto-dismiss after 3s
## Startup Script (start.sh)
Single idempotent bash script that:
- Creates a virtualenv if one does not exist
- Installs dependencies from requirements.txt
- Starts uvicorn on port 8000
## Tests
tests/conftest.py:
- Adds project root to sys.path
- autouse fixture that monkeypatches RAG_DB_PATH to fresh temp file before each test
tests/test_rag.py must cover:
- Text chunking splits correctly on paragraph boundaries
- add_doc() rejects empty/whitespace-only input with ValueError
- Full round-trip: ingest a document, search it, get a grounded answer
Mocking rules for round-trip test:
- Mock litellm.embedding and litellm.completion with mock.patch
- Do NOT mock db functions
- litellm.embedding mock must return object with .data attribute: mock.Mock(data=[{"embedding": [0.1] * 768}])
- litellm.completion mock must return object where .choices[0].message.content is attribute-accessible
## Acceptance Criteria
- ./start.sh boots the server on port 8000
- Uploading a .txt returns accurate id, filename, char count, and chunk count
- search("some query", 3) returns non-empty list of strings after ingestion
- answer("some question") returns grounded answer string
- UI renders with sidebar, chat bubbles, drag-drop upload
- .env controls API base, key, embedding model, and chat model
- All tests pass with proper DB isolation
- rag.py is under ~200 lines
Step 3 — Configure Your Environment
After OpenCode generates the project, create your .env file from the example:
cp .env.example .env
Edit .env with your NaviGator Toolkit credentials:
OPENAI_API_KEY=your-navigator-toolkit-api-key
OPENAI_API_BASE=https://api.ai.it.ufl.edu/v1
EMBEDDING_MODEL=openai/nomic-embed-text-v1.5
CHAT_MODEL=openai/gpt-oss-120b
RAG_DB_PATH=rag.db
LiteLLM requires an openai/ prefix on all model names when using a custom OPENAI_API_BASE, including embedding models like nomic-embed-text-v1.5.
Step 4 — Start the App
Make the startup script executable, then run it:
chmod +x start.sh
./start.sh
Or, if you prefer to run the steps manually:
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
uvicorn app:app --port 8000
Open http://localhost:8000 in your browser. You'll see a two-column interface: a document upload sidebar on the left and a chat window on the right.
Step 5 — Try It Out
- Upload a document — drag a
.txtor.mdfile into the sidebar - Ask a question — type a question in the chat input and press Enter
- Get a grounded answer — the app retrieves relevant chunks from your document and generates a contextual response
How It Works
User uploads file → chunks text → embeds each chunk → stores in SQLite (sqlite-vec)
User asks question → embeds question → finds nearest chunks → sends to chat model → returns answer
The app combines two NaviGator Toolkit capabilities:
- Embeddings (
nomic-embed-text-v1.5) to convert text into semantic vectors - Text Generation (
gpt-oss-120b) to generate answers grounded in retrieved context
Next Steps
Once you have the base app running, you can extend it by adjusting models in .env, uploading different document types, or asking OpenCode to add features like streaming responses, multi-document search, or a different frontend. The architecture is intentionally minimal to make it easy to adapt.