This lightweight local RAG project allows you to experiment with Retrieval-Augmented Generation pipelines for a wide range of applications.
This project is a technical playground to test how a local LLM (like Gemma) can assist users by answering context-aware questions using local documents. It allows:
- Document injection via text/markdown files,
- Chunking and vectorization of knowledge,
- Prompt-based interaction with a model like `gemma3:12b-it-qat`,
- Multilingual vector embeddings with `nomic-embed-text`,
- Local and configurable RAG pipeline.
You will need:

- Ollama (to run local LLMs)
- Node.js (v18+ recommended)
- macOS (tested on M2 with 16GB RAM)
On macOS, install Homebrew first, then run:
```bash
brew install ollama
```

Launch the Ollama server:
```bash
ollama serve
```

Open a new Terminal window or tab, then run this command to pull the models:
```bash
ollama pull gemma3:12b-it-qat && ollama pull nomic-embed-text
```

Clone the repository and install the dependencies:

```bash
git clone https://github.com/craft-and-code/ai-agent-rpg.git
cd ai-agent-rpg
npm install
```

Put your `.txt` or `.md` files inside the `data/` directory. Suggested structure:
```
data/
├── rules/
│   ├── character-creation.md
│   ├── combat.md
│   └── gear.md
├── univers/
│   ├── timeline.txt
│   └── factions.md
```

These files will be automatically chunked and vectorized.
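For illustration, a file like `data/rules/combat.md` can contain plain prose and headings; the content below is purely hypothetical:

```markdown
# Combat

## Initiative
Each character rolls 1d20 + Agility; the highest result acts first.

## Attack resolution
Roll 1d20 + Skill against the target's Defense. On a hit, roll the weapon's damage dice.
```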
Chunk the documents:

```bash
node chunker.js
```

- This reads all `.md` and `.txt` files from `./data/`,
- Files already processed (based on MD5 hash) will be skipped (a sketch of this follows below).

Output: `./build/chunks.json`
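The skip logic can be sketched roughly as follows (a hypothetical illustration; the actual cache layout used by `chunker.js` may differ):

```js
// Hypothetical sketch: hash a file and skip it if that hash was already processed.
const fs = require('fs');
const crypto = require('crypto');

function md5Of(filePath) {
  // MD5 of the raw file contents identifies an unchanged document.
  return crypto.createHash('md5').update(fs.readFileSync(filePath)).digest('hex');
}

function needsChunking(filePath, processedHashes) {
  // processedHashes: a Set of MD5 hashes recorded from a previous run.
  return !processedHashes.has(md5Of(filePath));
}
```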
Then generate the embeddings:

```bash
node embedder.js
```

This produces vector embeddings stored in `./build/embeddings.json`, using the `nomic-embed-text` model served by Ollama.
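Under the hood this amounts to calling Ollama's embeddings endpoint once per chunk; a minimal sketch (not the exact code in `embedder.js`) would be:

```js
const axios = require('axios');

// Request an embedding vector for one chunk of text from the local Ollama server.
async function embedChunk(text) {
  const { data } = await axios.post('http://localhost:11434/api/embeddings', {
    model: 'nomic-embed-text',
    prompt: text
  });
  return data.embedding; // array of floats
}
```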
Next, customize the system prompt by editing `config/prompt.txt`.
This file contains the system prompt that defines:
- The tone (cold, factual, machine-like),
- Instructions to avoid hallucination,
- Role-playing logic and interaction preferences.
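Purely as an illustration (the repository ships its own prompt), a system prompt matching that description might read:

```text
You are a retrieval assistant for a tabletop RPG campaign.
Answer in a cold, factual, machine-like tone.
Use only the provided context. If the answer is not in the context, say you do not know.
Follow the role-playing conventions described in the documents when addressing the player.
```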
Run a query:

```bash
node rag/query.js
```

This script will:
- Prompt the user for a question,
- Find top-matching chunks from the document base (see the similarity sketch after this list),
- Construct a prompt combining the system prompt, the retrieved chunks, and the question,
- Send it to Ollama’s local model and return the answer.
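The "top-matching chunks" step is a nearest-neighbour search over the stored vectors; a minimal cosine-similarity sketch (assuming each entry in `embeddings.json` holds a `text` string and an `embedding` array, which may not match the real schema) could look like:

```js
// Hypothetical sketch: rank stored chunks by cosine similarity to the question embedding.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topChunks(questionEmbedding, entries, k = 5) {
  return entries
    .map(e => ({ text: e.text, score: cosine(questionEmbedding, e.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```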
Conversation memory is not yet implemented but is planned for a future update. It will allow the AI agent to:
- Retain previous interactions with the user,
- Maintain coherence across multiple sessions,
- Adapt its responses based on conversation history.
You can tune model behavior in `ollama.js`, and define the default model name in `config/model.js`.
Embedding generation uses `nomic-embed-text` as the default embedding model, served by Ollama.
```js
// Excerpt: generation call in ollama.js. Sampling parameters are passed to
// Ollama's /api/generate endpoint inside the `options` object.
const response = await axios.post('http://localhost:11434/api/generate', {
  model: 'gemma3:12b-it-qat',
  prompt,
  stream: false,
  options: {
    temperature: 0.2,
    top_k: 40,
    top_p: 0.9,
    repeat_penalty: 1.2,
    num_predict: 512
  }
});
```
The constants in `config/model.js` ensure consistent model usage across scripts.
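Such a file might simply export the two model names; the constant names below are illustrative, not necessarily those used by the project:

```js
// config/model.js (hypothetical sketch)
module.exports = {
  MODEL: 'gemma3:12b-it-qat',          // generation model used by ollama.js
  EMBEDDING_MODEL: 'nomic-embed-text'  // embedding model used by embedder.js
};
```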
- Lower temperature = stricter answers,
- Add/remove `.md` or `.txt` files → re-run `chunker.js` + `embedder.js`,
- The prompt is your AI's "soul" → shape it wisely.