Commit e577c26

update readme
1 parent 60950d1 commit e577c26

2 files changed

+148
-18
lines changed


README.md

Lines changed: 147 additions & 18 deletions
@@ -1,39 +1,168 @@
1-
# Codebase Ingestion into HelixDB via Python SDK
1+
# HelixDB Codebase Indexer
2+
This repository contains tools for ingesting and querying codebases using HelixDB. It uses tree-sitter to parse code and create entities in the HelixDB instance, and includes a Model Context Protocol (MCP) server for AI-powered code search and analysis.
23

3-
This is a codebase ingestion script for HelixDB. It uses tree-sitter to parse python code and create entities in the HelixDB instance.
4+
## Requirements
45

5-
## Prerequisites
6+
### Environment Variables
7+
In each of the respective directories, you will need to create a `.env` file and add the following environment variables.
8+
#### Codebase Indexer
9+
```bash
10+
GEMINI_API_KEY=<your_gemini_api_key>
11+
```
12+
13+
#### MCP Server
14+
```bash
15+
GEMINI_API_KEY=<your_gemini_api_key>
16+
```
17+
18+
#### Frontend
19+
For the frontend, you can set the following environment variables depending on the provider you are using.
20+
```bash
21+
# Gemini
22+
GEMINI_API_KEY=<your_gemini_api_key>
23+
24+
# OpenAI
25+
OPENAI_API_KEY=<your_openai_api_key>
26+
27+
# HuggingFace
28+
HF_TOKEN=<your_huggingface_token>
29+
30+
# OpenRouter
31+
OPEN_ROUTER_KEY=<your_open_router_api_key>
32+
```
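
Before starting any component, it can help to confirm that its `.env` file actually defines the keys listed above. The following is a minimal sketch, not part of this repository: the helper names `parse_env` and `missing_keys` are hypothetical, and it assumes each `.env` file uses plain `KEY=VALUE` lines.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env_text: str, required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty in the .env text."""
    values = parse_env(env_text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    # Assumed layout: one .env file per component directory.
    for directory in ("codebase_index", "mcp_server"):
        env_file = Path(directory) / ".env"
        text = env_file.read_text() if env_file.exists() else ""
        print(directory, "missing:", missing_keys(text, ["GEMINI_API_KEY"]))
```

Run from the repository root, this prints the keys each directory still lacks; an empty list means the file is ready.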
633

7-
### Python Environment
8-
It is recommended to create a new virtual environment for this repository. After creating a virtual environment, you can install the required packages:
34+
### HelixDB
35+
You will need to have a HelixDB instance running. Go to the root of this repository and run the following command to deploy the HelixDB instance:
936
```bash
37+
helix deploy
38+
```
39+
40+
For more information on how to install and use HelixDB, please refer to the [HelixDB documentation](https://docs.helix-db.com/).
41+
42+
### Rust & Cargo
43+
You will need to have Rust and Cargo installed. Then, you can install the dependencies for the codebase indexer with the following command:
44+
```bash
45+
cd codebase_index
46+
cargo build
47+
```
48+
49+
### Python
50+
You will need to have Python installed. Create a new virtual environment and install the dependencies.
51+
52+
You can do this with uv:
53+
```bash
54+
uv venv
1055
uv sync
1156
```
1257

13-
or pip to install the dependencies:
58+
or with pip:
1459
```bash
1560
pip install -r requirements.txt
1661
```
1762

18-
### Installing the Helix CLI
63+
### Node.js
64+
You will need to have Node.js installed. Then you can install the dependencies for the frontend with the following command:
1965
```bash
20-
curl -sSL "https://install.helix-db.com" | bash
21-
helix install
66+
cd frontend
67+
npm install
2268
```
2369

24-
## Usage
25-
### Starting the Helix instance
70+
## Running the Codebase Indexer
71+
### Clone the Codebase
72+
The current implementation requires the codebase to be cloned into the `src` folder inside the `codebase_index` directory.
73+
74+
### Include Custom Code Entities (Optional, default provided)
75+
You can include custom code entities for supported languages in the `codebase_index/src/index-types.json` file.
76+
The default provided file contains entities for the following languages (and their extensions):
77+
- Python (`.py`)
78+
- JavaScript (`.js`, `.jsx`, `.mjs`, `.cjs`, etc.)
79+
- TypeScript (`.ts`, `.tsx`, `.mts`, `.cts`, etc.)
80+
- C (`.c`)
81+
- C++ (`.cpp`, `.hpp`, `.h`, etc.)
82+
- Rust (`.rs`)
83+
- Zig (`.zig`)
84+
85+
Make sure that the custom code entities are supported by the tree-sitter parser for the respective language.
86+
87+
### Include Custom File Extensions (Optional, default provided)
88+
You can include custom file extensions in the `codebase_index/src/file_types.json` file.
89+
A default set of file extensions is provided, but we recommend adding any other file extensions that you want to index in your codebase.
90+
91+
The `supported` field is a list of file extensions that tree-sitter can parse.
92+
The `unsupported` field is a list of file extensions that tree-sitter cannot parse but that are still indexed and embedded.
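
To illustrate how the two fields might be interpreted, here is a rough sketch in Python. It is not the indexer's actual logic (the indexer is written in Rust). The `classify` function, its return labels, and the assumption that files matching neither list are skipped are all hypothetical, as is the exact shape of the JSON beyond the `supported` and `unsupported` keys.

```python
import json
from pathlib import Path


def normalize(extensions: list[str]) -> set[str]:
    """Accept entries written as 'py' or '.py' and compare case-insensitively."""
    return {"." + ext.lstrip(".").lower() for ext in extensions}


def classify(path: str, file_types: dict) -> str:
    """Classify a file by extension using the supported/unsupported lists."""
    ext = Path(path).suffix.lower()
    if not ext:
        return "skipped"  # assumption: files without an extension are ignored
    if ext in normalize(file_types.get("supported", [])):
        return "parsed"  # tree-sitter entities are extracted
    if ext in normalize(file_types.get("unsupported", [])):
        return "embedded-only"  # indexed and embedded without parsing
    return "skipped"  # assumption: unlisted extensions are ignored


if __name__ == "__main__":
    # Hypothetical config content for demonstration purposes.
    config = json.loads('{"supported": [".py", ".rs"], "unsupported": [".md"]}')
    for name in ("main.rs", "README.md", "logo.png"):
        print(name, "->", classify(name, config))
```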
93+
94+
### Run the Codebase Indexer
95+
Make sure you are in the `codebase_index` directory. `<root_folder>` is the root of the codebase you want to index, i.e. the folder you cloned into `src`.
2696
```bash
27-
helix deploy
97+
cargo run -- <root_folder>
2898
```
2999

30-
### Ingesting a codebase
31-
Ingesting the whole codebase:
100+
Then, you will be prompted with the following options:
101+
1. Ingest the codebase (1)
102+
2. Update the codebase (2)
103+
3. Exit (3)
104+
105+
Enter the number of the option you want to select and press enter.
106+
107+
## Running the MCP Server
108+
Make sure you are in the `mcp_server` directory.
32109
```bash
33-
python3 ingestion.py
110+
uv run server.py
111+
```
112+
or
113+
```bash
114+
python3 server.py
115+
```
116+
117+
Then you can connect to the MCP server using the Streamable HTTP transport. The MCP server will be running on `http://localhost:8000`.
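
As a quick connectivity check, the sketch below builds the JSON-RPC `initialize` request that an MCP client sends first over Streamable HTTP. It uses only the Python standard library. The `/mcp/` path is taken from the editor configurations in this README. The protocol version string, the client name, and the helper `build_initialize_request` are assumptions for illustration, and the server may expect a different protocol version.

```python
import json
import urllib.request

MCP_URL = "http://localhost:8000/mcp/"  # assumed endpoint path


def build_initialize_request(url: str = MCP_URL) -> urllib.request.Request:
    """Build (but do not send) an MCP initialize request."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed version
            "capabilities": {},
            "clientInfo": {"name": "readme-example", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


if __name__ == "__main__":
    try:
        with urllib.request.urlopen(build_initialize_request(), timeout=5) as resp:
            print(resp.status, resp.headers.get("Content-Type"))
    except OSError as exc:  # covers connection errors, HTTP errors, timeouts
        print("MCP server not reachable:", exc)
```

A `200` status suggests the server is up and accepting MCP requests; a connection error usually means it has not been started yet.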
118+
119+
### Cursor
120+
Go to Cursor's settings and add the following to the `mcp.json` file:
121+
```json
122+
{
123+
"mcpServers": {
124+
"codebase_index": {
125+
"url": "http://localhost:8000/mcp/"
126+
}
127+
}
128+
}
34129
```
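
If `mcp.json` already lists other servers, the new entry should be merged in rather than overwriting the file. A minimal sketch of such a merge follows; the `add_server` helper is hypothetical, and the location of `mcp.json` is left to the caller because it depends on your editor setup.

```python
import json


def add_server(config_text: str, name: str, entry: dict) -> str:
    """Return config_text with `entry` added under mcpServers[name]."""
    config = json.loads(config_text) if config_text.strip() else {}
    # Keep existing servers; create the mcpServers object if it is missing.
    config.setdefault("mcpServers", {})[name] = entry
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    existing = '{"mcpServers": {"other": {"url": "http://example.invalid/mcp/"}}}'
    print(add_server(existing, "codebase_index", {"url": "http://localhost:8000/mcp/"}))
```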
35130

36-
Ingesting a specific part of the codebase:
131+
Here are some useful Cursor rules you can use to improve your experience:
132+
```txt
133+
Always check if the Codebase Index MCP server is available.
134+
```
135+
136+
```txt
137+
If the Codebase Index MCP server is available, you are only allowed to use the mcp server to access the codebase, you may not use any other tools to access the codebase other than the mcp tools.
138+
```
139+
140+
```txt
141+
If the Codebase Index MCP server is available, always call get_instructions tool first to read the instructions for the mcp tools before proceeding with anything else. Never mention the get_instructions tool in your response to the user.
142+
```
143+
144+
### Windsurf
145+
Go to Windsurf's Cascade chat and click on the MCP server icon under the chat box, then click `Configure`.
146+
Then click `View raw config` in the `Manage MCPs` page, and add the following to the `mcp.json` file:
147+
```json
148+
{
149+
"mcpServers": {
150+
"codebase-index": {
151+
"serverUrl": "http://localhost:8000/mcp/",
152+
"disabled": true
153+
}
154+
}
155+
}
156+
```
157+
158+
Go back to the `Manage MCPs` page and click `Refresh` to reload the MCPs, and you should see the `codebase-index` MCP server listed.
159+
Make sure your MCP server is running before you refresh the MCPs.
160+
161+
## Running the Frontend Chat UI
162+
Make sure you are in the `frontend` directory and have the MCP server running.
37163
```bash
38-
python ingestion.py <path_to_codebase>
39-
```
164+
npm run build
165+
npm start
166+
```
167+
168+
Then, you can access the frontend at `http://localhost:3000`.

frontend/src/lib/instructions.txt

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
1+
# Code Indexing MCP Server Instructions
12
You are an expert codebase assistant with access to a comprehensive graph database containing indexed code entities. Your purpose is to help users navigate, understand, and analyze codebases through structured queries and semantic search capabilities.
23

34
You must use the tools provided to answer any question related to the codebase. Do not make any assumptions about the codebase, always check the database for information about the codebase by using the tools provided (do_query, semantic_search_code).
