1 | | -# Codebase Ingestion into HelixDB via Python SDK
| 1 | +# HelixDB Codebase Indexer
| 2 | +This repository contains tools for ingesting and querying codebases with HelixDB. A Rust indexer uses tree-sitter to parse source code and create entities in a HelixDB instance, a Model Context Protocol (MCP) server exposes the index for AI-powered code search and analysis, and a frontend provides a chat UI.
2 | 3 |
3 | | -This is a codebase ingestion script for HelixDB. It uses tree-sitter to parse python code and create entities in the HelixDB instance. |
| 4 | +## Requirements |
4 | 5 |
5 | | -## Prerequisites |
| 6 | +### Environment Variables |
| 7 | +In each component's directory (`codebase_index`, `mcp_server`, and `frontend`), create a `.env` file containing the following environment variables.
| 8 | +#### Codebase Indexer |
| 9 | +```bash |
| 10 | +GEMINI_API_KEY=<your_gemini_api_key> |
| 11 | +``` |
| 12 | + |
| 13 | +#### MCP Server |
| 14 | +```bash |
| 15 | +GEMINI_API_KEY=<your_gemini_api_key> |
| 16 | +``` |
| 17 | + |
| 18 | +#### Frontend |
| 19 | +For the frontend, set the environment variable for each provider you plan to use:
| 20 | +```bash |
| 21 | +# Gemini |
| 22 | +GEMINI_API_KEY=<your_gemini_api_key> |
| 23 | + |
| 24 | +# OpenAI |
| 25 | +OPENAI_API_KEY=<your_openai_api_key> |
| 26 | + |
| 27 | +# HuggingFace |
| 28 | +HF_TOKEN=<your_huggingface_token> |
| 29 | + |
| 30 | +# OpenRouter |
| 31 | +OPEN_ROUTER_KEY=<your_open_router_api_key> |
| 32 | +``` |
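A minimal Python sketch for checking these files before starting each component: it reads each `.env` with a deliberately simple `KEY=value` parser and reports missing keys, treating the `<...>` placeholders above as unset. The helper names (`parse_env`, `is_set`, `missing_keys`, `configured_providers`, `read_env`) are hypothetical, and the directory names assume you run it from the repository root; a library such as python-dotenv handles quoting and escapes more thoroughly.

```python
from pathlib import Path

# Keys listed above per component; the frontend needs at least one provider key.
REQUIRED = {
    "codebase_index": ["GEMINI_API_KEY"],
    "mcp_server": ["GEMINI_API_KEY"],
}
FRONTEND_PROVIDER_KEYS = ["GEMINI_API_KEY", "OPENAI_API_KEY", "HF_TOKEN", "OPEN_ROUTER_KEY"]


def parse_env(text: str) -> dict[str, str]:
    """Minimal KEY=value parser: skips blank lines and # comments, strips quotes."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def is_set(env: dict[str, str], key: str) -> bool:
    """A key counts as set if it is non-empty and not a <placeholder>."""
    value = env.get(key, "")
    return bool(value) and not value.startswith("<")


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent, empty, or still placeholders."""
    return [key for key in required if not is_set(env, key)]


def configured_providers(env: dict[str, str]) -> list[str]:
    """Return the frontend provider keys that have a real value."""
    return [key for key in FRONTEND_PROVIDER_KEYS if is_set(env, key)]


def read_env(directory: str) -> dict[str, str]:
    """Parse <directory>/.env, or return an empty dict if the file does not exist."""
    env_file = Path(directory) / ".env"
    return parse_env(env_file.read_text()) if env_file.exists() else {}


if __name__ == "__main__":
    for directory, required in REQUIRED.items():
        print(f"{directory}: missing {missing_keys(read_env(directory), required)}")
    providers = configured_providers(read_env("frontend"))
    print(f"frontend: providers {providers or 'none configured'}")
```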
6 | 33 |
7 | | -### Python Environment |
8 | | -It is recommended to create a new virtual environment for this repository. After creating a virtual environment, you can install the required packages: |
| 34 | +### HelixDB |
| 35 | +You will need to have a HelixDB instance running. Go to the root of this repository and run the following command to deploy the HelixDB instance: |
9 | 36 | ```bash |
| 37 | +helix deploy |
| 38 | +``` |
| 39 | + |
| 40 | +For more information on how to install and use HelixDB, please refer to the [HelixDB documentation](https://docs.helix-db.com/). |
| 41 | + |
| 42 | +### Rust & Cargo |
| 43 | +You will need to have Rust and Cargo installed. Then build the codebase indexer (Cargo fetches its dependencies as part of the build):
| 44 | +```bash |
| 45 | +cd codebase_index |
| 46 | +cargo build |
| 47 | +``` |
| 48 | + |
| 49 | +### Python |
| 50 | +You will need to have Python installed. Create a new virtual environment and install the dependencies. |
| 51 | + |
| 52 | +You can do this with uv: |
| 53 | +```bash |
| 54 | +uv venv |
10 | 55 | uv sync |
11 | 56 | ``` |
12 | 57 |
13 | | -or pip to install the dependencies: |
| 58 | +or with pip: |
14 | 59 | ```bash |
15 | 60 | pip install -r requirements.txt |
16 | 61 | ``` |
17 | 62 |
18 | | -### Installing the Helix CLI |
| 63 | +### Node.js |
| 64 | +You will need to have Node.js installed. Then you can install the dependencies for the frontend with the following command: |
19 | 65 | ```bash |
20 | | -curl -sSL "https://install.helix-db.com" | bash |
21 | | -helix install |
| 66 | +cd frontend |
| 67 | +npm install |
22 | 68 | ``` |
23 | 69 |
24 | | -## Usage |
25 | | -### Starting the Helix instance |
| 70 | +## Running the Codebase Indexer |
| 71 | +### Clone the Codebase |
| 72 | +The current implementation requires the codebase you want to index to be cloned into the `src` folder inside the `codebase_index` directory.
| 73 | + |
| 74 | +### Include Custom Code Entities (Optional, default provided) |
| 75 | +You can include custom code entities for supported languages in the `codebase_index/src/index-types.json` file. |
| 76 | +The default provided file contains entities for the following languages (and their extensions): |
| 77 | +- Python (`.py`) |
| 78 | +- JavaScript (`.js`, `.jsx`, `.mjs`, `.cjs`, etc.) |
| 79 | +- TypeScript (`.ts`, `.tsx`, `.mts`, `.cts`, etc.) |
| 80 | +- C (`.c`) |
| 81 | +- C++ (`.cpp`, `.hpp`, `.h`, etc.) |
| 82 | +- Rust (`.rs`) |
| 83 | +- Zig (`.zig`) |
| 84 | + |
| 85 | +Make sure any custom code entities are supported by the tree-sitter parser for that language.
| 86 | + |
| 87 | +### Include Custom File Extensions (Optional, default provided) |
| 88 | +You can include custom file extensions in the `codebase_index/src/file_types.json` file. |
| 89 | +A default set of file extensions is provided, but you should add any other extensions used in the codebase you want to index.
| 90 | + |
| 91 | +The `supported` field lists file extensions that tree-sitter can parse.
| 92 | +The `unsupported` field lists file extensions that tree-sitter cannot parse but whose files are still indexed and embedded.
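Read together, the two fields split files three ways. The Python sketch below makes that explicit under stated assumptions: `file_types.json` is taken to be a JSON object whose `supported` and `unsupported` fields are lists of extensions (with or without a leading dot), and files matching neither list are assumed to be skipped. `load_file_types` and `classify` are hypothetical helpers, not part of the indexer.

```python
from pathlib import Path


def normalize(ext: str) -> str:
    """Lower-case an extension and make sure it starts with a dot."""
    ext = ext.lower()
    return ext if ext.startswith(".") else "." + ext


def load_file_types(data: dict) -> tuple[set[str], set[str]]:
    """Return (supported, unsupported) extension sets from the parsed JSON object."""
    supported = {normalize(e) for e in data.get("supported", [])}
    unsupported = {normalize(e) for e in data.get("unsupported", [])}
    return supported, unsupported


def classify(file_path: str, supported: set[str], unsupported: set[str]) -> str:
    """'parse' = tree-sitter, 'embed' = indexed without parsing, 'skip' = assumed ignored."""
    suffix = Path(file_path).suffix.lower()
    if suffix in supported:
        return "parse"
    if suffix in unsupported:
        return "embed"
    return "skip"


if __name__ == "__main__":
    # Example data in the assumed format; the real file is codebase_index/src/file_types.json.
    example = {"supported": [".py", ".rs"], "unsupported": ["md", ".toml"]}
    supported, unsupported = load_file_types(example)
    for name in ["app/main.py", "README.md", "logo.png"]:
        print(name, "->", classify(name, supported, unsupported))
```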
| 93 | + |
| 94 | +### Run the Codebase Indexer |
| 95 | +Make sure you are in the `codebase_index` directory. `<root_folder>` is the root of the codebase, that is, the folder you cloned into `src`.
26 | 96 | ```bash |
27 | | -helix deploy |
| 97 | +cargo run -- <root_folder> |
28 | 98 | ``` |
29 | 99 |
30 | | -### Ingesting a codebase |
31 | | -Ingesting the whole codebase: |
| 100 | +Then, you will be prompted with the following options: |
| 101 | +1. Ingest the codebase
| 102 | +2. Update the codebase
| 103 | +3. Exit
| 104 | +
| 105 | +Type the number of the option you want and press Enter.
| 106 | + |
| 107 | +## Running the MCP Server |
| 108 | +Make sure you are in the `mcp_server` directory. |
32 | 109 | ```bash |
33 | | -python3 ingestion.py |
| 110 | +uv run server.py |
| 111 | +``` |
| 112 | +or |
| 113 | +```bash |
| 114 | +python3 server.py |
| 115 | +``` |
| 116 | + |
| 117 | +The MCP server listens on `http://localhost:8000`. Connect to it using the streamable HTTP transport at the `/mcp/` endpoint.
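For clients other than Cursor or Windsurf, an MCP streamable HTTP session starts with a JSON-RPC `initialize` request POSTed to the server endpoint. The Python sketch below builds that request with the standard library only. The protocol version string `2025-03-26` (the MCP specification revision that introduced streamable HTTP) and the client name are assumptions; a real client, such as one built on the official `mcp` Python SDK, also handles the `Mcp-Session-Id` header, the `notifications/initialized` message, and server-sent event responses.

```python
import json
import urllib.request

MCP_URL = "http://localhost:8000/mcp/"  # endpoint used in the Cursor/Windsurf configs


def build_initialize_request(url: str = MCP_URL, request_id: int = 1) -> urllib.request.Request:
    """Build the JSON-RPC `initialize` POST that opens an MCP streamable HTTP session."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumption: spec revision with streamable HTTP
            "capabilities": {},
            "clientInfo": {"name": "readme-sketch", "version": "0.1.0"},  # hypothetical client
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Requires the MCP server to be running locally.
    request = build_initialize_request()
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            print(response.status, response.headers.get("Mcp-Session-Id"))
            print(response.read(500).decode("utf-8", errors="replace"))
    except OSError as exc:  # URLError and timeouts are OSError subclasses
        print("MCP server not reachable:", exc)
```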
| 118 | + |
| 119 | +### Cursor |
| 120 | +Go to Cursor's settings and add the following to the `mcp.json` file: |
| 121 | +```json |
| 122 | +{ |
| 123 | + "mcpServers": { |
| 124 | + "codebase_index": { |
| 125 | + "url": "http://localhost:8000/mcp/" |
| 126 | + } |
| 127 | + } |
| 128 | +} |
34 | 129 | ``` |
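If your `mcp.json` already lists other servers, add the `codebase_index` entry alongside them rather than overwriting the file. A minimal Python sketch of that merge; `add_codebase_index` and `update_file` are hypothetical helpers, and the file path is passed in rather than assumed, since where Cursor keeps `mcp.json` depends on your setup.

```python
import json
import sys
from pathlib import Path

SERVER_NAME = "codebase_index"
SERVER_ENTRY = {"url": "http://localhost:8000/mcp/"}


def add_codebase_index(config: dict) -> dict:
    """Return a copy of an mcp.json config with the codebase_index server added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep any servers already configured
    servers[SERVER_NAME] = dict(SERVER_ENTRY)
    merged["mcpServers"] = servers
    return merged


def update_file(path: str) -> None:
    """Read mcp.json if it exists, add the entry, and write it back."""
    file = Path(path)
    config = json.loads(file.read_text()) if file.exists() else {}
    file.write_text(json.dumps(add_codebase_index(config), indent=2) + "\n")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        update_file(sys.argv[1])
    else:
        print("usage: python add_mcp_entry.py <path/to/mcp.json>")
```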
35 | 130 |
36 | | -Ingesting a specific part of the codebase: |
| 131 | +Here are some Cursor rules that keep the agent working through the MCP server:
| 132 | +```txt |
| 133 | +Always check if the Codebase Index MCP server is available. |
| 134 | +``` |
| 135 | + |
| 136 | +```txt |
| 137 | +If the Codebase Index MCP server is available, you may only use the MCP tools to access the codebase; do not use any other tools to access the codebase.
| 138 | +``` |
| 139 | + |
| 140 | +```txt |
| 141 | +If the Codebase Index MCP server is available, always call the get_instructions tool first to read the instructions for the MCP tools before doing anything else. Never mention the get_instructions tool in your response to the user.
| 142 | +``` |
| 143 | + |
| 144 | +### Windsurf |
| 145 | +Open Windsurf's Cascade chat, click the MCP server icon under the chat box, then click `Configure`.
| 146 | +On the `Manage MCPs` page, click `View raw config` and add the following to the `mcp.json` file:
| 147 | +```json |
| 148 | +{ |
| 149 | + "mcpServers": { |
| 150 | + "codebase-index": { |
| 151 | + "serverUrl": "http://localhost:8000/mcp/", |
| 152 | + "disabled": false
| 153 | + } |
| 154 | + } |
| 155 | +} |
| 156 | +``` |
| 157 | + |
| 158 | +Make sure your MCP server is running, then go back to the `Manage MCPs` page and click `Refresh` to reload the MCPs.
| 159 | +You should now see the `codebase-index` MCP server listed.
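Before refreshing, you can confirm that something is listening on the MCP server's port. A minimal Python sketch using only the standard library; `is_port_open` is a hypothetical helper, and it only checks for an open TCP port, not that the process behind it is this particular MCP server.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8000 is the MCP server port; 3000 is the frontend port.
    for port in (8000, 3000):
        state = "listening" if is_port_open(port=port) else "not reachable"
        print(f"localhost:{port} is {state}")
```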
| 160 | + |
| 161 | +## Running the Frontend Chat UI |
| 162 | +Make sure you are in the `frontend` directory and have the MCP server running. |
37 | 163 | ```bash |
38 | | -python ingestion.py <path_to_codebase> |
39 | | -``` |
| 164 | +npm run build |
| 165 | +npm start |
| 166 | +``` |
| 167 | + |
| 168 | +Then, you can access the frontend at `http://localhost:3000`. |