First install PyTorch, then run the following command in the same environment to install the remaining dependencies:
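pip install -r requirements.txt

Next, start the vLLM OpenAI-compatible API server with Qwen-7B as the model (the GPTQ-quantized Int4 chat checkpoint), allowing it to use up to 80% of the GPU memory: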
python -m vllm.entrypoints.openai.api_server --model 'Qwen-7B-Chat-Int4' --trust-remote-code -q gptq --dtype float16 --gpu-memory-utilization 0.8

Then run the indexer, followed by the RAG script:

python indexer.py
python rag.py
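While the API server is running, you can check that it answers requests over HTTP. The snippet below is a minimal sketch, not part of this repository, and rests on three assumptions: the server listens on vLLM's default port 8000, it exposes the OpenAI-style `/v1/completions` route, and the served model name equals the `--model` value. The helper names `build_completion_request` and `complete` are hypothetical.

```python
import json
import urllib.request

# Assumptions (not confirmed by this README): vLLM's default port 8000,
# and a served model name identical to the --model argument.
BASE_URL = "http://localhost:8000/v1"
MODEL = "Qwen-7B-Chat-Int4"


def build_completion_request(prompt: str, max_tokens: int = 256,
                             temperature: float = 0.7) -> dict:
    """Build the JSON body for an OpenAI-style /v1/completions request."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt: str, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """POST a completion request to the server and return the generated text."""
    body = json.dumps(build_completion_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    # OpenAI-style completion responses carry the text in choices[0].text.
    return payload["choices"][0]["text"]
```

With the server up, `complete("What is retrieval-augmented generation?")` should return the model's continuation. The plain completions route is used here instead of chat completions because it does not depend on the model shipping a chat template.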