A flexible LangGraph agent built with LangChain components, featuring:
- Multi-provider LLM support (Databricks, OpenAI, Azure OpenAI, Anthropic, Ollama)
- Human-in-the-loop middleware for tool approval and answer verification
- MLflow integration for deployment (optional)
- Modular architecture for easy maintenance and extension
- Environment-agnostic - runs in Databricks or standalone mode
This project uses Poetry for dependency management.
- Python 3.10 or 3.11
- Poetry (install via `curl -sSL https://install.python-poetry.org | python3 -`)
```bash
# Install dependencies
poetry install

# Activate the virtual environment
poetry shell

# Or run commands directly
poetry run python agent.py
```

Core:
- `langchain` - LangChain framework (v1.0.0+)
- `langchain-core` - LangChain core (v1.0.0+)
- `langgraph` - LangGraph for agent workflows
- `python-dotenv` - Environment variable management
Optional (provider-specific):
- `databricks-langchain` - For Databricks provider
- `langchain-openai` - For OpenAI/Azure OpenAI providers
- `langchain-anthropic` - For Anthropic provider
- `langchain-community` - For Ollama provider
- `mlflow-skinny[databricks]` - For MLflow integration (optional)
The agent supports configuration via environment variables or a `.env` file:

```bash
# .env file
LLM_PROVIDER=azure_openai
LLM_ENDPOINT_NAME=gpt-4-deployment
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=your-api-key
AGENT_EXECUTION_MODE=standalone
```

Example usage:

```python
from agent import agent

config = {"configurable": {"thread_id": "1"}}
result = agent.invoke({
    "messages": [{"role": "user", "content": "List all schemas"}]
}, config)
```

The agent includes built-in human-in-the-loop middleware (see the example below):
- SQL queries require approval before execution (configurable)
- Final answers can be verified before sending to users
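
If a run pauses for approval, it can be resumed on the same thread. A minimal sketch, assuming the middleware surfaces a standard LangGraph interrupt; the exact resume payload shape depends on how the approval middleware is configured and is illustrative here:

```python
from langgraph.types import Command

from agent import agent

config = {"configurable": {"thread_id": "1"}}

# A request that triggers a SQL tool call may pause for approval
result = agent.invoke(
    {"messages": [{"role": "user", "content": "List all schemas"}]},
    config,
)

if "__interrupt__" in result:
    # Inspect the pending tool call before deciding
    print(result["__interrupt__"])

    # Resume the same thread with a decision; the payload shape below is
    # illustrative - match it to your middleware's expected format
    result = agent.invoke(Command(resume={"decision": "approve"}), config)

print(result["messages"][-1].content)
```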
See `USAGE.md` for detailed usage instructions and `AZURE_OPENAI_SETUP.md` for Azure OpenAI configuration.
Architecture Decision Records (ADRs) are documented in `docs/adr/`.
The `search_tables_by_description` tool uses a configurable term expansion system to map natural language descriptions to database column search terms.
Term expansions are stored in `term_expansions.json`. To add or modify expansions:
- Edit the JSON file directly:

  ```json
  {
    "your_term": ["term1", "term2", "term3", "synonym1", "synonym2"]
  }
  ```

- The file is automatically loaded and cached - no code changes needed.
- For large-scale management, you can optionally load from a Databricks table:
  - Uncomment the table loading code in `_load_term_expansions()`
  - Create a table with columns: `term` (string), `expansions` (JSON string)
  - Update the SQL query to point to your table
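
A minimal sketch of how the JSON-backed loading and caching might look; the function name mirrors `_load_term_expansions()` mentioned above, but the body, file path, and table guidance are illustrative, not the actual implementation:

```python
import json
from functools import lru_cache
from pathlib import Path

# Assumed location of the expansions file, relative to this module
_EXPANSIONS_FILE = Path(__file__).parent / "term_expansions.json"


@lru_cache(maxsize=1)
def _load_term_expansions() -> dict[str, list[str]]:
    """Load the term -> expansions mapping once and cache it in memory.

    To load from a Databricks table instead, replace the file read with a
    query over a table holding `term` and `expansions` (JSON string) columns.
    """
    with _EXPANSIONS_FILE.open(encoding="utf-8") as f:
        return json.load(f)
```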
To add support for "financial" terms:

```json
{
  "financial": ["financial", "finance", "fiscal", "money", "currency", "dollar", "euro", "payment", "transaction"],
  "payment": ["payment", "pay", "transaction", "financial", "invoice", "billing"]
}
```

The system will automatically:
- Cache expansions for performance
- Expand search terms when users query
- Search across all schemas in the catalog
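
As an illustration of the expansion step, a hedged sketch of how a user's description might be mapped to search terms using the cached mapping from the sketch above (the helper name and word-splitting approach are assumptions, not the actual implementation):

```python
def expand_search_terms(description: str) -> set[str]:
    """Expand words in a user description into column search terms."""
    expansions = _load_term_expansions()  # cached mapping from the sketch above
    terms: set[str] = set()
    for word in description.lower().split():
        terms.add(word)
        terms.update(expansions.get(word, []))
    return terms


# "financial transactions" picks up the "financial" entry above, yielding
# terms like {"financial", "finance", "fiscal", "money", "transaction", ...}
print(expand_search_terms("financial transactions"))
```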