This Python agent automatically generates a custom parser for any bank statement PDF. It uses a self-correcting loop to generate Python code, test it against sample CSV data, and refine the parser up to 3 times. The generated parser function, parse_pdf(pdf_path: str) -> pd.DataFrame, returns a pandas DataFrame that exactly matches the expected CSV schema. This allows evaluators to run the agent on new bank statements without manual code changes.
The agent operates in a self-correction loop. It begins by reading a sample PDF to understand its text structure and a target CSV file to understand the desired output format. It then builds a detailed prompt from this context and instructs the Gemini model to generate a Python parser. The agent saves the generated parser and immediately runs a test that checks whether the parser's output (e.g., column names and return type) matches the target CSV schema. If the test fails, the agent captures the error output, appends it to the prompt as feedback, and asks the model for a corrected version. This cycle repeats until the test passes or all three attempts are exhausted.
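The loop described above can be sketched as follows. This is a minimal illustration, not the code in agent.py: `refine_parser`, `generate`, and `run_test` are hypothetical names, with `generate` standing in for the Gemini call and `run_test` for saving the parser and running its test.

```python
from typing import Callable, Optional, Tuple

MAX_ATTEMPTS = 3  # the attempt limit stated above


def refine_parser(
    generate: Callable[[str], str],
    run_test: Callable[[str], Tuple[bool, str]],
    base_prompt: str,
    max_attempts: int = MAX_ATTEMPTS,
) -> Optional[str]:
    """Return parser source that passes the test, or None if attempts run out."""
    prompt = base_prompt
    for attempt in range(1, max_attempts + 1):
        # Hypothetical wrapper around the model call.
        code = generate(prompt)
        # Hypothetical helper: save the code, run the test, capture any error.
        passed, error_output = run_test(code)
        if passed:
            return code
        # Feed the captured failure back into the next prompt.
        prompt = (
            f"{base_prompt}\n\n"
            f"Attempt {attempt} failed with:\n{error_output}\n"
            "Return a corrected parser."
        )
    return None
```

Passing the model call and the test runner in as callables keeps the loop itself easy to exercise without an API key or a PDF.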
┌───────────┐
│   PLAN    │
│ Generate  │
│    LLM    │
└─────┬─────┘
      │
      ▼
┌───────────┐
│ GENERATE  │
│  Parser   │
└─────┬─────┘
      │
      ▼
┌───────────┐
│   TEST    │
│  Compare  │
│ DataFrame │
└─────┬─────┘
      │ Fail? ── Yes ──▶ Feedback → PLAN
      │
      ▼
     Done
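The TEST step compares the parser's DataFrame with the expected CSV. One way such a check could look is sketched below, assuming a strict comparison via `DataFrame.equals`; the helper name `matches_expected` and the sample columns are illustrative and not taken from the repository.

```python
import io

import pandas as pd


def matches_expected(actual: pd.DataFrame, expected: pd.DataFrame) -> bool:
    """True when column names, column order, dtypes, and cell values all agree."""
    if list(actual.columns) != list(expected.columns):
        return False
    # Ignore index labels; compare contents row by row.
    return actual.reset_index(drop=True).equals(expected.reset_index(drop=True))


# Hypothetical two-row statement; real column names come from the target CSV.
csv_text = "Date,Description,Amount\n01-08-2024,Salary,1000.5\n02-08-2024,Rent,-400.0\n"
expected = pd.read_csv(io.StringIO(csv_text))
actual = pd.DataFrame(
    {
        "Date": ["01-08-2024", "02-08-2024"],
        "Description": ["Salary", "Rent"],
        "Amount": [1000.5, -400.0],
    }
)
```

`DataFrame.equals` treats NaN values in the same positions as equal, which suits statements where a debit or credit cell is empty on some rows.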
git clone https://github.com/apurv-korefi/ai-agent-challenge.git
cd ai-agent-challenge
pip install -r requirements.txt
# If requirements.txt not provided:
pip install pandas pdfplumber google-generativeai
export GEMINI_API_KEY="your_gemini_api_key_here"
(Windows CMD: set GEMINI_API_KEY=your_key)
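For reference, a script can read the key set above and fail early with a clear message when it is missing. The sketch below is an assumption about how that could be done; `load_api_key` is a hypothetical helper, not a function from agent.py.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ, name: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is unset or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the agent.")
    return key
```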
Generate a parser for a target bank (e.g., ICICI):
python agent.py --target icici
This will:
- Read data/icici/icici_sample.pdf and data/icici/icici_sample.csv
- Generate custom_parsers/icici_parser.py
- Run automated contract tests and self-correct up to 3 times if needed
- Log results to logs/icici_parser_generation.log
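Every path listed above follows from the `--target` value. A minimal sketch of that mapping is shown here; `resolve_paths` and `parse_args` are hypothetical helpers written for illustration, and the real agent.py may organize this differently.

```python
import argparse
from pathlib import Path
from typing import Dict, List, Optional


def resolve_paths(target: str, root: Path = Path(".")) -> Dict[str, Path]:
    """Derive input, output, and log paths from the bank name."""
    return {
        "pdf": root / "data" / target / f"{target}_sample.pdf",
        "csv": root / "data" / target / f"{target}_sample.csv",
        "parser": root / "custom_parsers" / f"{target}_parser.py",
        "log": root / "logs" / f"{target}_parser_generation.log",
    }


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the single --target option used in the commands above."""
    parser = argparse.ArgumentParser(description="Generate a bank statement parser.")
    parser.add_argument("--target", required=True, help="bank name, e.g. icici")
    return parser.parse_args(argv)
```

For example, `resolve_paths(parse_args(["--target", "icici"]).target)` yields the four ICICI paths listed above.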
To generate a parser for a new bank (e.g., SBI):
- Place a sample PDF and CSV at data/sbi/sbi_sample.pdf and data/sbi/sbi_sample.csv.
- Run:
python agent.py --target sbi
The agent will automatically generate custom_parsers/sbi_parser.py and test it.
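Because each bank gets its own generated file, a test harness needs to load that file by path and call its `parse_pdf` function. One way to do this with the standard library is sketched below; `load_parser_module` is a hypothetical helper, and the repository may import parsers differently.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_parser_module(parser_path: Path) -> ModuleType:
    """Import a generated parser file and confirm it exposes parse_pdf()."""
    spec = importlib.util.spec_from_file_location(parser_path.stem, parser_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load parser from {parser_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the generated code
    if not callable(getattr(module, "parse_pdf", None)):
        raise AttributeError(f"{parser_path} does not define parse_pdf()")
    return module
```

A call such as `load_parser_module(Path("custom_parsers/sbi_parser.py")).parse_pdf("data/sbi/sbi_sample.pdf")` would then run the freshly generated parser on the sample statement.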
- The agent ensures the generated parser strictly adheres to the CSV schema.
- Logs in logs/ capture every attempt, including the generated code and test results.
- The parser uses pdfplumber and regex for robust extraction of dates, amounts, and descriptions.
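To illustrate the pdfplumber and regex approach mentioned above, here is a small sketch. The line layout (DD-MM-YYYY date, free-text description, signed amount), the column names, and the function names are all assumptions made for the example; a generated parser depends on the actual statement format.

```python
import re
from typing import Dict, List

# Assumed line layout: DD-MM-YYYY, free-text description, then a signed amount.
ROW_PATTERN = re.compile(
    r"^(?P<date>\d{2}-\d{2}-\d{4})\s+(?P<description>.+?)\s+(?P<amount>-?[\d,]+\.\d{2})$"
)


def parse_lines(text: str) -> List[Dict[str, object]]:
    """Turn extracted statement text into row dictionaries."""
    rows: List[Dict[str, object]] = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match is None:
            continue  # skip headers, footers, and blank lines
        rows.append(
            {
                "Date": match["date"],
                "Description": match["description"],
                "Amount": float(match["amount"].replace(",", "")),
            }
        )
    return rows


def extract_text(pdf_path: str) -> str:
    """Concatenate the text of every page in the PDF."""
    import pdfplumber  # imported lazily so parse_lines works without it

    with pdfplumber.open(pdf_path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)
```

Passing the result of `parse_lines(extract_text(pdf_path))` to `pd.DataFrame` would give a table with these three assumed columns.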