TrackApp is a production-ready tool that monitors your Gmail inbox, detects real job-application updates, and keeps a structured CSV/JSON trail of each application.
- Intelligent filtering with DSPy + OpenAI (classification + structured extraction; see the sketch after this list)
- Token-optimized email preprocessing (HTML stripping, ATS awareness)
- Context-aware status tracking (Applied → Interview → Offer/Rejected)
- Persistent caching (`email_cache.json`, `email_fetch_cache.json`) for resumable runs
- CSV output deduplicated by company with merged role history
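For orientation, here is a minimal sketch of how the DSPy classification and extraction stages can be declared. The signature classes, field names, and model choice below are illustrative assumptions, not TrackApp's actual code:

```python
import dspy

# Hypothetical signatures; TrackApp's real field names may differ.
class IsJobUpdate(dspy.Signature):
    """Decide whether an email is a genuine job-application update."""
    email_text: str = dspy.InputField()
    is_job_update: bool = dspy.OutputField()

class ExtractApplication(dspy.Signature):
    """Pull structured application fields out of a job-related email."""
    email_text: str = dspy.InputField()
    company: str = dspy.OutputField()
    role: str = dspy.OutputField()
    status: str = dspy.OutputField(desc="Applied, Interview, Offer, or Rejected")

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # model choice is an assumption
classifier = dspy.Predict(IsJobUpdate)
extractor = dspy.Predict(ExtractApplication)
```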
Prerequisites:
- Python 3.9+
- A Google Cloud project with the Gmail API enabled
- An OpenAI API key
- Desktop OAuth credentials (`credentials.json`)
Install dependencies (create/activate a virtualenv first if desired: `python -m venv .venv && source .venv/bin/activate`):

```bash
pip install -r requirements.txt
```
Set up Gmail API access:
- Open the Google Cloud Console
- Enable the Gmail API for your project
- Configure the OAuth consent screen (External → add yourself as a test user)
- Create an OAuth client ID (Application type: Desktop app)
- Download the client JSON as `credentials.json` (place it in the project root)
When you run the script, a browser window prompts you to authorize Gmail access. A `token.json` file holding a refresh token is generated for subsequent runs. If the token becomes invalid, delete `token.json` and re-run.
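The token handling follows the standard google-auth desktop pattern; a sketch, assuming a read-only Gmail scope:

```python
import os.path

from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/gmail.readonly"]  # assumed scope

def gmail_service():
    creds = None
    if os.path.exists("token.json"):
        creds = Credentials.from_authorized_user_file("token.json", SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())  # silent refresh with the stored token
        else:
            flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
            creds = flow.run_local_server(port=0)  # opens the browser prompt
        with open("token.json", "w") as f:
            f.write(creds.to_json())
    return build("gmail", "v1", credentials=creds)
```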
TrackApp loads configuration in this order (first found wins), then overlays environment variables:
1. `config.local.json` (git-ignored; personal settings)
2. `config.json` (optional project defaults)
3. `config.example.json` (fallback)
Environment variables override file values when present:
- `OPENAI_API_KEY`
- `TRACKAPP_START_DATE`
- `TRACKAPP_OUTPUT_CSV`
- `TRACKAPP_CACHE_FILE`
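A sketch of the layering logic; the config key names mapped from the environment variables are assumptions:

```python
import json
import os
from pathlib import Path

def load_config() -> dict:
    # First file found wins: personal overrides, then project defaults,
    # then the example fallback.
    config = {}
    for name in ("config.local.json", "config.json", "config.example.json"):
        if Path(name).exists():
            config = json.loads(Path(name).read_text())
            break
    # Environment variables overlay file values when present.
    overrides = {  # hypothetical config key names
        "OPENAI_API_KEY": "openai_api_key",
        "TRACKAPP_START_DATE": "start_date",
        "TRACKAPP_OUTPUT_CSV": "output_csv",
        "TRACKAPP_CACHE_FILE": "cache_file",
    }
    for env_var, key in overrides.items():
        if os.environ.get(env_var):
            config[key] = os.environ[env_var]
    return config
```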
Optional: add a `.env` file (loaded when `python-dotenv` is installed):

```
OPENAI_API_KEY=sk-...
TRACKAPP_START_DATE=2025-07-01
```
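The optional loading amounts to the usual guarded import (sketch):

```python
try:
    from dotenv import load_dotenv
    load_dotenv()  # exposes .env entries such as OPENAI_API_KEY via os.environ
except ImportError:
    pass  # python-dotenv not installed; plain environment variables still work
```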
Quick start:

```bash
cp config.example.json config.local.json
# edit config.local.json to your preferences
# or put OPENAI_API_KEY in .env
python main.py
```

Workflow:
- Loads configuration and caches (using the layered approach)
- Authenticates with Gmail (browser prompt on first run)
- Fetches emails from `start_date` onward, saving batches into `email_fetch_cache.json` (see the fetch sketch after this list)
- Processes emails from oldest to newest:
  - Keyword filter → DSPy classifier → DSPy extractor (with cached results reused)
- Merges updates into `job_applications.csv`
- Saves intermediate results every 10 emails
- Deletes `email_fetch_cache.json` only after a successful run (preserves it on errors/interrupts)
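For reference, the date-bounded fetch can be expressed against the Gmail API roughly as follows. The function name is assumed and the batching into `email_fetch_cache.json` is omitted; `gmail_service()` is the helper sketched earlier:

```python
def fetch_message_ids(service, start_date: str) -> list[str]:
    # Gmail search syntax uses "after:YYYY/MM/DD"; start_date arrives as "YYYY-MM-DD".
    query = "after:" + start_date.replace("-", "/")
    ids, page_token = [], None
    while True:
        resp = service.users().messages().list(
            userId="me", q=query, pageToken=page_token, maxResults=500
        ).execute()
        ids += [m["id"] for m in resp.get("messages", [])]
        page_token = resp.get("nextPageToken")
        if not page_token:
            return ids
```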
You can stop the script (Ctrl+C) at any time. It will persist caches so the next run resumes without re-fetching or re-classifying old emails.
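The resumability comes from keying results by message, so a minimal version of the per-email cache might look like this (the key and payload shapes are assumptions):

```python
import json
from pathlib import Path

CACHE_FILE = Path("email_cache.json")

def load_email_cache() -> dict:
    return json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}

def save_email_cache(cache: dict) -> None:
    CACHE_FILE.write_text(json.dumps(cache, indent=2))

def process_once(cache: dict, msg_id: str, email_text: str, run_llm) -> dict:
    # Cache hit: the classification/extraction result is reused, so an
    # interrupted run never repeats an LLM call for the same message.
    if msg_id not in cache:
        cache[msg_id] = run_llm(email_text)
    return cache[msg_id]
```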
Output files:
- `job_applications.csv`: one row per company with a consolidated role list and history
- `email_cache.json`: per-email DSPy classification/extraction cache (prevents repeated LLM calls)
- `email_fetch_cache.json`: raw Gmail fetch cache (kept until a run completes successfully)
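The company-level deduplication can be pictured as a read-merge-rewrite of the CSV; the column names here are assumptions:

```python
import csv

def merge_application(path: str, company: str, role: str, event: str) -> None:
    # One row per company; roles and status history are kept as "; "-joined lists.
    rows = {}
    try:
        with open(path, newline="") as f:
            rows = {r["company"]: r for r in csv.DictReader(f)}
    except FileNotFoundError:
        pass  # first run: the file does not exist yet
    row = rows.setdefault(company, {"company": company, "roles": "", "history": ""})
    if role and role not in row["roles"].split("; "):
        row["roles"] = "; ".join(x for x in (row["roles"], role) if x)
    row["history"] = "; ".join(x for x in (row["history"], event) if x)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["company", "roles", "history"])
        writer.writeheader()
        writer.writerows(rows.values())
```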
To reset everything:

```bash
rm email_cache.json email_fetch_cache.json job_applications.csv token.json
```

Troubleshooting:
- `invalid_grant` during auth: delete `token.json`, ensure `credentials.json` matches the Google project, retry
- Missing companies or duplicates: update `preprocessing.ats_domains` or `exporter.invalid_company_names`, rerun
- High OpenAI cost: lower `max_email_words`, tighten keywords, reduce `context_months_limit` (see the preprocessing sketch after this list)
- Reprocessing from scratch: delete the caches above and rerun
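For intuition on the cost levers, a sketch of the kind of word-capping that `max_email_words` implies (stdlib only; the helper names and default cap are assumptions):

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def preprocess(html_body: str, max_email_words: int = 400) -> str:
    # Strip HTML tags, collapse whitespace, then keep only the first
    # max_email_words words so each LLM call stays within a token budget.
    parser = _TextExtractor()
    parser.feed(html_body)
    text = re.sub(r"\s+", " ", " ".join(parser.parts)).strip()
    return " ".join(text.split()[:max_email_words])
```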
MIT Licensed. Contributions welcome.