A 12-week, project-based roadmap for learning data analysis, Unix/Linux basics, and LLM apps using Python, R, PydanticAI, and Ollama (local models).
All resources are 100% free and include lectures, interactive projects, GitHub practice, and an optional "cool kids" track with Neovim + CLI tools.
- Install Python 3.11 → `brew install python@3.11`
- Install uv package manager → `brew install uv`
- Install Git → `brew install git`
- Install iTerm2 → `brew install --cask iterm2`
- Install VSCode → `brew install --cask visual-studio-code`
- Verify Python version → `python3.11 --version`
- Install R → `brew install --cask r`
- Install RStudio → `brew install --cask rstudio`
- Install JupyterLab → `uv pip install jupyterlab`
- Run Jupyter → `jupyter lab`
- Install NumPy → `uv pip install numpy`
- Install pandas → `uv pip install pandas`
- Install matplotlib → `uv pip install matplotlib`
- Install seaborn → `uv pip install seaborn`
- Install SciPy → `uv pip install scipy`
- Install statsmodels → `uv pip install statsmodels`
- Install Pydantic → `uv pip install pydantic`
- Install PydanticAI → `uv pip install pydantic-ai`
- Install ChromaDB → `uv pip install chromadb`
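- Install Ollama → `brew install ollama`
- Start the Ollama server → `ollama serve` (leave it running in its own terminal, or run it in the background with `brew services start ollama`)
- Pull Llama3.1 model → `ollama pull llama3.1`
- Pull nomic-embed-text model → `ollama pull nomic-embed-text`
- Test Ollama → `ollama run llama3.1 "Hello world"`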
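The same smoke test can be run from Python, which is closer to how the later projects talk to the model. This is a minimal stdlib sketch that assumes the Ollama server is running on its default address (`http://localhost:11434`) and that `llama3.1` has been pulled; `build_request` and `generate` are hypothetical helper names.

```python
import json
import urllib.request

# Default local endpoint for Ollama's text-generation API (assumes default port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.1") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.1", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False the server replies with one JSON object whose
    # "response" field holds the full generated text.
    return body["response"]
```

Usage, with the server running: `print(generate("Hello world"))`. Setting `"stream": False` keeps the example simple; the default streaming mode returns one JSON object per chunk instead.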
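- Create project folder → `mkdir data-analyst-curriculum && cd data-analyst-curriculum`
- Create virtual environment → `uv venv .venv`
- Activate environment → `source .venv/bin/activate` (run the `uv pip install` commands inside this environment; `uv pip install` refuses to install without one unless you pass `--system`)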
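Once the environment is active, a quick script can confirm that the packages installed above are importable. This is a sketch under the assumption that the list below matches your installs; note that import names can differ from pip names (`pydantic-ai` is imported as `pydantic_ai`).

```python
import importlib.util

# Import names (not pip names) for the packages installed in the setup steps.
REQUIRED = [
    "jupyterlab", "numpy", "pandas", "matplotlib", "seaborn",
    "scipy", "statsmodels", "pydantic", "pydantic_ai", "chromadb",
]


def missing_packages(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("All packages found." if not missing else f"Missing: {', '.join(missing)}")
```

`find_spec` only locates a module without importing it, so the check is fast even for heavy libraries like `chromadb`.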
data-analyst-curriculum
├── 01_unix_linux
├── 02_python_basics
├── 03_pandas_eda
├── 04_stats_intro
├── 05_r_tidyverse
├── 06_pydanticai_rag
├── 07_pydanticai_assistant
├── 08_sql_tooling
├── 09_eval_observability
└── capstones
    ├── analyst_copilot_pydanticai
    └── r_eda_report
- CS50’s Introduction to Computer Science (Harvard)
- freeCodeCamp – Linux for Beginners (4 hrs)
- MIT Missing Semester – Shell Tools & Scripting
- OverTheWire – Bandit Wargame (hands-on Linux terminal practice)
- freeCodeCamp – Python for Beginners (4 hrs)
- Kaggle – Python Course
- CS50’s Introduction to Python (Harvard)
- freeCodeCamp – Scientific Computing with Python Certification
Projects: Arithmetic Formatter, Time Calculator, Budget App, Polygon Area Calculator, Probability Calculator
- Khan Academy – Statistics & Probability
- StatQuest with Josh Starmer
- MIT OCW – Probability & Statistics
- freeCodeCamp – R Programming for Data Science (2 hrs)
- R for Data Science (Book)
- freeCodeCamp – Data Analysis with Python Certification
Projects: Mean-Variance Calculator, Demographic Data Analyzer, Medical Data Visualizer, Time Series Visualizer, Sea Level Predictor
- freeCodeCamp – Data Analysis with Python (5 hrs)
- Kaggle – Pandas Course
- Kaggle – Data Visualization Course
- freeCodeCamp – LangChain for Beginners (concepts apply to PydanticAI)
- DeepLearning.AI – Free Short Courses on LLMs
- freeCodeCamp – Relational Database Certification
Projects: Celestial Bodies Database, World Cup Database, Salon Scheduler, Number Guessing Game
- freeCodeCamp – Machine Learning with Python Certification
Projects: Rock, Paper, Scissors AI, Cat/Dog Classifier, Book Recommender, Stock Predictor, Neural Network SMS Classifier
- GitHub – Hello World Guide
- freeCodeCamp – Git & GitHub for Beginners (2 hrs)
- Oh My Git! – Interactive Git Game
- Create a repo for each week/project
- Add a `README.md` summarizing what you learned
- Commit often (`git add . && git commit -m "message" && git push`)
- Upload Jupyter Notebooks (they render well on GitHub)
- Pin your best repos to showcase your skills
- Install Neovim → `brew install neovim`
- Learn basics: `i` (insert), `:wq` (save & quit), `dd` (delete line), `/` (search)
- freeCodeCamp – Vim Tutorial for Beginners (2 hrs)
- Explore config: `~/.config/nvim/init.lua`
- Install fzf (fuzzy finder) → `brew install fzf`
- Install ripgrep (fast search) → `brew install ripgrep`
- Install htop (system monitor) → `brew install htop`
- Install bat (better `cat`) → `brew install bat`
- Install eza (better `ls`; the maintained fork of the no-longer-maintained exa) → `brew install eza`
- Bonus: Try tmux for terminal multiplexing → `brew install tmux`

- Replace basic commands: `cat` → `bat`, `ls` → `eza`
- Use `fzf` to search command history
- Use `ripgrep` (the `rg` command) to search code fast
- Manage processes with `htop`
- Keep multiple projects open in `tmux`
✅ Finish Week -1 through Week 12, sprinkle in the optional “cool kids” track, and she’ll have a full data + AI + Linux toolkit with a strong GitHub portfolio.
This project roadmap matches the 12-week curriculum. Each project uses the tools introduced in its week, and many include LLM integration using PydanticAI and Ollama.
Project: Command Line File Audit Tool
- Write a Python CLI tool that summarizes the number, types, and sizes of files in a directory.
- Optional: Output as a CSV report.
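A minimal stdlib sketch of the audit tool, assuming "type" means file extension and "size" means bytes on disk. The `--csv` flag, column names, and helper names are illustrative choices, not part of the project spec.

```python
from __future__ import annotations

import argparse
import csv
import sys
from collections import defaultdict
from pathlib import Path


def audit(directory: Path) -> dict[str, dict[str, int]]:
    """Walk `directory` recursively; aggregate file count and total bytes per extension."""
    stats: dict[str, dict[str, int]] = defaultdict(lambda: {"count": 0, "bytes": 0})
    for path in directory.rglob("*"):
        if path.is_file():
            ext = path.suffix.lower() or "(none)"  # bucket for files with no extension
            stats[ext]["count"] += 1
            stats[ext]["bytes"] += path.stat().st_size
    return dict(stats)


def write_csv(stats: dict[str, dict[str, int]], out) -> None:
    """Write one row per extension (largest total size first)."""
    writer = csv.writer(out)
    writer.writerow(["extension", "count", "bytes"])
    for ext, s in sorted(stats.items(), key=lambda kv: kv[1]["bytes"], reverse=True):
        writer.writerow([ext, s["count"], s["bytes"]])


def main(argv: list[str] | None = None) -> None:
    parser = argparse.ArgumentParser(description="Summarize files in a directory.")
    parser.add_argument("directory", nargs="?", default=".", type=Path)
    parser.add_argument("--csv", action="store_true", help="emit a CSV report")
    args = parser.parse_args(argv)

    stats = audit(args.directory)
    if args.csv:
        write_csv(stats, sys.stdout)
    else:
        total = sum(s["count"] for s in stats.values())
        print(f"{total} files in {args.directory}")
        for ext, s in sorted(stats.items()):
            print(f"  {ext:<10} {s['count']:>6} files {s['bytes']:>12,} bytes")


if __name__ == "__main__":
    main()
```

Usage: `python audit.py ~/Downloads` for a readable summary, or `python audit.py ~/Downloads --csv > report.csv` for the optional CSV report.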
Project: Python Expense Tracker (CLI)
- Build a simple CLI that logs expenses and categorizes them.
- Output to CSV with `pandas`.
- Stretch: Add terminal charts with `rich` or `plotext`.
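A starting point for the expense tracker might look like the sketch below. It swaps in the standard `csv` module for pandas so it runs without dependencies; the ledger path, column names, and subcommands are hypothetical choices.

```python
from __future__ import annotations

import argparse
import csv
from collections import defaultdict
from datetime import date
from pathlib import Path

FIELDS = ["date", "category", "amount", "note"]


def add_expense(ledger: Path, category: str, amount: float, note: str = "") -> None:
    """Append one expense row, writing the header first if the file is new."""
    is_new = not ledger.exists()
    with ledger.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({"date": date.today().isoformat(), "category": category.lower(),
                         "amount": f"{amount:.2f}", "note": note})


def totals_by_category(ledger: Path) -> dict[str, float]:
    """Sum the amount column per category."""
    totals: dict[str, float] = defaultdict(float)
    with ledger.open(newline="") as f:
        for row in csv.DictReader(f):
            totals[row["category"]] += float(row["amount"])
    return dict(totals)


def main(argv: list[str] | None = None) -> None:
    parser = argparse.ArgumentParser(description="Log and summarize expenses.")
    parser.add_argument("--ledger", type=Path, default=Path("expenses.csv"))
    sub = parser.add_subparsers(dest="command")

    add_cmd = sub.add_parser("add", help="log an expense")
    add_cmd.add_argument("category")
    add_cmd.add_argument("amount", type=float)
    add_cmd.add_argument("--note", default="")
    sub.add_parser("summary", help="show totals per category")

    args = parser.parse_args(argv)
    if args.command == "add":
        add_expense(args.ledger, args.category, args.amount, args.note)
    elif args.command == "summary":
        for category, total in sorted(totals_by_category(args.ledger).items()):
            print(f"{category:<15} {total:>10.2f}")
    else:
        parser.print_help()


if __name__ == "__main__":
    main()
```

Usage: `python expenses.py add food 12.50 --note lunch`, then `python expenses.py summary`. To move to pandas as the project suggests, you could load the same ledger with `pandas.read_csv` and replace `totals_by_category` with `groupby("category")["amount"].sum()`.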
Project: CSV Column Analyzer
- Let users pass in a CSV and get back column types, null counts, and summary stats.
- Use `pydantic-ai` to summarize or rephrase the report via an LLM.
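The profiling half of the analyzer needs no LLM at all. Here is a stdlib sketch, assuming a simplified type inference (a column is `int` if every non-empty cell parses as an int, else `float` if they all parse as floats, else `str`); the resulting dict is the kind of report you could hand to a `pydantic-ai` agent to put into plain English.

```python
from __future__ import annotations

import csv
import io
import statistics


def infer_type(values: list[str]) -> str:
    """Return 'int', 'float', or 'str' for a column's non-empty values."""
    def parses_as(cast) -> bool:
        try:
            for value in values:
                cast(value)
        except ValueError:
            return False
        return True

    if values and parses_as(int):
        return "int"
    if values and parses_as(float):
        return "float"
    return "str"  # also used for columns that are entirely empty


def analyze(csv_text: str) -> dict[str, dict]:
    """Profile each column: inferred type, empty-cell count, and numeric stats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    report: dict[str, dict] = {}
    for col in reader.fieldnames or []:
        # Short rows yield None for missing cells, so coerce to "" before stripping.
        cells = [(row[col] or "").strip() for row in rows]
        present = [c for c in cells if c]
        col_type = infer_type(present)
        info = {"type": col_type, "nulls": len(cells) - len(present)}
        if col_type != "str":  # numeric columns always have at least one value here
            numbers = [float(c) for c in present]
            info.update(min=min(numbers), max=max(numbers),
                        mean=statistics.fmean(numbers))
        report[col] = info
    return report
```

Usage: `analyze(Path("data.csv").read_text())`. Computing the numbers in Python and only asking the LLM to describe them keeps the summary's figures trustworthy.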
Project: Netflix Dataset EDA
- Use `pandas`, `matplotlib`, and `seaborn` to explore the Netflix dataset.
- Ask Ollama to suggest plots or describe patterns, using `pydantic-ai` structured (schema) outputs.
Project: Survey Summary Generator
- Take a small dataset of survey results.
- Use Python to calculate means, modes, distributions.
- Ask an LLM to generate a Markdown report with PydanticAI.
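One way to split the work: Python computes the means, modes, and distributions, and the LLM only writes prose around them. This sketch assumes the survey arrives as a dict of question to numeric ratings (each question with at least one answer); the resulting Markdown table would be part of the prompt you send to the PydanticAI agent.

```python
from __future__ import annotations

import statistics
from collections import Counter


def summarize(responses: dict[str, list[int]]) -> str:
    """Return a Markdown table with n, mean, mode, and answer distribution per question."""
    lines = ["| Question | n | Mean | Mode | Distribution |",
             "|---|---|---|---|---|"]
    for question, answers in responses.items():
        counts = Counter(answers)
        dist = ", ".join(f"{value}: {counts[value]}" for value in sorted(counts))
        # statistics.mode returns the first mode encountered if there is a tie.
        lines.append(
            f"| {question} | {len(answers)} | {statistics.fmean(answers):.2f} "
            f"| {statistics.mode(answers)} | {dist} |"
        )
    return "\n".join(lines)


# Hypothetical 1-5 ratings for illustration only.
survey = {"Course pace": [4, 5, 3, 4, 4], "Would recommend": [5, 5, 4, 2, 5]}
print(summarize(survey))
```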
Project: R Data Explorer
- Use R (`dplyr`, `ggplot2`) to generate a basic report of any dataset.
- Export tables/plots.
- Bonus: Knit an `.Rmd` to HTML.
Project: Airbnb Dataset EDA Bot
- Use Kaggle's NYC Airbnb dataset.
- Build a CLI or notebook that lets the user:
  - Load data
  - Ask questions
  - Get answers powered by Ollama + PydanticAI
Project: Dirty CSV Fixer
- Input: a messy CSV (with typos, missing units, bad formats).
- Define a Pydantic schema for a clean row and use it as the `pydantic-ai` output type.
- The LLM suggests cleaned rows along with explanations.
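A practical refinement (an assumption, not part of the project spec) is to fix what simple rules can handle first and send only the leftover rows to the LLM. This stdlib sketch normalizes currency amounts and a few date formats; the accepted `DATE_FORMATS` and the `date`/`amount` column names are hypothetical, and `%d/%m/%Y` deliberately reads `05/03/2024` as 5 March.

```python
from __future__ import annotations

import re
from datetime import datetime

# Illustrative formats; adjust to what your messy CSV actually contains.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d %Y", "%d %B %Y"]


def clean_amount(raw: str) -> float | None:
    """Parse strings like ' $1,200.50 ' into 1200.5; return None if unparseable."""
    text = re.sub(r"[\s$,]", "", raw)
    try:
        return float(text)
    except ValueError:
        return None


def clean_date(raw: str) -> str | None:
    """Normalize a date to ISO 8601 (YYYY-MM-DD); return None if no format matches."""
    text = " ".join(raw.replace(",", " ").split())  # drop commas, collapse spaces
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def triage(rows: list[dict[str, str]]) -> tuple[list[dict], list[dict]]:
    """Split rows into (cleaned, needs_llm) depending on whether every rule succeeded."""
    cleaned, needs_llm = [], []
    for row in rows:
        amount, day = clean_amount(row["amount"]), clean_date(row["date"])
        if amount is None or day is None:
            needs_llm.append(row)  # ambiguous rows go to the LLM for suggestions
        else:
            cleaned.append({**row, "amount": amount, "date": day})
    return cleaned, needs_llm
```

Keeping the deterministic fixes in code means the LLM's explanations cover only the genuinely ambiguous cells, which are also the ones worth reviewing by hand.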
Project: CSV Chat Agent
- Upload a CSV and ask natural-language questions ("What’s the avg revenue in Q1?").
- Use `pydantic-ai` + `chromadb` + `ollama` for retrieval.
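The retrieval step can be sketched without any dependencies: turn each CSV row into a text chunk, score chunks against the question, and pass the best matches to the LLM as context. This sketch swaps chromadb's embedding search (for example with `nomic-embed-text` via Ollama) for plain word-count cosine similarity so it runs on the stdlib alone; the chunk format and helper names are assumptions. For aggregate questions like averages, computing the number in code is usually more reliable than retrieving rows.

```python
from __future__ import annotations

import csv
import io
import math
import re
from collections import Counter


def rows_to_chunks(csv_text: str) -> list[str]:
    """Render each CSV row as a 'column: value; ...' text chunk."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ["; ".join(f"{k}: {v}" for k, v in row.items()) for row in reader]


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts: a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = bag_of_words(question)
    return sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the context-plus-question prompt the agent would receive."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only these rows:\n{joined}\n\nQuestion: {question}"
```

Swapping `bag_of_words`/`cosine` for a chromadb collection keeps the same shape: add the chunks once, query with the question, and feed the returned documents into `build_prompt`.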
Project: PDF → Markdown Extractor
- Load a multi-page PDF report (e.g., economic data).
- Use PyMuPDF to extract text.
- Ask the LLM to generate a summary and key metrics as Markdown.
Project: Natural Language to SQL
- Load a SQLite database (e.g., an e-commerce dataset).
- The user types: "Top 5 customers by revenue."
- The LLM generates SQL → run it → show the results as a table or chart.
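The execute-and-display half of the pipeline can be sketched with the built-in `sqlite3` module. Here the SQL string stands in for what an LLM might generate for "Top 5 customers by revenue."; the `orders` schema and rows are hypothetical, and the SELECT-only guard is a simple heuristic rather than a complete defense (it also rejects `WITH ...` queries).

```python
import sqlite3


def run_readonly_query(conn: sqlite3.Connection, sql: str) -> tuple[list[str], list[tuple]]:
    """Execute a single SELECT statement and return (column names, rows)."""
    statement = sql.strip().rstrip(";")
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("only a single SELECT statement is allowed")
    cur = conn.execute(statement)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


def to_markdown(columns: list[str], rows: list[tuple]) -> str:
    """Render query results as a Markdown table."""
    header = "| " + " | ".join(columns) + " |"
    divider = "|" + "---|" * len(columns)
    body = ["| " + " | ".join(str(v) for v in row) + " |" for row in rows]
    return "\n".join([header, divider, *body])


# Toy in-memory database standing in for the e-commerce SQLite file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES ('Ana', 120.0), ('Ben', 80.0), ('Ana', 30.0),
                              ('Cy', 200.0), ('Dee', 15.0), ('Eve', 60.0), ('Fay', 5.0);
""")

# SQL an LLM might plausibly generate for "Top 5 customers by revenue."
generated_sql = """
SELECT customer, SUM(amount) AS revenue
FROM orders
GROUP BY customer
ORDER BY revenue DESC
LIMIT 5;
"""
columns, rows = run_readonly_query(conn, generated_sql)
print(to_markdown(columns, rows))
```

For a real database file, opening it read-only adds a second layer of protection: `sqlite3.connect("file:shop.db?mode=ro", uri=True)`.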
Project: Analyst Copilot (Part 1)
- Combine multiple tools:
  - Ask a question
  - Pull from CSV + SQL
  - Output: Graphs + Summary + raw data
- Use `pydantic-ai` to structure answers.
Project: Analyst Copilot (Final)
- Wrap up with a CLI or web UI (Streamlit).
- Multi-source assistant: PDF + CSV + SQL.
- Structured answers + human-friendly output.
- Host on GitHub with full documentation.
These projects ensure hands-on practice across data analysis, LLM-powered workflows, and reproducible portfolio-building with Python, R, SQL, and PydanticAI.