Data Analysis + LLM Curriculum (Python, R, PydanticAI with Ollama)

A 12-week, project-based roadmap for learning data analysis, Unix/Linux basics, and LLM apps using Python, R, PydanticAI, and Ollama (local models).
All resources are 100% free and include lectures, interactive projects, GitHub practice, and an optional "cool kids" track with Neovim + CLI tools.


Week -1: Mac + Homebrew Setup Checklist

Step 1: Install Python + Tools

  • Install Python 3.11 → brew install python@3.11
  • Install uv package manager → brew install uv
  • Install Git → brew install git
  • Install iTerm2 → brew install --cask iterm2
  • Install VSCode → brew install --cask visual-studio-code
  • Verify Python version → python3 --version

Step 2: Install R + RStudio

  • Install R → brew install --cask r
  • Install RStudio → brew install --cask rstudio

Step 3: Install JupyterLab

Note: uv pip install installs into the active virtual environment, so create and activate one first (see Step 7) before running the install commands in Steps 3–5.

  • Install JupyterLab → uv pip install jupyterlab
  • Run Jupyter → jupyter lab

Step 4: Install Data Science Libraries

  • Install NumPy → uv pip install numpy
  • Install pandas → uv pip install pandas
  • Install matplotlib → uv pip install matplotlib
  • Install seaborn → uv pip install seaborn
  • Install SciPy → uv pip install scipy
  • Install statsmodels → uv pip install statsmodels

Step 5: Install AI Stack

  • Install Pydantic → uv pip install pydantic
  • Install PydanticAI → uv pip install pydantic-ai
  • Install ChromaDB → uv pip install chromadb

Step 6: Install Ollama

  • Install Ollama → brew install ollama
  • Start the Ollama server → ollama serve (or brew services start ollama)
  • Pull Llama3.1 model → ollama pull llama3.1
  • Pull nomic-embed-text model → ollama pull nomic-embed-text
  • Test Ollama → ollama run llama3.1 "Hello world"

Step 7: Create Project Folder

  • Create project folder → mkdir data-analyst-curriculum && cd data-analyst-curriculum
  • Create virtual environment → uv venv .venv
  • Activate environment → source .venv/bin/activate
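
Once the environment is active, a short import check confirms the installs from Steps 3–5 (a minimal sketch; the package list mirrors the steps above):

# check_env.py -- verify that each curriculum package imports cleanly
import importlib

packages = [
    "numpy", "pandas", "matplotlib", "seaborn", "scipy",
    "statsmodels", "pydantic", "pydantic_ai", "chromadb",
]

for name in packages:
    try:
        module = importlib.import_module(name)
        print(f"{name:<12} {getattr(module, '__version__', 'ok')}")
    except ImportError as exc:
        print(f"{name:<12} MISSING ({exc})")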

Suggested Repo Layout

data-analyst-curriculum
├── 01_unix_linux
├── 02_python_basics
├── 03_pandas_eda
├── 04_stats_intro
├── 05_r_tidyverse
├── 06_pydanticai_rag
├── 07_pydanticai_assistant
├── 08_sql_tooling
├── 09_eval_observability
└── capstones
    ├── analyst_copilot_pydanticai
    └── r_eda_report

Week 0: CS & Unix/Linux

Lectures

Interactive


Weeks 1–3: Python Foundations + Unix/Linux Basics

Lectures

Interactive Projects


Weeks 4–5: Statistics & R

Lectures

Interactive Projects


Weeks 6–7: Python for Data Analysis

Lectures

Interactive


Weeks 7–9: LLMs, PydanticAI & AI Agents

Lectures


Week 10: SQL & Databases

Lectures

Interactive Projects


Weeks 11–12: Capstones & Projects

Resources

Interactive Projects


Git & GitHub (Do This Throughout)

Weekly GitHub Habits

  • Create a repo for each week/project
  • Add README.md summarizing what was learned
  • Commit often (git add . && git commit -m "message" && git push)
  • Upload Jupyter notebooks (they render well on GitHub)
  • Pin best repos to showcase skills

⚡ Optional: "Cool Kids" Neovim + CLI Tools Track

Neovim Setup

CLI Tools (Make Linux fun & powerful)

  • Install fzf (fuzzy finder) → brew install fzf
  • Install ripgrep (fast search) → brew install ripgrep
  • Install htop (system monitor) → brew install htop
  • Install bat (better cat) → brew install bat
  • Install eza (better ls; the maintained successor to exa) → brew install eza
  • Bonus: Try tmux for terminal multiplexing → brew install tmux

Practice

  • Replace basic commands: cat → bat, ls → eza
  • Use fzf to search command history
  • Use ripgrep to search code fast
  • Manage processes with htop
  • Keep multiple projects open in tmux

✅ Finish Week -1 through Week 12, sprinkle in the optional “cool kids” track, and you’ll have a full data + AI + Linux toolkit with a strong GitHub portfolio.

📊 Data Analysis + LLM Projects (Week-by-Week)

This project roadmap matches the 12-week curriculum. All projects are aligned to tools introduced that week, and many include LLM integration using PydanticAI and Ollama.


✅ Week 0: CS & Unix/Linux

Project: Command Line File Audit Tool

  • Write a Python CLI tool to summarize the number, type, and size of files in a directory.
  • Optional: Output as a CSV report.
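
A minimal sketch of the audit tool, using only the standard library (the script and report names are placeholders):

# audit.py -- summarize count, type, and total size of files in a directory
import csv
import sys
from collections import Counter
from pathlib import Path

def audit(directory: Path) -> dict:
    """Return {extension: (file_count, total_bytes)} for a directory tree."""
    counts, sizes = Counter(), Counter()
    for path in directory.rglob("*"):
        if path.is_file():
            ext = path.suffix.lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return {ext: (counts[ext], sizes[ext]) for ext in counts}

if __name__ == "__main__":
    target = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
    # Optional CSV report, as the project suggests
    with open("audit_report.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["extension", "files", "total_bytes"])
        for ext, (n, size) in sorted(audit(target).items()):
            writer.writerow([ext, n, size])
            print(f"{ext:>10}  {n:>6} files  {size:>12} bytes")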

✅ Week 1: Python Fundamentals

Project: Python Expense Tracker (CLI)

  • Build a simple CLI that logs expenses and categorizes them.
  • Output to CSV with pandas.
  • Stretch: Add terminal charts with rich or plotext.
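
A minimal sketch of the tracker’s core (the file name and CLI shape are just one way to do it):

# expenses.py -- append an expense and print totals by category
import sys
from datetime import date
from pathlib import Path

import pandas as pd

LOG = Path("expenses.csv")

def add(amount: float, category: str, note: str = "") -> None:
    row = pd.DataFrame([{"date": date.today(), "amount": amount,
                         "category": category, "note": note}])
    # Write the header only on first use
    row.to_csv(LOG, mode="a", header=not LOG.exists(), index=False)

def report() -> None:
    df = pd.read_csv(LOG)
    print(df.groupby("category")["amount"].sum().sort_values(ascending=False))

if __name__ == "__main__":
    if len(sys.argv) >= 3:  # e.g. python expenses.py 12.50 food lunch
        add(float(sys.argv[1]), sys.argv[2], " ".join(sys.argv[3:]))
    report()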

✅ Week 2: Python Functions + CLI Skills

Project: CSV Column Analyzer

  • Let users input a CSV and return column types, null counts, and stats.
  • Use pydantic-ai to summarize or rephrase the report via LLM.
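
The analysis half might look like this (a sketch; the LLM half then takes the printed report as input):

# analyze.py -- per-column types, null counts, and basic stats
import sys

import pandas as pd

df = pd.read_csv(sys.argv[1])
report = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "nulls": df.isna().sum(),
    "null_pct": (df.isna().mean() * 100).round(1),
    "unique": df.nunique(),
})
print(report)
print(df.describe(include="all").transpose())
# The text above is what you would hand to the pydantic-ai agent to
# summarize or rephrase.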

✅ Week 3: Basic EDA in Python

Project: Netflix Dataset EDA

  • Use pandas, matplotlib, and seaborn to explore the Netflix dataset.
  • Ask Ollama to suggest plots or describe patterns with pydantic-ai schema outputs.
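
A couple of starter plots (the column names assume the common Kaggle netflix_titles.csv; adjust to your copy):

# eda.py -- first-pass plots for the Netflix titles dataset
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("netflix_titles.csv")

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
sns.countplot(data=df, x="type", ax=axes[0])
axes[0].set_title("Movies vs TV shows")
sns.histplot(data=df, x="release_year", bins=30, ax=axes[1])
axes[1].set_title("Titles by release year")
plt.tight_layout()
plt.savefig("netflix_overview.png")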

✅ Week 4: Statistics with Python

Project: Survey Summary Generator

  • Take a small dataset of survey results.
  • Use Python to calculate means, modes, distributions.
  • Ask LLM to generate a Markdown report with PydanticAI.
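
The descriptive-stats half, rendered as Markdown ready for the LLM pass (column names are placeholders for your survey file):

# survey_report.py -- means, medians, modes as a Markdown summary
import pandas as pd

df = pd.read_csv("survey.csv")

lines = ["# Survey Summary", ""]
for col in df.select_dtypes("number"):
    s = df[col]
    lines.append(
        f"- **{col}**: mean {s.mean():.2f}, median {s.median():.2f}, "
        f"mode {s.mode().iloc[0]}, std {s.std():.2f}"
    )

report = "\n".join(lines)
print(report)
# Final step of the project: hand `report` to a PydanticAI agent and ask
# for a narrative Markdown write-up.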

✅ Week 5: Intro to R & Tidyverse

Project: R Data Explorer

  • Use R (dplyr, ggplot2) to generate a basic report of any dataset.
  • Export tables/plots.
  • Bonus: Knit an .Rmd to HTML.

✅ Week 6: Advanced Pandas

Project: Airbnb Dataset EDA Bot

  • Use Kaggle's NYC Airbnb dataset.

  • Build a CLI or notebook that lets the user:

    • Load data
    • Ask questions
    • Get answers powered by Ollama + PydanticAI
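
One possible wiring for the question-answering piece: recent pydantic-ai releases reach Ollama through its OpenAI-compatible endpoint, but the exact names (OpenAIModel vs OpenAIChatModel, output_type vs result_type, result.output vs result.data) have shifted between versions, so treat this as a sketch against a current release:

# qa_bot.py -- structured answers about a DataFrame via Ollama
import pandas as pd
from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

class Answer(BaseModel):
    answer: str
    caveats: str

# Ollama serves an OpenAI-compatible API on localhost:11434/v1
model = OpenAIModel(
    model_name="llama3.1",
    provider=OpenAIProvider(base_url="http://localhost:11434/v1"),
)
agent = Agent(model, output_type=Answer)

df = pd.read_csv("AB_NYC_2019.csv")  # the usual Kaggle NYC Airbnb file name
context = df.describe(include="all").to_string()

result = agent.run_sync(
    f"Given these summary statistics:\n{context}\n\n"
    "Which neighbourhood group looks most expensive, and why?"
)
print(result.output)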

✅ Week 7: Data Cleaning with LLMs

Project: Dirty CSV Fixer

  • Input: Messy CSV (with typos, missing units, bad formats).
  • Define a schema with pydantic-ai.
  • LLM suggests cleaned rows and explanations.
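
The schema itself is a plain pydantic model; the field names below are examples for an imaginary sales CSV. Passing it as the agent’s output type forces the LLM to return rows in exactly this shape:

# schema.py -- the shape every cleaned row must satisfy (pydantic v2)
from pydantic import BaseModel, field_validator

class CleanRow(BaseModel):
    product: str
    price_usd: float
    quantity: int
    explanation: str  # why the LLM changed what it changed

    @field_validator("product")
    @classmethod
    def strip_and_title(cls, v: str) -> str:
        # Normalize "  widget " -> "Widget"
        return v.strip().title()

    @field_validator("price_usd")
    @classmethod
    def non_negative(cls, v: float) -> float:
        if v < 0:
            raise ValueError("price cannot be negative")
        return v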

✅ Week 8: CSV Q&A Assistant

Project: CSV Chat Agent

  • Upload a CSV and ask natural questions ("What’s the avg revenue in Q1?").
  • Use pydantic-ai + chromadb + ollama for retrieval.
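
The retrieval piece might look like this (a sketch using Chroma’s default embedder; swapping in Ollama’s nomic-embed-text is possible but the hookup is version-dependent):

# retrieve.py -- index CSV rows in ChromaDB and pull relevant ones
import chromadb
import pandas as pd

df = pd.read_csv("sales.csv")  # placeholder file name

client = chromadb.Client()  # in-memory; PersistentClient saves to disk
collection = client.create_collection("rows")
collection.add(
    documents=[row.to_json() for _, row in df.iterrows()],
    ids=[str(i) for i in df.index],
)

hits = collection.query(query_texts=["revenue in Q1"], n_results=5)
print(hits["documents"][0])
# These rows become the context handed to the pydantic-ai + Ollama agent.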

✅ Week 9: PDF Report Analyzer

Project: PDF → Markdown Extractor

  • Load a multi-page PDF report (e.g. economic data).
  • Use PyMuPDF to extract text.
  • Ask LLM to generate a summary and key metrics as Markdown.
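
The extraction step is short with PyMuPDF (not in the Week -1 install list: uv pip install pymupdf):

# extract.py -- pull text out of a PDF, page by page
import sys

import fitz  # PyMuPDF's import name

doc = fitz.open(sys.argv[1])
pages = [page.get_text() for page in doc]
text = "\n\n".join(pages)
print(f"{len(doc)} pages, {len(text)} characters extracted")
# Send `text` (or per-page chunks) to the LLM for the Markdown summary
# and key-metric extraction.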

✅ Week 10: SQL + LLM

Project: Natural Language to SQL

  • Load a SQLite database (e.g. ecommerce).
  • User types: "Top 5 customers by revenue."
  • LLM generates SQL → fetch results → show as table or chart.
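
The execution side can stay simple and safe; this sketch runs whatever SQL the model produced against a read-only SQLite connection (the table and file names are hypothetical):

# nl2sql.py -- execute LLM-generated SQL read-only and show the result
import sqlite3

import pandas as pd

def run_query(db_path: str, generated_sql: str) -> pd.DataFrame:
    # mode=ro means a bad LLM query cannot modify the database
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return pd.read_sql_query(generated_sql, conn)
    finally:
        conn.close()

# Example of what the LLM should produce for "Top 5 customers by revenue."
sql = """
SELECT customer_id, SUM(amount) AS revenue
FROM orders GROUP BY customer_id
ORDER BY revenue DESC LIMIT 5
"""
print(run_query("ecommerce.db", sql))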

✅ Week 11: Capstone Prep

Project: Analyst Copilot (Part 1)

  • Combine multiple tools:

    • Ask a question
    • Pull from CSV + SQL
    • Output: Graphs + Summary + raw data
  • Use pydantic-ai to structure answers.
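
A minimal sketch of the glue: route each question to a tool and return one structured answer (query_sql and query_csv are stand-ins for the Week 10 and Week 8 pieces):

# copilot.py -- pick a data source, answer in one structured shape
from pydantic import BaseModel

class CopilotAnswer(BaseModel):
    source: str  # which tool answered: "csv" or "sql"
    summary: str

def query_sql(question: str) -> CopilotAnswer:
    return CopilotAnswer(source="sql", summary=f"(SQL path) {question}")

def query_csv(question: str) -> CopilotAnswer:
    return CopilotAnswer(source="csv", summary=f"(CSV path) {question}")

def answer(question: str) -> CopilotAnswer:
    # Naive keyword routing; the full project can let the LLM pick the tool
    if any(w in question.lower() for w in ("revenue", "orders", "customers")):
        return query_sql(question)
    return query_csv(question)

print(answer("Top 5 customers by revenue"))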


✅ Week 12: Capstone Final

Project: Analyst Copilot (Final)

  • Wrap up with a CLI or Web UI (Streamlit).
  • Multi-source assistant: PDF + CSV + SQL.
  • Structured answers + human-friendly output.
  • Host on GitHub with full documentation.
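
A skeleton for the Streamlit option (streamlit is not in the Week -1 install list: uv pip install streamlit, then streamlit run app.py):

# app.py -- minimal front end wiring upload + question together
import pandas as pd
import streamlit as st

st.title("Analyst Copilot")

uploaded = st.file_uploader("Upload a CSV", type="csv")
question = st.text_input("Ask a question about the data")

if uploaded is not None:
    df = pd.read_csv(uploaded)
    st.dataframe(df.head())
    if question:
        # Here you would call the pydantic-ai agent from earlier weeks
        st.write(f"(agent answer to: {question!r})")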

These projects give hands-on practice across data analysis, LLM-powered workflows, and reproducible portfolio-building with Python, R, SQL, and PydanticAI.
