Welcome to Good Reads on AI, a curated collection of essential articles, papers, blogs, and documentation shared by the instructor of Vizuara's Build LLM & Gen AI course. This repository is designed to serve as a knowledge hub for anyone diving into the world of Large Language Models (LLMs), Generative AI, production LLM deployment, and modern machine learning.
Large Language Models (LLMs)
Stanford Alpaca: A project that explores the fine-tuning of language models using instruction-following data.
- Repository: GitHub
- Blog Post: Stanford CRFM
LLMs from Scratch: A step-by-step guide to implementing a ChatGPT-like LLM in PyTorch.
- Repository: GitHub
Instruct Fine-Tuning Data: A JSON file containing data for instruction fine-tuning.
- Data File: GitHub
Hands-On Large Language Models: Official code repository for the O'Reilly book "Hands-On Large Language Models".
- Repository: GitHub
Research Papers
- Denoising Diffusion Implicit Models - ArXiv
- Transfusion Model (Meta) - PDF
- Diffusion Models for Image Generation - ArXiv
- Bahdanau's Neural Machine Translation - ArXiv
- Attention Is All You Need - ArXiv
- TinyStories - ArXiv
- A Survey of LLMs - ArXiv
- Tokenizer Performance Across Indian Languages - ArXiv
- Deep Residual Learning for Image Recognition - ArXiv
- Language Modeling in a Sentence Representation Space - ArXiv
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models - ArXiv
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models - ArXiv
- The Illusion of Thinking - Apple
- RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners - ArXiv
- In-Context Learning - ArXiv
Datasets
- FairytaleQA - GitHub
- LifeArchitect AI Datasets Table - LifeArchitect
- SentencePiece Tokenizer - GitHub
- CIFAR-10 - Info
Tutorials & Demonstrations
- Vizuara's Substack - Blog
- Sketch-RNN Demo - Magenta
- Animated Transformer - Tutorial
- Word2Vec in TensorFlow - Tutorial
- AutoGen (Agentic AI) - GitHub
- Tokenizer Explorer - OpenAI Tokenizer, TikTokenizer Demo
- TikToken (OpenAI) - GitHub
- Byte Pair Encoding Explained - Blog
- This Person Does Not Exist (GANs) - Demo
- CNN Explainer - Demo
- Word2Vec (Visual Blog) - Jalammar Blog
- Sinusoidal Positional Encodings - Blog
- Machines of Loving Grace - Blog
- Vizuara RAG Chunking Strategies - PDF
- Hands-On LLM GOATED article - Blog
- Prompt Engineering vs. Context Engineering - Blog
Notable Projects
- Mini-R1: Efficient small-scale models - Phil Schmid
- LifeArchitect AI Models Table - LifeArchitect
- Docling: Advanced PDF and document parsing - GitHub, Paper