
llm-engineering

Journey to learn Large Language Models (LLMs)

Introduction:

Setup:

  1. Install Anaconda.
  2. Set up the environment using the environment.yml file:
  • conda env create -f environment.yml
  3. Add your OpenAI API key so the notebooks can connect to the OpenAI APIs.
  4. Activate the conda env:
  • conda activate llms
  5. Run jupyter lab to spin up the notebooks.

Notes:

  • Frontier models have some limitations, like:

    1. They are not at PhD level in some specialised domains, so an expert is still needed there.
    2. Their knowledge is cut off at a certain date.
    3. Can have blind spots and give wrong answers very confidently.
  • The Transformer architecture is the one behind LLMs.

  • Parameters in a model are also called weights, and these weights are what control what the output from the model looks like.

  • Tokens are the individual units that get passed into a model. They are a middle-ground breakthrough (e.g. word stems) between training the neural network character by character and training it word by word.

  • Context Window: the maximum amount of text, measured in tokens, that the model can consider when generating the next token.

  • LLMs are stateless, so they don't keep track of any info and every API call is independent (e.g. User: "My favourite sport is football" - GPT: "That's great, football is ..." | User: "What's my favourite sport?" - GPT: "Sorry, I don't have this information").

  • ChatGPT, on the other hand, is a separate application: rather than making a single stateless API call, it saves the conversation history and sends it along every time you send something to the API, so the model has context (sketched after these notes).

  • Exceeding the context window is usually handled by two methods: truncating older parts of the chat history, or chunking the data so each chunk keeps the logic coherent, then running the LLM on the aggregated, LLM-processed chunks.
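
Here is a minimal sketch of both ideas, assuming the openai Python SDK and an OPENAI_API_KEY in the environment; the model name is illustrative and the max_history cutoff is a toy truncation strategy:

    # The model itself is stateless: context exists only because the app
    # re-sends the conversation history on every call.
    from openai import OpenAI

    client = OpenAI()
    history = []  # the application, not the model, remembers the chat

    def chat(user_message, max_history=20):
        history.append({"role": "user", "content": user_message})
        # Naive truncation: drop the oldest turns so the prompt stays
        # inside the model's context window.
        trimmed = history[-max_history:]
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model name
            messages=[{"role": "system", "content": "You are a helpful assistant."}] + trimmed,
        )
        reply = response.choices[0].message.content
        history.append({"role": "assistant", "content": reply})
        return reply

    print(chat("My favourite sport is football"))
    print(chat("What's my favourite sport?"))  # answered only because the history was re-sent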

Multi-Modal Chatbot:

Notes:

  • Tools basically refer to the idea of allowing frontier models to communicate with external functions.

  • This could be querying something or performing an action.

  • LLMs are bad at calculations, so one use case is exposing a calculator as an external function that the LLM can call (see the sketch after these notes).

  • An Agent (Agentic AI) is software that can autonomously perform tasks. Agents are autonomous, task-specific, and goal-oriented.

  • 'Agentic AI' and 'Agentization' are umbrella terms that refer to a number of techniques, such as:

    1. Breaking a complex problem into smaller steps, with multiple LLMs carrying out specialized tasks
    2. The ability for LLMs to use Tools to give them additional capabilities
    3. The 'Agent Environment' which allows Agents to collaborate
    4. An LLM can act as the Planner, dividing bigger tasks into smaller ones for the specialists
    5. The concept of an Agent having autonomy / agency, beyond just responding to a prompt - such as Memory
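
Below is a minimal sketch of the calculator-as-a-tool idea using the OpenAI chat completions API; the calculator function, its JSON schema, and the model name are illustrative, not taken from this repo:

    import json
    from openai import OpenAI

    client = OpenAI()

    def calculator(expression):
        # Toy evaluator; a real app should use a safe math parser instead of eval.
        return str(eval(expression, {"__builtins__": {}}, {}))

    # Describe the tool so the model knows when and how to call it.
    tools = [{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate an arithmetic expression, e.g. '2 * (3 + 4)'.",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"],
            },
        },
    }]

    messages = [{"role": "user", "content": "What is 123456 * 789?"}]
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    message = response.choices[0].message

    if message.tool_calls:  # the model decided to call our function
        tool_call = message.tool_calls[0]
        args = json.loads(tool_call.function.arguments)
        # Run the tool locally, then send the result back so the model can answer.
        messages.append(message)
        messages.append({"role": "tool", "tool_call_id": tool_call.id,
                         "content": calculator(args["expression"])})
        final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
        print(final.choices[0].message.content)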

HuggingFace

Notes:

  • This is an open-source platform offering models, datasets, transformers, and more, ready for us to use.
  • There are six libraries behind HuggingFace: Hub, Datasets, Transformers, PEFT, TRL, and Accelerate.
  • Hugging Face has two APIs: 1. Pipelines (a high-level, easy-to-use API), 2. Tokenizers & Models (an advanced API).
  • Tasks we can run with the pipelines API, in addition to generating text, images, and audio (see the first sketch after these notes):
  1. Sentiment Analysis
  2. Classifier
  3. Named Entity Recognition
  4. Question Answering
  5. Summarising
  6. Translation
  • When using Tokenizers & Models, there are some techniques we should pay attention to (see the second sketch after these notes):
  1. Quantization: reducing the precision of the weights in the model so that it fits in memory and runs faster
  2. Streaming: streaming the result back token by token instead of returning it in one big bulk
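
A minimal sketch of the high-level pipelines API; each pipeline downloads a sensible default model for its task, and the input sentences are illustrative:

    from transformers import pipeline

    # Sentiment analysis with the default model for the task
    classifier = pipeline("sentiment-analysis")
    print(classifier("I'm really enjoying learning about LLMs!"))
    # -> [{'label': 'POSITIVE', 'score': 0.99...}]

    # Named Entity Recognition, grouping sub-word tokens into whole entities
    ner = pipeline("ner", aggregation_strategy="simple")
    print(ner("Hugging Face was founded in New York."))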
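
And a minimal sketch of the lower-level Tokenizers & Models API showing both techniques; the model name is illustrative, and 4-bit loading assumes a CUDA machine with the bitsandbytes and accelerate packages installed:

    import torch
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              BitsAndBytesConfig, TextStreamer)

    model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # illustrative choice

    # Quantization: load the weights in 4-bit precision so the model
    # fits in memory and runs faster.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, quantization_config=quant_config, device_map="auto")

    prompt = "Explain what a context window is in one sentence."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    print(inputs["input_ids"][0])  # the token ids the model actually sees

    # Streaming: print tokens to stdout as they are generated instead of
    # waiting for the whole response.
    streamer = TextStreamer(tokenizer)
    model.generate(**inputs, max_new_tokens=80, streamer=streamer)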

Models for Code Generation:

  • The Chinchilla Scaling Law: for compute-optimal training, the number of parameters (weights) in a model should scale in proportion to the number of training tokens; for example, doubling the parameter count calls for roughly double the training tokens.
  • Common Benchmarks for deciding which model to use for a task at hand:
  1. ARC -> Reasoning -> used for evaluating scientific reasoning; multiple-choice questions
  2. DROP -> Language Comprehension -> distill details from text, then add, count, or sort
  3. HellaSwag -> Common Sense -> harder endings, long contexts, low-shot activities
  4. MMLU -> Understanding -> factual recall, reasoning, and problem solving across 57 subjects
  5. TruthfulQA -> Accuracy -> robustness in providing truthful replies in adversarial conditions
  6. Winogrande -> Context -> tests whether an LLM understands context and can resolve ambiguity
  7. GSM8K -> Math -> math and word problems taught in elementary and middle school
  • Specific Benchmarks:
  1. ELO -> Chat -> head-to-head face-offs with other LLMs, like Elo ratings in chess
  2. HumanEval -> Python Coding -> 164 problems writing code based on docstrings
  3. MultiPL-E -> Broader Coding -> translation of HumanEval into 18 programming languages
  • Some limitations of these benchmark tests:
  1. Not consistently applied
  2. Too narrow in scope
  3. Hard to measure nuanced reasoning
  4. Training data leakage
  5. Overfitting
  6. Frontier LLMs may be aware that they are being evaluated (not yet proven)
  • New, harder benchmarks to overcome these limitations:
  1. GPQA -> Graduate Tests -> 448 expert questions; non-PhD humans score only 34% even with web access
  2. BBHard -> Future Capabilities -> 204 tasks believed to be beyond the capabilities of LLMs
  3. MATH Level 5 -> Math -> high-school math competition problems
  4. IFEval -> Difficult Instructions -> e.g. "write more than 400 words ..."
  5. MuSR -> Multistep Soft Reasoning -> logical deduction, such as analyzing a 1,000-word murder mystery and answering "who has means, motive, and opportunity?"
  6. MMLU-PRO -> Harder MMLU -> MMLU with 10 answer choices instead of 4
