github.com/tuffrabit/openai-sample-rag

Description

Just wrapping my brain around RAG, embeddings, and the OpenAI API. This is a super arbitrary example with 100% made-up animals. The idea is to ask the LLM about something it has PROBABLY never encountered: either new data (newer than the model's training cutoff) or private/proprietary data.

Design

A single CLI tool with three flags. The "-populatedb" flag scans an adjacent "animals" directory full of markdown files, fills a local DB with the markdown content, and then asks the OpenAI API to generate an embedding for each markdown document and stores those in the local DB as well. The "-prompt" flag issues a prompt. The "-skiprag" flag skips the RAG step, for response comparison. When a prompt is issued and RAG is not skipped, an embedding is also generated for the prompt, some magic trigonometry is done to determine which animal embedding is geometrically "closest" to the prompt embedding, and finally the initial prompt is re-engineered to include the relevant animal fact content.
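The "magic trigonometry" in a RAG setup like this is usually cosine similarity: the cosine of the angle between two embedding vectors, where values closer to 1 mean "more similar". Below is a minimal Go sketch of the retrieval and prompt re-engineering steps, assuming cosine similarity and an in-memory slice of animals; the repo's own code may do this differently. The `Animal` type, `closestAnimal`, `augmentPrompt`, the prompt template wording, and the toy 3-dimensional vectors are all hypothetical (real embeddings have hundreds or thousands of dimensions).

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strings"
)

// Animal is a hypothetical in-memory view of one local DB row:
// the animal's markdown plus the embedding generated for it.
type Animal struct {
	Name      string
	Content   string
	Embedding []float64
}

// cosineSimilarity returns the cosine of the angle between a and b:
// 1 means same direction, 0 means unrelated (orthogonal).
func cosineSimilarity(a, b []float64) (float64, error) {
	if len(a) == 0 || len(a) != len(b) {
		return 0, errors.New("vectors must be non-empty and the same length")
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0, errors.New("cannot compare a zero vector")
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB)), nil
}

// closestAnimal returns the animal whose embedding is most similar to the prompt's.
func closestAnimal(prompt []float64, animals []Animal) (Animal, error) {
	if len(animals) == 0 {
		return Animal{}, errors.New("no animals to compare against")
	}
	best, bestScore := Animal{}, math.Inf(-1)
	for _, a := range animals {
		score, err := cosineSimilarity(prompt, a.Embedding)
		if err != nil {
			return Animal{}, fmt.Errorf("%s: %w", a.Name, err)
		}
		if score > bestScore {
			best, bestScore = a, score
		}
	}
	return best, nil
}

// augmentPrompt "re-engineers" the question by prepending the retrieved facts.
// The template wording is hypothetical.
func augmentPrompt(question string, a Animal) string {
	var sb strings.Builder
	sb.WriteString("Use the following facts to answer the question.\n\n")
	sb.WriteString(a.Content)
	sb.WriteString("\n\nQuestion: ")
	sb.WriteString(question)
	return sb.String()
}

func main() {
	// Toy 3-dimensional vectors; real embeddings are much longer.
	animals := []Animal{
		{Name: "Glorbfish", Content: "The Glorbfish hums in C minor.", Embedding: []float64{0.9, 0.1, 0.0}},
		{Name: "Snorkelbat", Content: "The Snorkelbat sleeps underwater.", Embedding: []float64{0.0, 0.2, 0.9}},
	}
	promptEmbedding := []float64{0.8, 0.2, 0.1}

	match, err := closestAnimal(promptEmbedding, animals)
	if err != nil {
		panic(err)
	}
	fmt.Println("closest:", match.Name)
	fmt.Println(augmentPrompt("What key does the Glorbfish hum in?", match))
}
```

OpenAI's embeddings documentation notes that its embeddings are normalized to length 1, in which case a plain dot product ranks animals identically; the full formula is kept here so the sketch works with any vectors.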

Setup

Either build from source or download a pre-built binary from the releases.
Use the .env.sample file to create a .env file that contains a valid OpenAI API key and whatever username you wanna use.
Create a directory called "animals" next to the binary.
Fill the animals directory with as many animal markdown files as you can think of, using sample-animal.md as an example (the more the better). Name each file after its animal. Keep the contents of each file to a minimum, otherwise you'll run into token and context limits with the OpenAI API. HEAR ME ON THIS PART... You gotta come up with some banger, ridiculous animal names that YOU ARE SURE the AI has never heard of before.

Running

The first time you run this, you need to populate the DB. Use the "-populatedb" flag to do so. If the planets align and you have the .env file and animals directory set up correctly, the app will fill the local DB with animal data and the linked embeddings from the OpenAI API. Once the DB is populated you won't have to do it again unless you change anything in the animals directory. Keep in mind that this step uses OpenAI API tokens and will cost a small amount of real $$$. It uses a modest model, so the expense should be minimal. RUN AT YOUR OWN RISK.
After the DB is populated you can simply run it with the "-prompt" flag. Use that to ask the LLM about the animals you've come up with.
You can also run the app with the "-skiprag" flag + the "-prompt" flag to see how the LLM responds to your animal questions without doing the embedding proximity calculation.

Result

When skipping RAG, you SHOULD see the LLM either straight up hallucinate (lie) or say it has no idea about your stupid made-up animals. When using RAG, you should see the LLM pull in the closest-matching animal markdown as prompt context and actually give you something resembling a real answer.

Video explanation

Files

.env.sample
.gitignore
README.md
animals.go
data.go
go.mod
go.sum
main.go
openai.go
openairagtest-example.png
sample-animal.md