
Conversation

@aaronchantrill
Contributor

Description

This change gives Naomi the ability to use an LLM in crafting responses. The LLM is contacted over a web API, so any OpenAI-compatible LLM server should be fine. I am currently using llama.cpp on a system with an NVIDIA GeForce RTX 3060 with 12 GB of VRAM, and the following model:
https://huggingface.co/mav23/Llama_3.2_1B_Intruct_Tool_Calling_V2-GGUF/blob/main/llama_3.2_1b_intruct_tool_calling_v2.Q4_K_M.gguf This is only a 1B-parameter model, so it should run fine on cards with less VRAM. I have also tried running it on an Intel graphics card on my laptop using SYCL, which seems to work fine.
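Since any OpenAI-compatible server works, the wire format is just the standard chat-completions JSON. A minimal sketch of the request and response shapes, assuming a local llama.cpp-style server; the model name here is a placeholder, not Naomi's actual configuration:

```python
import json

# Sketch of the request/response shapes used by an OpenAI-compatible
# chat endpoint (e.g. llama.cpp's server at /v1/chat/completions).
def build_chat_request(messages, model="local-model"):
    """Build the JSON body for a POST to /v1/chat/completions."""
    return json.dumps({"model": model, "messages": messages})

def extract_reply(response_json):
    """Pull the assistant text out of a chat-completions response."""
    return json.loads(response_json)["choices"][0]["message"]["content"]
```

POSTing that body with a `Content-Type: application/json` header to the server's `/v1/chat/completions` endpoint is all an OpenAI-compatible backend requires.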

This is very much experimental. The basic idea is that I use the existing TTI parser to activate a plugin, then pass that plugin's output to the LLM as a system message. This lets the LLM use current data to answer questions like "What time is it?" or "Are we expecting rain on Thursday?" I'm also looking into function calling, which would allow the LLM itself to serve as an intent parser and might make plugin activation more integrated, but that has its own set of problems, including that different models define and use plugins differently.
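The plugin-output-as-system-message flow can be sketched as follows. The function and message wording are illustrative assumptions, not Naomi's actual API:

```python
# Sketch of the described flow: the TTI parser activates a plugin, and
# the plugin's output is injected as a system message so the LLM can
# answer using current data.
def build_messages(user_utterance, plugin_output=None):
    messages = [{"role": "system",
                 "content": "You are Naomi, a voice assistant."}]
    if plugin_output is not None:
        # Ground the LLM with live data from the activated plugin.
        messages.append({"role": "system", "content": plugin_output})
    messages.append({"role": "user", "content": user_utterance})
    return messages
```

For "What time is it?", the clock plugin's output (e.g. "The current time is 3:15 PM") would land in the second system message, so the model answers from fresh data rather than its training set.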

I have also included some notebooks. I plan to grow these into a set that will hopefully be helpful to people who want to better understand how Naomi works internally.

Related Issue

Integrate Naomi with LLM #435

Motivation and Context

With the rise of LLM chatbots, I wanted to give Naomi more capability for carrying on a conversation. Whether this is a good idea, I'm not sure. The book "It is better to be a good computer than a bad person" would argue that Naomi performs its function perfectly well and that adding another layer of NLP just complicates things. At the same time, having played with LLMs for a while now, they are fun, and definitely a step toward entities like Doctor Who's K-9 or Star Wars' C-3PO. I'm interested to see what people think if they play with this: what works, what doesn't?

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

The news and hacker news plugins were using a test mic, which did
not have the new use_llm property.
    self.say_i_do_not_understand()
    handled = True
if not self.Continue:
    quit()

Check warning

Code scanning / CodeQL

Use of exit() or quit() Warning

The 'quit' site.Quitter object may not exist if the 'site' module is not loaded or is modified.
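The warning is easy to address: `quit()` is injected by the `site` module and may be absent or shadowed, while `sys.exit()` is always available. A minimal sketch of the safer shutdown path (the function name is illustrative, not Naomi's actual code):

```python
import sys

# CodeQL flags `quit()` because the site.Quitter object only exists when
# the `site` module is loaded. `sys.exit()` raises SystemExit directly
# and works in every interpreter configuration.
def stop_if_done(continue_flag):
    if not continue_flag:
        sys.exit(0)
```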
@aaronchantrill self-assigned this Dec 16, 2024
  • Add the ALPACA template used by SiliconMaid and Kunoichi models.
  • Added a conversation log in SQLite3, planning to move to JSON.
  • Started having the LLM use commands.
  • Added Zephyr template for working with TinyLlama.
  • Added support for FAISS intent parser.
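The SQLite3 conversation log mentioned above can be sketched as a small append/read-back table. Table and column names here are assumptions for illustration, not Naomi's actual schema:

```python
import sqlite3

# Minimal sketch of a conversation log in SQLite3: append role/content
# pairs, then read them back in insertion order to rebuild the message
# history passed to the LLM.
def open_log(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS conversation ("
        "id INTEGER PRIMARY KEY, role TEXT, content TEXT)"
    )
    return conn

def log_message(conn, role, content):
    conn.execute("INSERT INTO conversation (role, content) VALUES (?, ?)",
                 (role, content))
    conn.commit()

def load_messages(conn):
    rows = conn.execute(
        "SELECT id, role, content FROM conversation ORDER BY id")
    return [{"role": row[1], "content": row[2]} for row in rows]
```

Moving this to JSON, as planned, would mainly trade the `ORDER BY id` guarantee for a plain list append.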
    for row in result:
        self._messages.append({'role': row[1], 'content': row[2]})
    conn.close()
    self.template = Template(TEMPLATES[template]['template'])

Check warning

Code scanning / CodeQL

Jinja2 templating with autoescape=False Medium

Using jinja2 templates with autoescape=False can potentially allow XSS attacks.
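For prompt templates that never reach a browser, the XSS concern is largely theoretical, but enabling autoescape is a cheap way to satisfy the scanner if the rendered text could ever be displayed as HTML. A sketch, with an illustrative template string rather than Naomi's actual one:

```python
from jinja2 import Template

# jinja2's Template defaults to autoescape=False, which CodeQL flags.
# Passing autoescape=True makes the template HTML-escape substituted
# values, neutralizing any markup in untrusted content.
PROMPT_TEMPLATE = "{{ role }}: {{ content }}"

template = Template(PROMPT_TEMPLATE, autoescape=True)
rendered = template.render(role="user",
                           content="<script>alert(1)</script>")
```

Note that autoescaping will also mangle legitimate `<` and `>` characters in prompt text, so for pure LLM prompt templating it may be preferable to suppress the warning instead.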
    self.template = Template(TEMPLATES[template]['template'])
    self.eot_markers = TEMPLATES[template]['eot_markers']
    self.emoji_filter = re.compile("["
        U"\U0001F600-\U0001F64F"  # emoticons

Check warning

Code scanning / CodeQL

Overly permissive regular expression range Medium

Suspicious character range that overlaps with \ufffd-\ufffd in the same character class.
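This warning is often a false positive from analyzing astral-plane literals as if they collapsed to U+FFFD, but building the character class from explicit codepoint pairs makes the ranges unambiguous either way. A sketch with a subset of the ranges used here; the helper names are illustrative:

```python
import re

# Sketch of an emoji filter built from explicit codepoint ranges.
# Generating each range from integer bounds keeps the character class
# unambiguous, which is what the "overly permissive range" check is
# worried about.
EMOJI_RANGES = [
    (0x1F600, 0x1F64F),  # emoticons
    (0x1F300, 0x1F5FF),  # symbols & pictographs
    (0x2702, 0x27B0),    # dingbats
]
EMOJI_PATTERN = re.compile(
    "[" + "".join(f"{chr(lo)}-{chr(hi)}" for lo, hi in EMOJI_RANGES) + "]"
)

def strip_emoji(text):
    """Remove characters falling in any of the emoji ranges."""
    return EMOJI_PATTERN.sub("", text)
```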
    self.template = Template(TEMPLATES[template]['template'])
    self.eot_markers = TEMPLATES[template]['eot_markers']
    self.emoji_filter = re.compile("["
        U"\U0001F600-\U0001F64F"  # emoticons

Check warning

Code scanning / CodeQL

Overly permissive regular expression range Medium

Suspicious character range that overlaps with \u2702-\u27b0 in the same character class, and overlaps with \ufffd-\ufffd in the same character class.
    self.eot_markers = TEMPLATES[template]['eot_markers']
    self.emoji_filter = re.compile("["
        U"\U0001F600-\U0001F64F"  # emoticons
        U"\U0001F300-\U0001F5FF"  # symbols & pictographs

Check warning

Code scanning / CodeQL

Overly permissive regular expression range Medium

Suspicious character range that overlaps with \ufffd-\ufffd in the same character class.
    else:
        self.actions_queue.appendleft(lambda: self.tts(phrase))
    if "[shutdown]" in phrase.lower() or "[shut down]" in phrase.lower():
        self.actions_queue.appendleft(lambda: self.shutdown())

Check notice

Code scanning / CodeQL

Unnecessary lambda Note

This 'lambda' is just a simple wrapper around a callable object. Use that object directly.
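Concretely: `lambda: self.shutdown()` wraps a no-argument callable, so the bound method can be queued directly; for the call that captures `phrase`, `functools.partial` binds the value at queue time and also avoids late-binding surprises. A standalone sketch with stand-in callables (the queue and functions are illustrative, not Naomi's internals):

```python
from collections import deque
from functools import partial

actions_queue = deque()
log = []

def tts(phrase):
    log.append(phrase)

def shutdown():
    log.append("shutdown")

# Instead of lambda: tts(phrase) — partial binds the argument now.
actions_queue.appendleft(partial(tts, "goodbye"))
# Instead of lambda: shutdown() — queue the callable itself.
actions_queue.appendleft(shutdown)

# Drain the queue: each entry is already a zero-argument callable.
while actions_queue:
    actions_queue.popleft()()
```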
