-
Notifications
You must be signed in to change notification settings - Fork 9
feat: LLM image input support #1106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
samples/helloworld-agent/src/main/java/com/example/application/ImageOutRawResponseAgent.java
Outdated
Show resolved
Hide resolved
samples/helloworld-agent/src/main/java/com/example/application/ImageOutBase64ResponseAgent.java
Outdated
Show resolved
Hide resolved
| */ | ||
| void addInteraction( | ||
| String sessionId, | ||
| SessionMessage.CompoundUserMessage userMessage, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially, I used CompoundUserMessage for all interactions, text-only and multimodal, but I think most of the time, the user message will be text-based so it's worth optimizing it and supporting both options. Especially that we need to be backward compatible with events anyway.
| } | ||
|
|
||
| // keeping UserMessage instead of CompoundUserMessage for compaction | ||
| public record CompactionCmd(UserMessage userMessage, AiMessage aiMessage, long sequenceNumber) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that supporting only text-based UserMessage for compaction makes more sense than CompoundUserMessage.
|
|
||
| record TextMessageContent(String text) implements MessageContent {} | ||
|
|
||
| record ImageUriMessageContent( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yet another representation of the same thing, but this time for persistence, so String instead of URI, maybe I should also have dedicated DetailLevel, currently I'm reusing it.
| .stream() | ||
| .map(content -> | ||
| switch (content) { | ||
| case SessionMessage.MessageContent.ImageUriMessageContent __ -> ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be sth else than ignoring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be text?
image from {url}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
Ok, SDK is ready for the first round. SessionMemoryEntity deserves more attention. My next steps are to test it and add some documentation for the new feature. |
| * @param uri The URI pointing to the image | ||
| * @param detailLevel The level of detail for image processing | ||
| */ | ||
| record ImageUriMessageContent(URI uri, ImageMessageContent.DetailLevel detailLevel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with URI is that it can be hacked to send base64 bytes in the form of data:image/jpeg;base64,/9j/4AAQSkZJRgABAQ..., which then raises the "large payloads" issues again. Perhaps the public API should only allow URLs for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe URL in the runtime SPI as well 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, indeed it should be something that is retrieved from the URL/URI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to URL, but only on the public SDK level. I could change the SPI as well, unless we think that it's better to keep it more flexible. On the other hand, supporting URIs with base64 makes less sense than creating a dedicated ImageBytesMessageContent and being more explicit. Is there a place for URIs for some custom protocol that we could later use to fetch/store blobs?
patriknw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good
akka-javasdk-testkit/src/main/java/akka/javasdk/testkit/TestModelProvider.java
Outdated
Show resolved
Hide resolved
akka-javasdk-testkit/src/main/java/akka/javasdk/testkit/TestModelProvider.java
Outdated
Show resolved
Hide resolved
| imageContent.image().url().toURL(), | ||
| toDetailLevel(imageContent.detailLevel())); | ||
| } catch (MalformedURLException e) { | ||
| throw new RuntimeException("Can't transform to URL", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include the url in the error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| .userMessage( | ||
| UserMessage.from( | ||
| MessageContent.TextMessageContent.from("testing"), | ||
| MessageContent.ImageMessageContent.from("https://example.com"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it be ImageMessageContent.fromUrl to make it more clear?
we might have fromBytes and fromBase64 later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, changed others as well to keep it consistent.
| .stream() | ||
| .map(content -> | ||
| switch (content) { | ||
| case SessionMessage.MessageContent.ImageUriMessageContent __ -> ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be text?
image from {url}
more examples separation of multimodal response types updating changes from the runtime supporting only images in the user message instance of docs SDK IT test docs URI -> URL addressing PR comments reverting version
6eae418 to
2b321f8
Compare
| unwrapFailedCompletion(), | ||
| system, | ||
| materializer) | ||
| handlerFactory.partialInstancePerRequest(serviceFactory, description.name, unwrapFailedCompletion(), system) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related, but required after dependencies bump
patriknw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
refs: https://github.com/lightbend/akka-runtime/issues/4421