@aludwiko
Contributor

@github-actions github-actions bot added documentation documentation related java-sdk labels Oct 31, 2025
*/
void addInteraction(
String sessionId,
SessionMessage.CompoundUserMessage userMessage,
Contributor Author

@aludwiko aludwiko Nov 5, 2025

Initially, I used CompoundUserMessage for all interactions, both text-only and multimodal. However, I think the user message will be text-based most of the time, so it's worth optimizing for that case and supporting both options, especially since we need to stay backward compatible with existing events anyway.
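
For readers following the thread, here is a minimal sketch of what "supporting both options" could look like. All type names below (`TextUserMessage`, `CompoundUserMessage`, `TextContent`, `ImageUrlContent`, `SessionMemorySketch`) are hypothetical stand-ins written for illustration; they are not the SDK's actual classes, and the real `addInteraction` takes more parameters than shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the message types discussed above (not the SDK's real classes).
sealed interface MessageContent permits TextContent, ImageUrlContent {}
record TextContent(String text) implements MessageContent {}
record ImageUrlContent(String url) implements MessageContent {}

sealed interface StoredUserMessage permits TextUserMessage, CompoundUserMessage {}
// Text-only shape: the common case, and the shape that older events already have.
record TextUserMessage(String text) implements StoredUserMessage {}
// Multimodal shape: an ordered list of text and image parts.
record CompoundUserMessage(List<MessageContent> contents) implements StoredUserMessage {}

class SessionMemorySketch {
  private final List<StoredUserMessage> history = new ArrayList<>();

  // Overload for plain text messages. sessionId is unused in this sketch.
  void addInteraction(String sessionId, TextUserMessage userMessage) {
    history.add(userMessage);
  }

  // Overload for multimodal messages. sessionId is unused in this sketch.
  void addInteraction(String sessionId, CompoundUserMessage userMessage) {
    history.add(userMessage);
  }

  // Returns an immutable snapshot of what has been recorded so far.
  List<StoredUserMessage> history() {
    return List.copyOf(history);
  }
}
```

With two overloads, text-only callers never have to wrap a string in a single-element list, while the sealed `StoredUserMessage` hierarchy still lets downstream code handle both shapes exhaustively.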

}

// keeping UserMessage instead of CompoundUserMessage for compaction
public record CompactionCmd(UserMessage userMessage, AiMessage aiMessage, long sequenceNumber) {}
Contributor Author

@aludwiko aludwiko Nov 5, 2025

I think that supporting only text-based UserMessage for compaction makes more sense than CompoundUserMessage.


record TextMessageContent(String text) implements MessageContent {}

record ImageUriMessageContent(
Contributor Author

This is yet another representation of the same thing, this time for persistence, hence String instead of URI. Maybe it should also have a dedicated DetailLevel; currently I'm reusing the existing one.

.stream()
.map(content ->
switch (content) {
case SessionMessage.MessageContent.ImageUriMessageContent __ -> "";
Contributor Author

Should this do something other than ignore the image?

Contributor

could be text?

image from {url}

Contributor Author

ok
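
As a rough illustration of the suggestion agreed on in this thread, the image branch of the switch could return a short textual placeholder instead of an empty string. The types `Content`, `Text`, and `ImageUri`, the class `ContentTextRenderer`, and the space separator are assumptions made for this sketch; the PR's real types live under `SessionMessage.MessageContent`.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-ins for the persisted content types in the diff above.
sealed interface Content permits Text, ImageUri {}
record Text(String text) implements Content {}
record ImageUri(String uri) implements Content {}

class ContentTextRenderer {
  // Flattens mixed content into plain text, keeping a readable trace of each image.
  static String render(List<Content> contents) {
    return contents.stream()
        .map(content ->
            switch (content) {
              case Text t -> t.text();
              // Instead of dropping the image, describe where it came from.
              case ImageUri image -> "image from " + image.uri();
            })
        .collect(Collectors.joining(" "));
  }
}
```

Because `Content` is sealed, the switch is exhaustive without a `default` branch, so adding a new content type later becomes a compile error here rather than a silently ignored case.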

@aludwiko changed the title from "feat: LLM image input/output support" to "feat: LLM image input support" on Nov 5, 2025
Contributor Author

aludwiko commented Nov 5, 2025

OK, the SDK is ready for a first round of review. SessionMemoryEntity deserves more attention. My next steps are to test it and add documentation for the new feature.

* @param uri The URI pointing to the image
* @param detailLevel The level of detail for image processing
*/
record ImageUriMessageContent(URI uri, ImageMessageContent.DetailLevel detailLevel)
Contributor Author

@aludwiko aludwiko Nov 6, 2025

The issue with URI is that it can be abused to send base64-encoded bytes in the form of ..., which brings back the "large payloads" issue. Perhaps the public API should only allow URLs for now?

Contributor Author

Or maybe URL in the runtime SPI as well 🤔

Contributor

oh, indeed it should be something that is retrieved from the URL/URI

Contributor Author

Changed to URL, but only on the public SDK level. I could change the SPI as well, unless we think that it's better to keep it more flexible. On the other hand, supporting URIs with base64 makes less sense than creating a dedicated ImageBytesMessageContent and being more explicit. Is there a place for URIs for some custom protocol that we could later use to fetch/store blobs?
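
If the SPI keeps accepting URIs, one possible safeguard is an explicit scheme check. The sketch below is only an illustration: the class `ImageUriGuard`, the method `requireRemoteImage`, and the http/https allow-list are hypothetical, and the `data:` URI used in the usage note is an assumed example of an inline payload, since the comment above elides the exact form.

```java
import java.net.URI;
import java.util.Locale;

class ImageUriGuard {
  // Accepts only absolute http/https URIs, so image bytes cannot be smuggled inline.
  static URI requireRemoteImage(URI uri) {
    String scheme = uri.getScheme();
    if (scheme == null) {
      throw new IllegalArgumentException("Image URI must be absolute: " + uri);
    }
    String normalized = scheme.toLowerCase(Locale.ROOT);
    if (!normalized.equals("http") && !normalized.equals("https")) {
      // The full URI is deliberately left out of the message: it may carry a large payload.
      throw new IllegalArgumentException(
          "Unsupported image URI scheme '" + scheme + "', only http and https are accepted");
    }
    return uri;
  }
}
```

Under these assumptions, `requireRemoteImage(URI.create("https://example.com/cat.png"))` returns the URI unchanged, while a URI such as `data:image/png;base64,iVBORw0KGgo=` is rejected with an `IllegalArgumentException`. A custom blob protocol could be added to the allow-list later without changing the parameter type.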

Contributor

@patriknw patriknw left a comment

looking good

imageContent.image().url().toURL(),
toDetailLevel(imageContent.detailLevel()));
} catch (MalformedURLException e) {
throw new RuntimeException("Can't transform to URL", e);
Contributor

include the url in the error message?

Contributor Author

done
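
A sketch of what the requested change might look like, assuming the conversion starts from a `java.net.URI`. The helper class `UrlConversion` and the exact message format are hypothetical; the PR's actual fix is not shown in this conversation.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class UrlConversion {
  // Converts a URI to a URL and names the offending value when the conversion fails.
  static URL toUrl(URI uri) {
    try {
      return uri.toURL();
    } catch (MalformedURLException | IllegalArgumentException e) {
      // URI.toURL() throws IllegalArgumentException for relative URIs and
      // MalformedURLException when no handler exists for the scheme.
      throw new RuntimeException("Can't transform to URL: " + uri, e);
    }
  }
}
```

Including the value in the message means a failure can be traced to a specific input from the log line alone, without reproducing the request.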

.userMessage(
UserMessage.from(
MessageContent.TextMessageContent.from("testing"),
MessageContent.ImageMessageContent.from("https://example.com")))
Contributor

Should it be ImageMessageContent.fromUrl to make it clearer?
We might add fromBytes and fromBase64 later.

Contributor Author

Ok, changed others as well to keep it consistent.
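
To make the naming point concrete, here is a small sketch of a source-specific factory method. The record `ImageContentSketch` is a hypothetical stand-in for `ImageMessageContent`, and the validation shown is an assumption for illustration; `fromBytes` and `fromBase64` are mentioned above only as possible later additions and are not sketched.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical stand-in for ImageMessageContent, holding only the image location.
record ImageContentSketch(URL url) {
  // The method name states the kind of input, leaving room for other sources later.
  static ImageContentSketch fromUrl(String url) {
    try {
      return new ImageContentSketch(URI.create(url).toURL());
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException("Not a valid image URL: " + url, e);
    }
  }
}
```

A call site then reads `ImageContentSketch.fromUrl("https://example.com/cat.png")`, which stays unambiguous even if factories for other input kinds are added next to it.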

more examples

separation of multimodal response types

updating changes from the runtime

supporting only images in the user message

instance of

docs

SDK IT test

docs

URI -> URL

addressing PR comments

reverting version
@aludwiko force-pushed the 4421-support-llm-image-inputoutput branch from 6eae418 to 2b321f8 on November 7, 2025 09:43
unwrapFailedCompletion(),
system,
materializer)
handlerFactory.partialInstancePerRequest(serviceFactory, description.name, unwrapFailedCompletion(), system)
Contributor Author

Not related to this feature, but required after the dependency bump.

Contributor

@patriknw patriknw left a comment

LGTM

3 participants