marevol commented Jan 18, 2026

Summary

  • Replace keyword extraction with Lucene query string generation for more powerful search capabilities
  • Improve JSON parsing reliability with Jackson ObjectMapper and regex fallback
  • Add thread-safe message handling in ChatSession for concurrent access safety

Changes Made

  • IntentDetectionResult: Changed from keywords (List) to query (String) for Lucene query support
  • ChatClient:
    • Updated intent detection prompt to generate Lucene queries with proper syntax (phrase matching, field boosting, boolean operators)
    • Added Jackson-based JSON parsing with fallback to regex for robustness
    • Added stripCodeFences() helper for handling markdown-wrapped JSON responses
  • ChatSession:
    • Added thread-safe message operations using CopyOnWriteArrayList and synchronization
    • Removed Serializable implementation (not needed for session management)
  • ChatSessionManager: Migrated from ScheduledExecutorService to TimeoutManager for cleanup tasks
  • chat.js:
    • Enhanced URL sanitization to properly handle relative URLs and block dangerous protocols
    • Fixed state cleanup when starting new chat sessions (close EventSource, reset processing state)

Testing

  • Updated IntentDetectionResultTest to verify new query-based API
  • Updated ChatSessionTest for thread-safe message handling

Breaking Changes

  • IntentDetectionResult.getKeywords() replaced with getQuery()
  • IntentDetectionResult.search(List<String>, String) now takes (String query, String reasoning)
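For callers affected by the breaking changes above, the migration might look roughly like the sketch below. IntentResultSketch is a hypothetical stand-in written for illustration; only the search(String query, String reasoning) signature and getQuery() come from this PR description. The static-factory shape, the getReasoning() accessor, and the sample Lucene query are assumptions.

```java
import java.util.Objects;

// Hypothetical stand-in for the query-based API; not the Fess class itself.
final class IntentResultSketch {
    private final String query;
    private final String reasoning;

    private IntentResultSketch(String query, String reasoning) {
        this.query = Objects.requireNonNull(query, "query");
        this.reasoning = reasoning;
    }

    // New shape: one Lucene query string instead of a List<String> of keywords.
    static IntentResultSketch search(String query, String reasoning) {
        return new IntentResultSketch(query, reasoning);
    }

    String getQuery() {
        return query;
    }

    // Assumed accessor, included only to make the sketch complete.
    String getReasoning() {
        return reasoning;
    }

    public static void main(String[] args) {
        // Before (per the PR): search(List.of("fess", "install"), reasoning) and getKeywords().
        // After: a single query using phrase matching, field boosting, and boolean operators.
        IntentResultSketch result = search(
                "title:\"fess install\"^2 OR (fess AND install)",
                "User asks how to install Fess");
        System.out.println(result.getQuery());
    }
}
```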

🤖 Generated with Claude Code

Change intent detection from extracting keywords to generating proper
Lucene query strings. This enables more sophisticated search capabilities
including phrase matching, field boosting, and boolean operators.

Key changes:
- Update IntentDetectionResult to use query string instead of keywords list
- Improve JSON parsing with Jackson ObjectMapper and regex fallback
- Add thread-safe message handling in ChatSession with CopyOnWriteArrayList
- Migrate session cleanup to TimeoutManager for better resource management
- Enhance URL sanitization in chat.js for security
- Fix state cleanup when starting new chat sessions

Co-Authored-By: Claude Opus 4.5 <[email protected]>
marevol requested a review from Copilot January 18, 2026 08:24
marevol self-assigned this Jan 18, 2026
marevol added this to the 15.5.0 milestone Jan 18, 2026
Copilot AI left a comment

Pull request overview

This PR replaces keyword-based search with Lucene query generation to enable more powerful search capabilities in the chat system. The changes include updating the IntentDetectionResult API to use query strings instead of keyword lists, improving JSON parsing reliability with Jackson, and adding thread-safe message handling in ChatSession.

Changes:

  • Migrated from keyword extraction to Lucene query generation for search and FAQ intents
  • Enhanced JSON parsing with Jackson ObjectMapper and regex fallback for robustness
  • Implemented thread-safe message operations in ChatSession using CopyOnWriteArrayList and synchronization

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

File | Description
src/main/java/org/codelibs/fess/chat/IntentDetectionResult.java | Changed API from keyword list to Lucene query string
src/main/java/org/codelibs/fess/chat/ChatClient.java | Updated intent detection prompt for Lucene syntax and added Jackson-based JSON parsing
src/main/java/org/codelibs/fess/entity/ChatSession.java | Added thread-safe message handling and removed Serializable
src/main/java/org/codelibs/fess/chat/ChatSessionManager.java | Migrated from ScheduledExecutorService to TimeoutManager
src/main/webapp/js/chat.js | Improved URL sanitization and fixed state cleanup in newChat()
src/test/java/org/codelibs/fess/chat/IntentDetectionResultTest.java | Updated tests for query-based API
src/test/java/org/codelibs/fess/entity/ChatSessionTest.java | Updated test assertion for defensive copy behavior

Comment on lines +243 to +245
final List<ChatMessage> trimmed = new ArrayList<>(messages.subList(messages.size() - maxMessages, messages.size()));
messages.clear();
messages.addAll(trimmed);
Copilot AI Jan 18, 2026

The trimHistory implementation creates an intermediate ArrayList, clears the messages list, and then adds the trimmed messages back. This is inefficient for a CopyOnWriteArrayList because clear() and addAll() each copy the backing array. Consider the messages.subList(0, toRemove).clear() approach, which removes the oldest entries in a single operation, or create a new CopyOnWriteArrayList from the trimmed sublist and assign it to the field.
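The first alternative in this comment can be sketched as follows. This is an illustrative stand-in, not the ChatSession code under review: it stores String instead of ChatMessage, and the add and snapshot method names are assumptions. Both mutators are synchronized because a CopyOnWriteArrayList sublist throws ConcurrentModificationException if the parent list changes between subList() and clear().

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for the session's message history.
final class TrimHistorySketch {
    private final List<String> messages = new CopyOnWriteArrayList<>();

    synchronized void add(String message) {
        messages.add(message);
    }

    // Drops the oldest entries so that at most maxMessages remain.
    synchronized void trimHistory(int maxMessages) {
        int toRemove = messages.size() - Math.max(0, maxMessages);
        if (toRemove > 0) {
            // One range removal on the parent list instead of clear() followed by addAll().
            messages.subList(0, toRemove).clear();
        }
    }

    // Immutable copy for callers; reads need no lock on a CopyOnWriteArrayList.
    List<String> snapshot() {
        return List.copyOf(messages);
    }

    public static void main(String[] args) {
        TrimHistorySketch history = new TrimHistorySketch();
        for (String m : List.of("a", "b", "c", "d", "e")) {
            history.add(m);
        }
        history.trimHistory(2);
        System.out.println(history.snapshot()); // prints [d, e]
    }
}
```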

}
}
// Block URLs that look like protocol:// (unknown protocols)
if (/^[a-z][a-z0-9+.-]*:/i.test(trimmedUrl)) {
Copilot AI Jan 18, 2026

The regex pattern for detecting unknown protocols uses the case-insensitive flag 'i' but starts with [a-z] lowercase. This creates ambiguous behavior. Either remove the 'i' flag since protocol names should be lowercase, or change the character class to [a-zA-Z] for clarity.

Suggested change
- if (/^[a-z][a-z0-9+.-]*:/i.test(trimmedUrl)) {
+ if (/^[a-z][a-z0-9+.-]*:/.test(trimmedUrl)) {
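One point to check before applying this suggestion: URI schemes are case-insensitive (RFC 3986, section 3.1), so the two patterns do not accept the same input. The sketch below uses a Java analogue of the JavaScript regex, with Pattern.CASE_INSENSITIVE standing in for the i flag, to compare them on an upper-case scheme. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Java analogue of the chat.js check; for illustration only.
final class SchemeRegexSketch {
    // Equivalent of /^[a-z][a-z0-9+.-]*:/i
    static final Pattern WITH_FLAG =
            Pattern.compile("^[a-z][a-z0-9+.-]*:", Pattern.CASE_INSENSITIVE);
    // Equivalent of /^[a-z][a-z0-9+.-]*:/
    static final Pattern WITHOUT_FLAG = Pattern.compile("^[a-z][a-z0-9+.-]*:");

    static boolean looksLikeScheme(Pattern pattern, String url) {
        return pattern.matcher(url).find();
    }

    public static void main(String[] args) {
        String url = "JAVASCRIPT:alert(1)";
        System.out.println(looksLikeScheme(WITH_FLAG, url));    // prints true
        System.out.println(looksLikeScheme(WITHOUT_FLAG, url)); // prints false
    }
}
```

Under these assumptions, dropping the flag would let an upper-case or mixed-case scheme skip this particular check. Widening the character class to [a-zA-Z], the comment's other option, preserves the current behavior.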

marevol merged commit 3fde0be into master Jan 18, 2026
1 check passed