-
Notifications
You must be signed in to change notification settings - Fork 170
feat(chat): replace keyword-based search with Lucene query generation #3025
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Change intent detection from extracting keywords to generating proper Lucene query strings. This enables more sophisticated search capabilities including phrase matching, field boosting, and boolean operators. Key changes: - Update IntentDetectionResult to use query string instead of keywords list - Improve JSON parsing with Jackson ObjectMapper and regex fallback - Add thread-safe message handling in ChatSession with CopyOnWriteArrayList - Migrate session cleanup to TimeoutManager for better resource management - Enhance URL sanitization in chat.js for security - Fix state cleanup when starting new chat sessions Co-Authored-By: Claude Opus 4.5 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR replaces keyword-based search with Lucene query generation to enable more powerful search capabilities in the chat system. The changes include updating the IntentDetectionResult API to use query strings instead of keyword lists, improving JSON parsing reliability with Jackson, and adding thread-safe message handling in ChatSession.
Changes:
- Migrated from keyword extraction to Lucene query generation for search and FAQ intents
- Enhanced JSON parsing with Jackson ObjectMapper and regex fallback for robustness
- Implemented thread-safe message operations in
ChatSessionusingCopyOnWriteArrayListand synchronization
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/main/java/org/codelibs/fess/chat/IntentDetectionResult.java | Changed API from keyword list to Lucene query string |
| src/main/java/org/codelibs/fess/chat/ChatClient.java | Updated intent detection prompt for Lucene syntax and added Jackson-based JSON parsing |
| src/main/java/org/codelibs/fess/entity/ChatSession.java | Added thread-safe message handling and removed Serializable |
| src/main/java/org/codelibs/fess/chat/ChatSessionManager.java | Migrated from ScheduledExecutorService to TimeoutManager |
| src/main/webapp/js/chat.js | Improved URL sanitization and fixed state cleanup in newChat() |
| src/test/java/org/codelibs/fess/chat/IntentDetectionResultTest.java | Updated tests for query-based API |
| src/test/java/org/codelibs/fess/entity/ChatSessionTest.java | Updated test assertion for defensive copy behavior |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| final List<ChatMessage> trimmed = new ArrayList<>(messages.subList(messages.size() - maxMessages, messages.size())); | ||
| messages.clear(); | ||
| messages.addAll(trimmed); |
Copilot
AI
Jan 18, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trimHistory implementation creates an intermediate ArrayList, clears the messages list, then adds back the trimmed messages. This is inefficient for CopyOnWriteArrayList as it triggers multiple copy operations. Consider using messages.subList(0, toRemove).clear() approach which is more efficient, or directly creating a new CopyOnWriteArrayList with the trimmed sublist and assigning it.
| } | ||
| } | ||
| // Block URLs that look like protocol:// (unknown protocols) | ||
| if (/^[a-z][a-z0-9+.-]*:/i.test(trimmedUrl)) { |
Copilot
AI
Jan 18, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex pattern for detecting unknown protocols uses the case-insensitive flag 'i' but starts with [a-z] lowercase. This creates ambiguous behavior. Either remove the 'i' flag since protocol names should be lowercase, or change the character class to [a-zA-Z] for clarity.
| if (/^[a-z][a-z0-9+.-]*:/i.test(trimmedUrl)) { | |
| if (/^[a-z][a-z0-9+.-]*:/.test(trimmedUrl)) { |
Summary
Changes Made
keywords(List) toquery(String) for Lucene query supportstripCodeFences()helper for handling markdown-wrapped JSON responsesCopyOnWriteArrayListand synchronizationSerializableimplementation (not needed for session management)ScheduledExecutorServicetoTimeoutManagerfor cleanup tasksTesting
IntentDetectionResultTestto verify new query-based APIChatSessionTestfor thread-safe message handlingBreaking Changes
IntentDetectionResult.getKeywords()replaced withgetQuery()IntentDetectionResult.search(List<String>, String)now takes(String query, String reasoning)🤖 Generated with Claude Code