[Responses API] Implement parse_referenced_documents_from_responses_api #853
base: main
Conversation
Walkthrough: Serialize Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches: ✅ 3 checks passed
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/endpoints/query_v2.py (1)
398-495: Refactor to address pipeline failures: too many locals, branches, and nested blocks.

The linter reports multiple violations that block CI:
- Too many local variables (18/15)
- Too many branches (19/12)
- Too many nested blocks (7/5)

Also, remove the `# pylint: disable=unused-argument` comment on line 399, since `response` is actually used. Extract helper functions to reduce complexity:
```diff
+def _parse_file_search_results(
+    results: list[Any],
+    documents: list[ReferencedDocument],
+    seen_docs: set[tuple[str | None, str | None]],
+) -> None:
+    """Extract referenced documents from file_search_call results."""
+    for result in results:
+        if isinstance(result, dict):
+            filename = result.get("filename")
+            attributes = result.get("attributes", {}) or {}
+        else:
+            filename = getattr(result, "filename", None)
+            attributes = getattr(result, "attributes", {}) or {}
+
+        doc_url = (
+            attributes.get("link")
+            or attributes.get("url")
+            or attributes.get("doc_url")
+        )
+        final_url = doc_url if doc_url else None
+
+        if (filename or doc_url) and (final_url, filename) not in seen_docs:
+            documents.append(ReferencedDocument(doc_url=final_url, doc_title=filename))
+            seen_docs.add((final_url, filename))
+
+
+def _parse_message_annotations(
+    content: list[Any],
+    documents: list[ReferencedDocument],
+    seen_docs: set[tuple[str | None, str | None]],
+) -> None:
+    """Extract referenced documents from message content annotations."""
+    for part in content:
+        if isinstance(part, str):
+            continue
+        for annotation in getattr(part, "annotations", []) or []:
+            _process_annotation(annotation, documents, seen_docs)
+
+
+def _process_annotation(
+    annotation: Any,
+    documents: list[ReferencedDocument],
+    seen_docs: set[tuple[str | None, str | None]],
+) -> None:
+    """Process a single annotation and add to documents if applicable."""
+    if isinstance(annotation, dict):
+        anno_type = annotation.get("type")
+        anno_url = annotation.get("url")
+        anno_title = annotation.get("title") or annotation.get("filename")
+    else:
+        anno_type = getattr(annotation, "type", None)
+        anno_url = getattr(annotation, "url", None)
+        anno_title = getattr(annotation, "title", None) or getattr(
+            annotation, "filename", None
+        )
+
+    final_url = anno_url if anno_url else None
+
+    if anno_type == "url_citation" and (final_url, anno_title) not in seen_docs:
+        documents.append(ReferencedDocument(doc_url=final_url, doc_title=anno_title))
+        seen_docs.add((final_url, anno_title))
+    elif anno_type == "file_citation" and (None, anno_title) not in seen_docs:
+        documents.append(ReferencedDocument(doc_url=None, doc_title=anno_title))
+        seen_docs.add((None, anno_title))
+
+
 def parse_referenced_documents_from_responses_api(
-    response: OpenAIResponseObject,  # pylint: disable=unused-argument
+    response: OpenAIResponseObject,
 ) -> list[ReferencedDocument]:
-    """
-    Parse referenced documents from OpenAI Responses API response.
-    ...
-    """
+    """Parse referenced documents from OpenAI Responses API response."""
     documents: list[ReferencedDocument] = []
     seen_docs: set[tuple[str | None, str | None]] = set()

     if not response.output:
         return documents

     for output_item in response.output:
         item_type = getattr(output_item, "type", None)

-        # 1. Parse from file_search_call results
         if item_type == "file_search_call":
-            results = getattr(output_item, "results", []) or []
-            for result in results:
-                # Handle both object and dict access
-                ...
+            _parse_file_search_results(
+                getattr(output_item, "results", []) or [],
+                documents,
+                seen_docs,
+            )

-        # 2. Parse from message content annotations
         elif item_type == "message":
             content = getattr(output_item, "content", None)
             if isinstance(content, list):
-                for part in content:
-                    ...
+                _parse_message_annotations(content, documents, seen_docs)

     return documents
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/app/endpoints/query_v2.py (3 hunks)
tests/unit/app/endpoints/test_query_v2.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., `from auth import get_auth_dependency`)
All modules must start with descriptive docstrings explaining their purpose
Use the `logger = logging.getLogger(__name__)` pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (`str | int`) and `Optional[Type]` or `Type | None`
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use `async def` for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for exceptions, `Resolver` for strategy patterns, `Interface` for abstract base classes
Abstract classes must use ABC with `@abstractmethod` decorators
Include complete type annotations for all class attributes in Python classes
Use `import logging` and the module logger pattern with standard log levels: debug, info, warning, error
Files:
src/app/endpoints/query_v2.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/query_v2.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle `APIConnectionError` from Llama Stack in integration code
Files:
src/app/endpoints/query_v2.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Files:
tests/unit/app/endpoints/test_query_v2.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `pytest-mock` with AsyncMock objects for mocking in tests
Files:
tests/unit/app/endpoints/test_query_v2.py
🧬 Code graph analysis (2)
src/app/endpoints/query_v2.py (1)
src/models/responses.py (1): ReferencedDocument (326-336)
tests/unit/app/endpoints/test_query_v2.py (2)
src/models/requests.py (1): QueryRequest (73-233)
src/app/endpoints/query_v2.py (1): retrieve_response (265-395)
🪛 GitHub Actions: Python linter
src/app/endpoints/query_v2.py
[error] 398-398: R0914: Too many local variables (18/15) (too-many-locals)
[error] 398-398: R0912: Too many branches (19/12) (too-many-branches)
[error] 417-417: R1702: Too many nested blocks (7/5) (too-many-nested-blocks)
[error] 417-417: R1702: Too many nested blocks (7/5) (too-many-nested-blocks)
tests/unit/app/endpoints/test_query_v2.py
[error] 721-721: R0914: Too many local variables (18/15) (too-many-locals)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build-pr
- GitHub Check: e2e_tests (ci)
- GitHub Check: e2e_tests (azure)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (2)
src/app/endpoints/query_v2.py (1)
136-136: LGTM on the JSON serialization change. Serializing `response_payload` to a JSON string ensures a consistent data format for storage in `ToolCallSummary.response`.

tests/unit/app/endpoints/test_query_v2.py (1)
201-209: LGTM on adding the `annotations` attribute to mock parts. This prevents iteration errors when the code accesses `part.annotations` in the updated parsing logic.
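The serialization pattern praised above can be sketched in isolation; note the payload shape below is a hypothetical example for illustration, not the actual tool response from the PR:

```python
import json

# Hypothetical file-search payload; the real shape comes from the Responses API.
response_payload = {
    "results": [
        {"filename": "guide.md", "attributes": {"url": "http://example.com/guide"}}
    ]
}

# Serialize to a JSON string (or keep None when there is no payload),
# mirroring the conditional described in the review.
serialized = json.dumps(response_payload) if response_payload else None
restored = json.loads(serialized)
```

Storing a string rather than a live object reference keeps the stored value stable for downstream consumers.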
Force-pushed from 26ed602 to 212b294 (Compare)
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/app/endpoints/query_v2.py (3)
398-400: Remove the incorrect `pylint: disable=unused-argument` comment.

The `response` parameter is used throughout the function body, so this disable comment is unnecessary and misleading.

```diff
-def parse_referenced_documents_from_responses_api(
-    response: OpenAIResponseObject,  # pylint: disable=unused-argument
-) -> list[ReferencedDocument]:
+def parse_referenced_documents_from_responses_api(
+    response: OpenAIResponseObject,
+) -> list[ReferencedDocument]:
```
417-494: Refactor to address pipeline linting failures.

The pipeline reports:
- Too many local variables (18/15)
- Too many nested blocks (7/5)
- Too many branches (19/12)

Consider extracting helper functions to reduce complexity:

```diff
+def _parse_file_search_results(
+    results: list[Any],
+    documents: list[ReferencedDocument],
+    seen_docs: set[tuple[str | None, str | None]],
+) -> None:
+    """Extract documents from file_search_call results."""
+    for result in results:
+        if isinstance(result, dict):
+            filename = result.get("filename")
+            attributes = result.get("attributes", {}) or {}
+        else:
+            filename = getattr(result, "filename", None)
+            attributes = getattr(result, "attributes", {}) or {}
+
+        doc_url = (
+            attributes.get("link")
+            or attributes.get("url")
+            or attributes.get("doc_url")
+        )
+
+        if filename or doc_url:
+            final_url = doc_url if doc_url else None
+            if (final_url, filename) not in seen_docs:
+                documents.append(
+                    ReferencedDocument(doc_url=final_url, doc_title=filename)
+                )
+                seen_docs.add((final_url, filename))
+
+
+def _parse_message_annotations(
+    content: list[Any],
+    documents: list[ReferencedDocument],
+    seen_docs: set[tuple[str | None, str | None]],
+) -> None:
+    """Extract documents from message content annotations."""
+    for part in content:
+        if isinstance(part, str):
+            continue
+
+        annotations = getattr(part, "annotations", []) or []
+        for annotation in annotations:
+            if isinstance(annotation, dict):
+                anno_type = annotation.get("type")
+                anno_url = annotation.get("url")
+                anno_title = annotation.get("title") or annotation.get("filename")
+            else:
+                anno_type = getattr(annotation, "type", None)
+                anno_url = getattr(annotation, "url", None)
+                anno_title = getattr(annotation, "title", None) or getattr(
+                    annotation, "filename", None
+                )
+
+            if anno_type == "url_citation":
+                final_url = anno_url if anno_url else None
+                if (final_url, anno_title) not in seen_docs:
+                    documents.append(
+                        ReferencedDocument(doc_url=final_url, doc_title=anno_title)
+                    )
+                    seen_docs.add((final_url, anno_title))
+            elif anno_type == "file_citation":
+                if (None, anno_title) not in seen_docs:
+                    documents.append(
+                        ReferencedDocument(doc_url=None, doc_title=anno_title)
+                    )
+                    seen_docs.add((None, anno_title))
```

Then simplify the main function:

```python
def parse_referenced_documents_from_responses_api(
    response: OpenAIResponseObject,
) -> list[ReferencedDocument]:
    """Parse referenced documents from OpenAI Responses API response."""
    documents: list[ReferencedDocument] = []
    seen_docs: set[tuple[str | None, str | None]] = set()

    if not response.output:
        return documents

    for output_item in response.output:
        item_type = getattr(output_item, "type", None)

        if item_type == "file_search_call":
            results = getattr(output_item, "results", []) or []
            _parse_file_search_results(results, documents, seen_docs)
        elif item_type == "message":
            content = getattr(output_item, "content", None)
            if isinstance(content, list):
                _parse_message_annotations(content, documents, seen_docs)

    return documents
```
486-493: Consider handling `file_citation` with a URL if available.

Currently, `file_citation` always sets `doc_url=None`. If the annotation has a URL attribute (some implementations might include one), it would be discarded.

```diff
             elif anno_type == "file_citation":
-                if (None, anno_title) not in seen_docs:
+                final_url = anno_url if anno_url else None
+                if (final_url, anno_title) not in seen_docs:
                     documents.append(
                         ReferencedDocument(
-                            doc_url=None, doc_title=anno_title
+                            doc_url=final_url, doc_title=anno_title
                         )
                     )
-                    seen_docs.add((None, anno_title))
+                    seen_docs.add((final_url, anno_title))
```
📒 Files selected for processing (2)
src/app/endpoints/query_v2.py (3 hunks)
tests/unit/app/endpoints/test_query_v2.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/app/endpoints/test_query_v2.py
🪛 GitHub Actions: Python linter
src/app/endpoints/query_v2.py
[error] 398-398: pylint: Too many local variables (18/15) (R0914). Command: uv run pylint src tests.
[error] 417-417: pylint: Too many nested blocks (7/5) (R1702). Command: uv run pylint src tests.
[error] 398-398: pylint: Too many branches (19/12) (R0912). Command: uv run pylint src tests.
🔇 Additional comments (2)
src/app/endpoints/query_v2.py (2)
3-3: LGTM! The import is correctly added and used for JSON serialization.
136-136: LGTM! JSON serialization ensures consistent string storage and avoids potential issues with object references.
Force-pushed from 212b294 to bad689e (Compare)
Actionable comments posted: 1
🧹 Nitpick comments (4)
src/utils/responses.py (3)
63-89: Handle `attributes` being `None` in `_parse_file_search_result`.

`attributes = getattr(result, "attributes", {})` will leave `attributes` as `None` if the attribute exists but is `None`, and `attributes.get(...)` will then raise at runtime. Using an `or {}` guard would make this path more robust against partial/malformed tool responses.

```diff
-    else:
-        filename = getattr(result, "filename", None)
-        attributes = getattr(result, "attributes", {})
+    else:
+        filename = getattr(result, "filename", None)
+        attributes = getattr(result, "attributes", {}) or {}
```
141-160: File search output parsing is sound; consider a small clarity tweak.

The loop over `results` and the `(filename or doc_url)` guard correctly avoids emitting empty entries. With the suggested `attributes or {}` change above, this helper is reasonably robust against partially populated results.
196-229: Public parser is cohesive; consider a minor resilience improvement.

The orchestration in `parse_referenced_documents_from_responses_api` cleanly combines file_search and message-annotation sources with deduping by `(doc_url, doc_title)`. One minor hardening you might consider is guarding access to `response.output` via `getattr` so accidental misuse with a dict-like response doesn't raise:

```diff
-    if not response.output:
-        return documents
-
-    for output_item in response.output:
+    output_items = getattr(response, "output", None) or []
+    if not output_items:
+        return documents
+
+    for output_item in output_items:
```

tests/unit/app/endpoints/test_query_v2.py (1)
720-743: Good extraction of annotation-message setup into a helper.

`_create_message_output_with_annotations` nicely centralizes the annotated message construction, reduces local variables in the test, and mirrors the shapes your parser expects (url_citation and file_citation with filename fallback).
📒 Files selected for processing (3)
src/app/endpoints/query_v2.py (3 hunks)
src/utils/responses.py (2 hunks)
tests/unit/app/endpoints/test_query_v2.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/app/endpoints/query_v2.py
🔇 Additional comments (4)
src/utils/responses.py (2)
92-115: Annotation parsing logic looks correct and flexible.

The `_parse_annotation` helper correctly supports both dict and object shapes and sensibly falls back from `title` to `filename` for file citations; this aligns with the test expectations.

162-194: Message annotation parsing matches spec and avoids false positives.

`_parse_message_annotations` safely:
- ignores non-list `content`,
- skips pure-string parts,
- handles missing/empty `annotations`,
- distinguishes `url_citation` vs `file_citation`, normalizing empty URLs to `None`.

This aligns with the new tests and should not interfere with messages that have no annotations.
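The dual-shape access pattern the review describes (dict vs. object annotations, title-to-filename fallback, empty-URL normalization) can be sketched as a standalone helper; the function name and field names below are illustrative, following the annotation shapes discussed in this PR:

```python
from types import SimpleNamespace
from typing import Any


def parse_annotation_fields(annotation: Any) -> tuple[Any, Any, Any]:
    """Extract (type, url, title) from a dict- or object-shaped annotation.

    Falls back from title to filename, and normalizes empty URLs to None.
    """
    if isinstance(annotation, dict):
        anno_type = annotation.get("type")
        anno_url = annotation.get("url")
        anno_title = annotation.get("title") or annotation.get("filename")
    else:
        anno_type = getattr(annotation, "type", None)
        anno_url = getattr(annotation, "url", None)
        anno_title = getattr(annotation, "title", None) or getattr(
            annotation, "filename", None
        )
    return anno_type, anno_url or None, anno_title


# Dict-shaped file citation: no URL, title falls back to filename.
dict_fields = parse_annotation_fields({"type": "file_citation", "filename": "doc.md"})

# Object-shaped URL citation with an empty URL, normalized to None.
obj_fields = parse_annotation_fields(
    SimpleNamespace(type="url_citation", url="", title=None, filename="page.md")
)
```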
tests/unit/app/endpoints/test_query_v2.py (2)
201-205: Content parts setup is compatible with annotation parsing.

Initializing `part1`/`part2` with `text` and explicit `annotations = []` ensures they behave as simple text parts without spurious citations, while remaining safe if the message-annotation parser inspects `annotations`.
746-807: Referenced-documents test accurately exercises all supported sources.
The new `test_retrieve_response_parses_referenced_documents`:
- Covers both message annotations (url_citation + file_citation) and `file_search_call` results (with and without URLs).
- Asserts on `doc_title`/`doc_url` combinations in an order-independent way.
- Ensures four unique `ReferencedDocument` entries, matching the dedupe-by-(url, title) semantics.

One follow-up to watch: this refactor should also resolve the previous `too-many-locals` lint error; if that warning persists, consider similarly extracting the file_search output construction into a small helper.
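The order-independent assertion style praised above can be sketched like this; the document values are hypothetical, standing in for the parser's real output:

```python
# Hypothetical parsed output; real values would come from the parser under test.
documents = [
    ("http://example.com/a", "a.md"),
    (None, "cited.md"),
    ("http://example.com/b", "b.pdf"),
    (None, "no_url.pdf"),
]

expected = {
    ("http://example.com/a", "a.md"),
    (None, "cited.md"),
    ("http://example.com/b", "b.pdf"),
    (None, "no_url.pdf"),
}

# Compare as a set so the assertion does not depend on output ordering,
# and check the count to catch duplicates a set comparison would hide.
assert set(documents) == expected
assert len(documents) == len(expected) == 4
```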
Implements the function parse_referenced_documents_from_responses_api, looking at the Responses API output at:
- file_search_call objects (filename and attributes)
- annotations within message content (type, url, title)
- two types of annotations: url_citation and file_citation
Force-pushed from bad689e to e45a1f4 (Compare)
Actionable comments posted: 0
🧹 Nitpick comments (3)
tests/unit/app/endpoints/test_query_v2.py (1)
763-766: Consider testing object-based file search results.

The test uses dict-based `results`, which exercises `result.get()` paths. The parsing logic also handles object-based results via `getattr()`. Consider adding a test case with mock objects to ensure both code paths are covered.

Example additional test data using mock objects:

```python
# Object-based result for broader coverage
result_mock = mocker.Mock()
result_mock.filename = "object_file.pdf"
result_mock.attributes = {"url": "http://example.com/object"}
output_item_2.results = [result_mock, ...]
```

src/utils/responses.py (2)
118-143: Good: `ValidationError` handling prevents crashes on malformed URLs.

The `try/except ValidationError` block correctly handles invalid URLs by skipping the document, addressing the concern from the past review. Consider adding debug logging for skipped invalid URLs to aid troubleshooting.

```diff
     try:
         validated_url = AnyUrl(doc_url)  # type: ignore[arg-type]
     except ValidationError:
-        # Skip documents with invalid URLs
+        # Skip documents with invalid URLs - log for debugging
+        import logging
+        logging.getLogger(__name__).debug(
+            "Skipping document with invalid URL: %s", doc_url
+        )
         return
```

Note: If adding logging, move the import to module level to follow the `logger = logging.getLogger(__name__)` pattern from the coding guidelines.
201-234: Consider guarding against a `None` response.

The function accesses `response.output` directly on line 221. If `response` itself is `None`, this will raise `AttributeError`. Consider adding a guard or documenting that `response` must not be `None`.

```diff
 def parse_referenced_documents_from_responses_api(
     response: Any,
 ) -> list[ReferencedDocument]:
     ...
     documents: list[ReferencedDocument] = []
     # Use a set to track unique documents by (doc_url, doc_title) tuple
     seen_docs: set[tuple[str | None, str | None]] = set()

-    if not response.output:
+    if not response or not response.output:
         return documents
```

Alternatively, if callers always pass a valid response, document this precondition in the docstring.
📒 Files selected for processing (3)
src/app/endpoints/query_v2.py (3 hunks)
src/utils/responses.py (2 hunks)
tests/unit/app/endpoints/test_query_v2.py (2 hunks)
🔇 Additional comments (11)
src/app/endpoints/query_v2.py (3)
3-3: LGTM! Import changes are clean and follow project conventions.

The new imports use absolute paths as required by the coding guidelines, and the `json` module import is correctly placed with the standard library imports.

Also applies to: 42-45

135-140: LGTM! JSON serialization ensures a consistent response format.

Storing `response_payload` as a JSON string (via `json.dumps`) ensures the file search results are serialized consistently for downstream consumers. The conditional correctly preserves `None` when no results are present.

386-392: Good refactoring: delegation to a centralized parsing utility.

Moving document parsing logic to `parse_referenced_documents_from_responses_api` promotes code reuse and maintainability. The integration is clean.

tests/unit/app/endpoints/test_query_v2.py (3)
201-209: Good fix: adding empty annotations prevents iteration errors.

Adding `annotations = []` to mock parts ensures the parsing logic doesn't fail when iterating over annotations in `_parse_message_annotations`. This defensive setup is appropriate.

720-743: Good refactoring: helper function reduces test complexity.

Extracting mock setup into `_create_message_output_with_annotations` addresses the previous review's "too many local variables" concern and improves test readability.

746-807: Comprehensive test for referenced document parsing.

The test validates:
- URL citations from message annotations
- File citations from message annotations
- File search results with URL attributes
- File search results without URL attributes

The refactoring keeps the local variable count at 14, under the pylint limit of 15.
src/utils/responses.py (5)
5-7: LGTM! Imports follow project conventions.

Using absolute imports for internal modules (`models.responses`) and standard Pydantic imports is correct per the coding guidelines.

63-89: LGTM! Robust handling of file search results.

The function correctly handles:
- Both dict and object access patterns
- `None` attributes via the `or {}` fallback (line 81)
- Empty string URL normalization to `None`

92-115: LGTM! Clean annotation parsing with appropriate fallbacks.

The function handles both dict and object annotations, with a sensible fallback from `title` to `filename` for the document title.

146-164: LGTM! Defensive iteration with proper null handling.

The `or []` fallback on line 159 prevents a `TypeError` when `results` is `None`, and the condition on line 163 ensures documents have at least one identifier.

167-198: LGTM! Proper handling of annotation types.

The function correctly differentiates between:
- `url_citation`: extracts URL and title
- `file_citation`: uses only title (URL set to `None`)

The empty string normalization on line 195 ensures consistent `None` handling.
Description
Implements the function parse_referenced_documents_from_responses_api, parsing referenced documents from the Responses API output.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Tests