fix(openai): add streaming support for responses.create() api #3437
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Walkthrough: Adds a ResponseStream proxy that wraps OpenAI streaming responses to start and manage telemetry spans, accumulates streaming chunks into final output, parses final response data and updates traced attributes; updates a span-attribute key in the shared module and adds multiple streaming test cassettes and tests.
Sequence Diagram(s)
sequenceDiagram
participant User
participant Wrapper as Responses Wrapper
participant OpenAI as OpenAI API
participant RespStream as ResponseStream (proxy)
participant Tracer
User->>Wrapper: responses.create(stream=True)
Wrapper->>OpenAI: POST /v1/responses (stream)
OpenAI-->>Wrapper: Stream / AsyncStream
Wrapper->>RespStream: wrap stream + attach tracer/span
Wrapper-->>User: return RespStream proxy
loop user iterates stream
User->>RespStream: next() / async for chunk
RespStream->>RespStream: accumulate chunk, update output_text
end
RespStream->>RespStream: parse final response envelope
RespStream->>Tracer: set span attributes, record usage/reasoning
RespStream->>Tracer: end span (finish)
RespStream-->>User: iteration completes / raises on error
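For readers skimming the walkthrough, here is a minimal sketch of the proxy pattern it describes, assuming a wrapt.ObjectProxy-based wrapper around a synchronous stream; the class name, attribute names, and span attribute key are illustrative and not copied from the PR.

import threading
from wrapt import ObjectProxy

class StreamProxySketch(ObjectProxy):
    def __init__(self, wrapped_stream, span):
        super().__init__(wrapped_stream)
        self._self_span = span               # wrapt stores "_self_"-prefixed attributes on the proxy
        self._self_chunks = []               # accumulated text deltas
        self._self_lock = threading.Lock()
        self._self_done = False

    def __iter__(self):
        return self

    def __next__(self):
        try:
            chunk = self.__wrapped__.__next__()
        except StopIteration:
            self._finish()                   # stream exhausted: finalize telemetry
            raise
        except Exception as exc:
            self._self_span.record_exception(exc)
            self._finish()
            raise
        delta = getattr(chunk, "delta", None)
        if isinstance(delta, str):
            self._self_chunks.append(delta)  # collect text for the completion attribute
        return chunk

    def _finish(self):
        with self._self_lock:                # idempotent, thread-safe cleanup
            if self._self_done:
                return
            self._self_done = True
        self._self_span.set_attribute("gen_ai.completion", "".join(self._self_chunks))
        self._self_span.end()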
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Important
Looks good to me! 👍
Reviewed 4bcffb8 in 58 seconds. Click for details.
- Reviewed 58 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 1 draft comment. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. test_streaming_fix.py:1
- Draft comment:
The entire test file has been removed. This test provided comprehensive streaming telemetry validation. Ensure that equivalent tests exist in the project before removal. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_3AWi1Wq3T3sguyIB
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Caution
Changes requested ❌
Reviewed everything up to 9ba78df in 3 minutes and 0 seconds. Click for details.
- Reviewed 1262 lines of code in 5 files
- Skipped 0 files when reviewing
- Skipped posting 3 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:864
- Draft comment:
Potential duplicate accumulation: In _process_chunk, the code appends 'chunk.delta' if it exists and then separately checks if 'chunk.delta' has a 'text' attribute and appends it as well. If both conditions are met, the delta text might be added twice. Consider restructuring the conditionals to ensure each response chunk is added only once. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50%. The comment raises a concern about duplicate accumulation. Looking at line 870, it does self._output_text += chunk.delta - this would be appending the delta object itself to a string, which seems wrong. Then line 878 does self._output_text += chunk.delta.text. If a chunk has type "response.output_text.delta" AND has a delta attribute with a text attribute, then: 1. Line 870 would append chunk.delta (the object) to the string. 2. Line 878 would append chunk.delta.text (the text value). This does seem like it could cause issues. However, I need to consider: line 870 appends chunk.delta directly - if delta is a string, this would work. If delta is an object, appending it to a string would convert it to a string representation. Line 878 specifically accesses chunk.delta.text. The comment identifies a real potential issue: if chunk.type is "response.output_text.delta" and chunk.delta exists and chunk.delta.text exists, both blocks would execute. This is a valid concern about the code logic. I might be missing the actual structure of the chunk objects from the OpenAI API. Perhaps chunk.delta is always a string when chunk.type is "response.output_text.delta", and chunk.delta.text only exists in other cases. The "Fallback" comment on line 875 suggests these are meant to handle different scenarios, not overlap. While the fallback comment suggests different scenarios, the code structure doesn't prevent both from executing. Without seeing the actual OpenAI API response structure, I cannot definitively say whether this is a bug or intentional. The comment raises a valid concern about the code structure that could lead to duplicate accumulation if both conditions are met. The comment identifies a legitimate code quality concern about potential duplicate accumulation. However, without access to the OpenAI API documentation or response structure, I cannot be certain this is actually a bug versus intentional handling of different response formats. The comment is speculative ("might be added twice") rather than definitive.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:810
- Draft comment:
Using del for cleanup may be unreliable due to the unpredictable timing of garbage collection. Consider favoring explicit context management (e.g. using the with-statement) to ensure the span cleanup is executed promptly. - Reason this comment was not posted:
Comment looked like it was already resolved.
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:136
- Draft comment:
The global 'responses' dictionary is used to store traced data without explicit synchronization. In high concurrency scenarios, this might lead to race conditions. Consider protecting this shared resource with a lock or using a thread‐safe construct. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
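If that shared mapping ever does need guarding, the lock-based approach the draft comment hints at would look roughly like this (module and variable names are hypothetical, not the instrumentation's actual code):

import threading

_responses_lock = threading.Lock()
_responses = {}  # response_id -> traced data (illustrative only)

def record_traced_data(response_id, traced_data):
    # Serialize writers so concurrent requests cannot interleave updates.
    with _responses_lock:
        _responses[response_id] = traced_data

def pop_traced_data(response_id):
    # Atomically remove and return the entry, if any.
    with _responses_lock:
        return _responses.pop(response_id, None)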
Workflow ID: wflow_AfHt9ivMYqW0XpW3
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
# Handle response events based on type
if hasattr(chunk, "type"):
    # Handle text delta events
    if chunk.type == "response.output_text.delta":
Consider accumulating streaming text into a list and then joining the list once (e.g. using ''.join(list)) rather than performing repeated string concatenation. This can improve performance when processing many small chunks.
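For illustration only (not the PR's code), the two accumulation strategies compare roughly like this:

def accumulate_with_join(deltas):
    parts = []
    for delta in deltas:
        parts.append(delta)        # O(1) amortized per chunk
    return "".join(parts)          # single pass to build the final string

def accumulate_with_concat(deltas):
    text = ""
    for delta in deltas:
        text += delta              # may re-copy the accumulated prefix each time
    return text                    # worst case quadratic in total output length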
Important
Looks good to me! 👍
Reviewed 9e7db44 in 1 minute and 41 seconds. Click for details.
- Reviewed 287 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 3 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:435
- Draft comment:
Cosmetic change: Inline comments (e.g. 'Create a span for the streaming response') were removed. Consider retaining a brief high‑level comment to clarify the purpose of this wrapper for future maintainability. - Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%. None.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:868
- Draft comment:
The _process_chunk method checks for both 'delta' and 'delta.text'. This may cause duplicate text appending if both conditions are met. Consider clarifying (or refactoring) the logic to ensure it handles differing chunk structures without unintended duplication. - Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%. None.
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:9
- Draft comment:
Test function definitions have been reformatted into multi‑line parameter lists, which improves readability. No issues identified. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None.
Workflow ID: wflow_BHQoK302a0MZynlO
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 1
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
802-871: Accumulate deltas in a list before joining. String-appending every token chunk is O(n²) on long streams. Let’s collect chunks in a list and join once during cleanup to keep streaming performant. This echoes ellipsis-dev’s earlier suggestion.
Apply this diff:
- self._complete_response_data = None
- self._output_text = ""
+ self._complete_response_data = None
+ self._output_chunks: list[str] = []
@@
- if hasattr(chunk, "delta") and chunk.delta:
-     self._output_text += chunk.delta
+ if hasattr(chunk, "delta") and chunk.delta:
+     self._output_chunks.append(chunk.delta)
@@
- if hasattr(chunk.delta, "text") and chunk.delta.text:
-     self._output_text += chunk.delta.text
+ if hasattr(chunk.delta, "text") and chunk.delta.text:
+     self._output_chunks.append(chunk.delta.text)
@@
- self._traced_data.output_text = self._output_text
+ self._traced_data.output_text = "".join(self._output_chunks)
@@
- if self._span and self._span.is_recording():
+ if self._output_chunks and not self._traced_data.output_text:
+     self._traced_data.output_text = "".join(self._output_chunks)
+ if self._span and self._span.is_recording():
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (4 hunks)
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming.yaml (1 hunks)
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async.yaml (1 hunks)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (8 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
**/cassettes/**/*.{yaml,yml,json}
📄 CodeRabbit inference engine (CLAUDE.md)
Never commit secrets or PII in VCR cassettes; scrub sensitive data
Files:
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async.yaml
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming.yaml
🧬 Code graph analysis (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (2)
_ensure_cleanup (786-821)
_process_complete_response (740-783)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (2)
_handle_exception (151-158)
dont_throw (132-160)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (2)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (2)
InMemorySpanExporter (22-61)
get_finished_spans (40-43)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (1)
is_reasoning_supported(25-30)
🪛 Ruff (0.14.3)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
905-905: Do not catch blind exception: Exception
(BLE001)
942-942: Do not catch blind exception: Exception
(BLE001)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
10-10: Unused function argument: instrument_legacy
(ARG001)
31-31: Unused function argument: instrument_legacy
(ARG001)
86-86: Unused function argument: instrument_legacy
(ARG001)
158-158: Unused function argument: instrument_legacy
(ARG001)
186-186: Unused function argument: instrument_legacy
(ARG001)
215-215: Unused function argument: instrument_legacy
(ARG001)
247-247: Unused function argument: instrument_legacy
(ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.10)
- GitHub Check: Build Packages (3.11)
...lemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
Important
Looks good to me! 👍
Reviewed 31ca1c5 in 54 seconds. Click for details.
- Reviewed 84 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 3 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:195
- Draft comment:
Good use of hasattr for checking both new and fallback attribute names for usage tokens. This ensures backward compatibility with older semconv versions. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply praises the use of hasattr without offering any constructive criticism or questions.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:788
- Draft comment:
Refactored use of self._request_kwargs in initializing TracedData enhances consistency by ensuring input, instructions, tools, and model values are correctly sourced. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:242
- Draft comment:
Enhanced streaming test assertions now verify span attributes for response model, prompt, and completion details. This improves test coverage and ensures telemetry data is captured as expected. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative, describing what the change does without providing any actionable feedback or suggestions. It doesn't ask for any specific confirmation or suggest improvements.
Workflow ID: wflow_01U9G1ZFBsCcJgkv
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed 9930056 in 1 minute and 0 seconds. Click for details.
- Reviewed 178 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_with_content.yaml:40
- Draft comment:
Consider using a YAML block literal (using |) for the multi‐line 'string' content to improve readability. - Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%. None.
2. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_with_content.yaml:61
- Draft comment:
Review the whitespace and newlines in the 'delta' event values to ensure they are intentional and correctly concatenated. - Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%. None.
Workflow ID: wflow_Amji3SfqnYzUjX5d
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed a17bdbb in 41 seconds. Click for details.
- Reviewed 44 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:194
- Draft comment:
Removed inline comments in the token usage block. Ensure context for fallback attributes remains clear. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:586
- Draft comment:
Removed redundant inline comments in the async streaming block; code clarity is maintained. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None.
Workflow ID: wflow_NgggD683fVnefbcc
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed 05d8fab in 2 minutes and 13 seconds. Click for details.
- Reviewed 320 lines of code in 4 files
- Skipped 0 files when reviewing
- Skipped posting 4 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:221
- Draft comment:
Consider using the SpanAttributes constants directly rather than wrapping them in an f-string (e.g. use SpanAttributes.LLM_REQUEST_REASONING_SUMMARY instead of f"{SpanAttributes.LLM_REQUEST_REASONING_SUMMARY}"). This avoids unnecessary string conversion. - Reason this comment was not posted:
Comment was on unchanged code.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:230
- Draft comment:
Review the defaulting of reasoning attributes: using 'traced_response.request_reasoning_summary or ()' may result in a tuple when no value is present. Since downstream tests expect a JSON–serialized string, consider consistently serializing these attributes (e.g. via json.dumps) to maintain type consistency. - Reason this comment was not posted:
Comment was on unchanged code.
3. packages/opentelemetry-instrumentation-openai/tests/conftest.py:175
- Draft comment:
The new 'instrument_legacy_with_content' fixture cleanly sets the TRACELOOP_TRACE_CONTENT environment variable. This is clear and follows best practices for test isolation. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply praises the code for following best practices, which violates the rule against making purely informative comments.
4. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:277
- Draft comment:
The new test 'test_responses_streaming_with_content' effectively verifies content tracing by asserting that both prompt and completion attributes are captured. For future enhancement, consider testing edge cases like empty streaming deltas or handling of large output segments (e.g. using list join for string concatenation) to optimize performance, though current usage appears acceptable. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. This comment has two parts: 1) suggesting additional edge case tests (empty deltas, large segments), and 2) suggesting a performance optimization (using list join instead of +=). The first part is asking for more test coverage, which is speculative and not pointing to a clear bug. The second part is a performance optimization suggestion, but this is a test file, not production code, so performance is not critical. The string concatenation pattern used here is standard in tests and perfectly acceptable. The comment doesn't identify any actual bug or issue with the current implementation - it's just suggesting "nice to have" improvements. According to the rules, comments should not be speculative ("Consider testing edge cases...") and should clearly identify a code change that's required. Could the performance optimization be important even in tests? Could there be a real issue with how empty deltas are handled that would cause the test to fail? Maybe the comment is trying to prevent a real bug. Even if performance could theoretically matter, this is a simple test with a short response ("What is 2+2?"), so the performance concern is not relevant here. The code already handles empty deltas correctly with the if item.delta: check on line 293. The comment is purely speculative about potential improvements, not identifying an actual issue. This comment should be deleted. It's making speculative suggestions about edge cases and performance optimizations rather than identifying a clear bug or required change. The code works correctly as written, and the suggestions are "nice to have" rather than necessary fixes.
Workflow ID: wflow_V6Mw3lCdmrdtELTD
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed 810f34e in 2 minutes and 6 seconds. Click for details.
- Reviewed 124 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 6 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:817
- Draft comment:
Consider returning the result of wrapped.enter if it differs from self. In some context managers, enter returns an object different from self; if that behavior is desired, adjust accordingly. - Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%. None.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:822
- Draft comment:
Good use of a try-finally block in exit to ensure the underlying exit method is always called. This helps propagate suppression decisions from the wrapped object. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative, as it praises the use of a try-finally block without suggesting any changes or improvements. It doesn't align with the rules for useful comments in a pull request review.
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:835
- Draft comment:
Nice addition of async context manager support. The async aenter correctly awaits the underlying aenter (if present). - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative, as it only praises the addition of async context manager support without providing any actionable feedback or suggestions. It does not align with the rules for useful comments.
4. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:841
- Draft comment:
Async aexit properly awaits the underlying aexit and returns its suppression status. This ensures consistent async cleanup. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states that something is done correctly without offering any guidance or questions.
5. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:312
- Draft comment:
The test 'test_responses_streaming_with_context_manager' correctly uses the context manager to consume the streaming response and verifies telemetry spans. Consider adding tests to cover exception scenarios within the context. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment is suggesting that the PR author add more tests to cover exception scenarios. This is not pointing out a bug or issue with the code that was added - it's suggesting additional test coverage. According to the rules, comments should only be kept if there is "clearly a code change required" and we should "Do NOT comment unless there is clearly a code change required." This is a suggestion for enhancement/additional coverage, not a required fix. The test as written appears to be valid and functional. The comment is essentially asking the author to do more work beyond what they've already done, which doesn't meet the bar for keeping the comment. Could this be pointing out that the context manager pattern specifically requires exception testing to ensure proper cleanup? Perhaps there's a real concern about resource leaks if exceptions occur within the context manager block, which would be a legitimate issue to address. While exception handling in context managers is important, the comment doesn't identify a specific bug or missing functionality in the code that was added. It's a general suggestion for additional test coverage. If there was a specific exception handling bug in the implementation being tested, that would be different, but this is just asking for more comprehensive testing without evidence of a problem. The existing test validates the happy path, which is appropriate for a new feature test. This comment should be deleted. It's suggesting additional test coverage rather than identifying a specific issue with the code changes. The comment doesn't point to a bug or required fix - it's asking the author to add more tests, which violates the rule that comments should only be made when there is clearly a code change required.
6. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:340
- Draft comment:
The async context manager test 'test_responses_streaming_async_with_context_manager' is well implemented and confirms that telemetry spans are emitted. Testing error paths in async scenarios might further improve coverage. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment is suggesting that the test could be improved by adding error path testing. However, this is not pointing out a bug or issue with the code that was added. It's a suggestion for additional test coverage that could be added in the future. According to the rules, comments should only be kept if they clearly identify a code change that is required. This is more of a "nice to have" suggestion rather than identifying a problem with the current implementation. The test as written appears to be valid and tests the happy path for async context manager usage. Suggesting additional test coverage is not the same as identifying a bug or required change. Could this be pointing out that the test is incomplete or insufficient? Perhaps error handling is critical for async context managers and the test is inadequate without it. However, I don't see evidence that the existing test is wrong or that error testing is mandatory for this PR. While error testing might be valuable, the comment doesn't identify any specific issue with the code that was added. It's a general suggestion for improvement rather than identifying a required change. The test appears to be complete for its stated purpose of testing the happy path of async streaming with context managers. This comment should be deleted. It's a suggestion for additional test coverage rather than identifying a required code change. The rules explicitly state not to make comments unless there is clearly a code change required, and this is more of an optional enhancement suggestion.
Workflow ID: wflow_9nYABQnTpq4epFUs
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed 3a594b7 in 2 minutes and 18 seconds. Click for details.
- Reviewed 416 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 3 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async_with_context_manager.yaml:21
- Draft comment:
The x-stainless-async header here is set to '- async:asyncio', which is non-standard compared to the typical boolean string seen in the sync cassette. Verify that the instrumentation correctly interprets this array value for async behavior. - Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%. None.
2. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async_with_context_manager.yaml:160
- Draft comment:
The cassette includes several dynamic HTTP headers (e.g., CF-RAY and Set-Cookie). Ensure these values are normalized or sanitized in tests to avoid potential flakiness. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment is asking the author to "ensure" something is done, which falls into the category of verification requests that the rules say to avoid. The comment doesn't point to a specific bug or issue - it's speculative about "potential flakiness". Cassette files are typically recorded fixtures that capture real HTTP responses, and the dynamic headers are part of the actual response. The comment doesn't provide evidence that these headers are actually causing problems, nor does it suggest a specific code change. It's more of a general best practice suggestion rather than identifying a concrete issue with the change. According to the rules, speculative comments like "potential flakiness" should be removed unless there's definite evidence of an issue. Could this actually be a legitimate concern? If these cassette files are used for exact matching in tests, dynamic values could indeed cause issues. However, I don't see evidence in the diff that this is causing a problem, and cassette libraries typically handle this automatically or the test framework would have configuration for it elsewhere. While the concern about dynamic headers could be valid in theory, the comment is speculative ("potential flakiness") and asks the author to "ensure" something rather than pointing to a concrete issue. The rules explicitly state not to make speculative comments or ask authors to verify/ensure things. If this were a real problem, it would likely be caught during test execution. This comment should be deleted. It's speculative about "potential flakiness" without evidence of an actual problem, and it asks the author to "ensure" something is done, which violates the rule against verification requests. Cassette files are typically generated fixtures, and if dynamic headers were an issue, it would be caught by the test framework.
3. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_with_context_manager.yaml:137
- Draft comment:
The aggregated output text (in response.output_text.done) contains newline characters and multiline formatting. Ensure that the streaming telemetry correctly concatenates these delta chunks as expected. - Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%. None.
Workflow ID: wflow_iIKBL7njLTDBPP4b
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 0
🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (2)
228-246: Simplify unnecessary f-string formatting. The f-string formatting on lines 232, 238, and 244 is redundant since SpanAttributes.LLM_REQUEST_REASONING_SUMMARY and similar attributes are already strings. You can pass them directly to _set_span_attribute.
Apply this diff:
- f"{SpanAttributes.LLM_REQUEST_REASONING_SUMMARY}",
+ SpanAttributes.LLM_REQUEST_REASONING_SUMMARY,
- f"{SpanAttributes.LLM_REQUEST_REASONING_EFFORT}",
+ SpanAttributes.LLM_REQUEST_REASONING_EFFORT,
- f"{SpanAttributes.LLM_RESPONSE_REASONING_EFFORT}",
+ SpanAttributes.LLM_RESPONSE_REASONING_EFFORT,
890-905: Consider list accumulation for better performance. Repeated string concatenation on lines 895 and 901 creates new string objects for each chunk. For many small chunks, accumulating into a list and joining once would be more efficient.
Apply this diff:
+ # Initialize list for text accumulation in __init__
+ self._output_text_chunks = []

  def _process_chunk(self, chunk):
      """Process a streaming chunk"""
      if hasattr(chunk, "type"):
          if chunk.type == "response.output_text.delta":
              if hasattr(chunk, "delta") and chunk.delta:
-                 self._output_text += chunk.delta
+                 self._output_text_chunks.append(chunk.delta)
          elif chunk.type == "response.completed" and hasattr(chunk, "response"):
              self._complete_response_data = chunk.response
      if hasattr(chunk, "delta"):
          if hasattr(chunk.delta, "text") and chunk.delta.text:
-             self._output_text += chunk.delta.text
+             self._output_text_chunks.append(chunk.delta.text)
      if hasattr(chunk, "response") and chunk.response:
          self._complete_response_data = chunk.response

Then in _process_complete_response (line 919), replace:
- self._traced_data.output_text = self._output_text
+ self._traced_data.output_text = ''.join(self._output_text_chunks)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (6 hunks)
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async_with_context_manager.yaml (1 hunks)
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_with_context_manager.yaml (1 hunks)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (7 hunks)
✅ Files skipped from review due to trivial changes (2)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async_with_context_manager.yaml
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_with_context_manager.yaml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
🧬 Code graph analysis (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (3)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes (64-245)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (2)
_ensure_cleanup (786-821)
_process_complete_response (740-783)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (2)
_handle_exception (151-158)
dont_throw (132-160)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (3)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (1)
is_reasoning_supported (25-30)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (3)
export (45-51)
InMemorySpanExporter (22-61)
get_finished_spans (40-43)
packages/opentelemetry-instrumentation-openai/tests/conftest.py (4)
instrument_legacy (136-151)
openai_client (42-43)
async_openai_client (82-83)
instrument_legacy_with_content (175-181)
🪛 Ruff (0.14.3)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
936-936: Do not catch blind exception: Exception
(BLE001)
973-973: Do not catch blind exception: Exception
(BLE001)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
10-10: Unused function argument: instrument_legacy
(ARG001)
31-31: Unused function argument: instrument_legacy
(ARG001)
86-86: Unused function argument: instrument_legacy
(ARG001)
158-158: Unused function argument: instrument_legacy
(ARG001)
186-186: Unused function argument: instrument_legacy
(ARG001)
215-215: Unused function argument: instrument_legacy
(ARG001)
249-249: Unused function argument: instrument_legacy
(ARG001)
279-279: Unused function argument: instrument_legacy_with_content
(ARG001)
314-314: Unused function argument: instrument_legacy
(ARG001)
342-342: Unused function argument: instrument_legacy
(ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (11)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (6)
4-4: LGTM! The threading and wrapt.ObjectProxy imports are appropriate for implementing thread-safe cleanup and transparent proxying of streaming responses.
Also applies to: 8-8
195-206: LGTM! The conditional attribute setting provides proper backward compatibility by checking for the newer GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS attribute first before falling back to the legacy GEN_AI_USAGE_INPUT_TOKENS_CACHED.
444-457: LGTM! The wrappers correctly start a span and wrap streaming responses in ResponseStream for both synchronous and asynchronous cases, delegating span lifecycle management to the proxy.
Also applies to: 586-599
764-809: LGTM! The ResponseStream class properly extends ObjectProxy to transparently proxy streaming responses while capturing telemetry. The initialization correctly sets up state including thread-safe cleanup tracking.
811-888: LGTM! The context manager and iterator implementations properly delegate to the wrapped stream while ensuring telemetry cleanup runs. Both synchronous and asynchronous variants correctly handle normal completion, early exit, and error scenarios. The previous concern about context-manager semantics has been addressed.
906-974: LGTM! The cleanup methods implement proper thread-safe, idempotent cleanup with appropriate guards. The broad exception handling on lines 936 and 973 is intentional and correct for cleanup paths, ensuring spans are always finalized even when errors occur during telemetry processing.
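As a rough sketch only, the thread-safe, idempotent cleanup guard described above could look like this; the names (_ensure_cleanup, _cleanup_completed) follow the review discussion, but the body is illustrative, not the PR's implementation.

import logging
import threading

logger = logging.getLogger(__name__)

class CleanupGuardSketch:
    def __init__(self, span):
        self._span = span
        self._cleanup_lock = threading.Lock()
        self._cleanup_completed = False

    def _ensure_cleanup(self):
        with self._cleanup_lock:
            if self._cleanup_completed:   # already finalized by another code path
                return
            self._cleanup_completed = True
        try:
            if self._span and self._span.is_recording():
                self._span.end()          # finalize the span exactly once
        except Exception:                 # never let telemetry cleanup break user code
            logger.debug("failed to finalize streaming span", exc_info=True)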
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (5)
3-3: LGTM! The AsyncOpenAI import is necessary for the new async streaming tests added in this PR.
152-211: LGTM! The reasoning tests properly use conditional skipping and validate that complex reasoning data is serialized as JSON strings, addressing issue #3350 about invalid type warnings.
213-244: LGTM! The sync streaming test correctly validates that streaming responses emit telemetry spans with proper attributes and that content is successfully streamed and accumulated.
246-275: LGTM! The async streaming tests correctly use async/await patterns and validate that both regular async streaming and async context-manager usage properly emit telemetry spans.
Also applies to: 339-366
277-337: LGTM! The content tracing and context manager tests validate important scenarios: that prompt/completion content is properly captured during streaming when content tracing is enabled, and that context manager usage (both sync and async) correctly finalizes spans.
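A hypothetical shape for one of these streaming tests, assuming the fixtures mentioned in the review (instrument_legacy, openai_client) plus an in-memory exporter fixture; the span name, model, and attribute key below are illustrative assumptions, not values taken from the repository.

import pytest

@pytest.mark.vcr
def test_responses_streaming_emits_span(instrument_legacy, openai_client, span_exporter):
    stream = openai_client.responses.create(
        model="gpt-4o-mini",      # model name is illustrative
        input="What is 2+2?",
        stream=True,
    )
    chunks = list(stream)         # fully consuming the stream should finalize the span
    assert chunks

    spans = span_exporter.get_finished_spans()
    response_spans = [s for s in spans if "response" in s.name]  # span name is an assumption
    assert len(response_spans) == 1
    assert response_spans[0].attributes.get("gen_ai.response.model")  # attribute key illustrative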
GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
usage.input_tokens_details.cached_tokens,
)
if hasattr(GenAIAttributes, 'GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS'):
why do you need it?
)

# Only set reasoning tokens if the attribute exists
if hasattr(SpanAttributes, 'LLM_USAGE_REASONING_TOKENS'):
same
else getattr(tokens_details, "reasoning_tokens", None)
)

# Only set reasoning tokens if the attribute exists
redundant comment
| f"{SpanAttributes.LLM_REQUEST_REASONING_SUMMARY}", | ||
| traced_response.request_reasoning_summary or (), | ||
| ) | ||
| if hasattr(SpanAttributes, 'LLM_REQUEST_REASONING_EFFORT'): |
same
instrumentor.uninstrument()


@pytest.fixture(scope="function")
I don't think you need this
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
Important
Looks good to me! 👍
Reviewed 974ae91 in 1 minute and 15 seconds. Click for details.
- Reviewed 227 lines of code in 3 files
- Skipped 0 files when reviewing
- Skipped posting 4 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py:232
- Draft comment:
The attribute change from GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS to SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS looks intentional. Please double‐check that this naming is aligned with the rest of the telemetry attributes. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:392
- Draft comment:
In the sync wrapper, streaming responses are checked with isinstance(response, Stream). Confirm that this check fully covers the intended types. (AsyncStream is handled only in the async wrapper.) - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:785
- Draft comment:
In ResponseStream._process_chunk, the code checks for a delta in two separate blocks (one under a type check and another unconditional block). This may lead to duplication if chunk.delta is both truthy and has a 'text' attribute. Consider consolidating the logic to avoid appending the same text twice. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:800
- Draft comment:
The cleanup logic in ResponseStream (methods _ensure_cleanup and _handle_exception) uses a threading.Lock and _cleanup_completed flag to prevent double cleanup. It would be helpful to add inline comments clarifying the idempotency of cleanup operations for future maintainability. - Reason this comment was not posted:
Confidence changes required: 30% <= threshold 50%. None.
Workflow ID: wflow_lmRWaqU2unaicQaz
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed df7ca6e in 2 minutes and 18 seconds. Click for details.
- Reviewed 48 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py:289
- Draft comment:
Adding reasoning request attributes here. Consider verifying that using an empty tuple as a default (via reasoning_effort or ()) is the intended behavior and type—should it be a string/None instead? - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. This comment is explicitly asking the author to "verify that... is the intended behavior" which directly violates the rule: "Do NOT ask the PR author to confirm their intention, to explain, to double-check things, to ensure the behavior is intended". The comment doesn't state that there IS a problem - it's speculative and asks for verification. Without seeing the definition of _set_span_attribute or SpanAttributes.LLM_REQUEST_REASONING_EFFORT, I cannot determine if using an empty tuple is actually wrong. The comment doesn't provide strong evidence that this is incorrect - it's just questioning the choice. Maybe the empty tuple is actually incorrect and would cause a runtime error or type mismatch in _set_span_attribute. Perhaps the reviewer has knowledge about what type this attribute should be, and an empty tuple would be problematic. Even if the empty tuple might be wrong, the comment doesn't state definitively that it IS wrong - it asks the author to verify. Without seeing clear evidence that this will cause an error or is definitively incorrect, and given that the comment is phrased as a verification request, it should be deleted per the rules. This comment should be deleted because it asks the author to "verify" their intention rather than stating a clear code issue. It's speculative ("should it be...?") rather than definitive, and there's no strong evidence provided that the empty tuple is incorrect.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py:327
- Draft comment:
In the _handle_response function, reasoning usage attributes are extracted from the response. Ensure that defaulting to 0 using (reasoning_tokens or 0) aligns with expected numeric types and that the extraction logic handles cases where 'completion_tokens_details' might not include a 'reasoning_tokens' field. - Reason this comment was not posted:
Comment looked like it was already resolved.
Workflow ID: wflow_uFQqg8kl8bsTVEtP
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Caution
Changes requested ❌
Reviewed b576554 in 2 minutes and 46 seconds. Click for details.
- Reviewed 150 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 4 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:127
- Draft comment:
Added new reasoning attributes to TracedData. Ensure these fields are documented and that their expected type (string) is maintained downstream. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment is asking the PR author to "ensure" documentation exists and type consistency is maintained. This is exactly the kind of comment the rules say to avoid - it starts with "Ensure that..." and doesn't point to a specific code issue. The comment doesn't identify any actual problem with the code - the types are already declared as Optional[str] in the pydantic model, and the code appears to handle them consistently. The comment is more of a reminder or suggestion to verify things rather than pointing out a concrete issue. According to the rules, comments that ask the author to "ensure" things or verify/confirm are not useful and should be deleted. Could there be a legitimate concern about documentation or type safety that I'm missing? Perhaps the comment is trying to highlight that these new fields need docstrings or that there's a type inconsistency somewhere in how they're used downstream? Even if documentation would be nice to have, the comment doesn't point to a specific missing docstring or documentation requirement. The types are clearly declared in the pydantic model, and I can see from the code that they're used consistently as strings throughout. The comment is asking the author to "ensure" things rather than identifying a concrete problem, which violates the rules. This comment should be deleted. It asks the PR author to "ensure" documentation and type consistency without identifying any specific code issue. This is exactly the type of non-actionable, verification-requesting comment that the rules say to avoid.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:476
- Draft comment:
Both request_reasoning_effort and response_reasoning_effort are sourced from the same 'effort' key in kwargs. Verify that this is intended, as they represent distinct attributes. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%. The comment is asking the PR author to "verify" something, which violates the rule: "Do NOT ask the PR author to confirm their intention, to explain, to double-check things, to ensure the behavior is intended, to make sure their change is tested, or similar." The comment doesn't definitively state there's a bug - it's speculative. Looking at the code context, this is in an exception handler where kwargs contains the request parameters. It's plausible that in this error path, the response reasoning effort hasn't been set yet, so it falls back to the request's effort value. Without seeing the OpenAI API documentation or understanding the full context of when this code path is hit, I cannot definitively say this is wrong. The comment is asking for verification rather than stating a clear issue. However, it does seem suspicious that both request and response attributes would use the same source value. This could be a legitimate bug where response_reasoning_effort should come from a different source (like existing_data or the actual response object). The pattern is repeated consistently across multiple locations, which could indicate either intentional design or a systematic error. While it does seem suspicious, the comment explicitly asks to "verify that this is intended" rather than stating "this is incorrect." This is a verification request, not a definitive bug report. The rules clearly state not to ask for confirmation. Additionally, this is in an exception handler path (line 450 shows except Exception as e:), so the behavior might be intentional - when an error occurs, there may not be a response yet, so using the request's effort value as a fallback could be correct. Without stronger evidence that this is definitely wrong, I should delete this comment. This comment should be deleted because it asks the PR author to "verify" the intention rather than identifying a definite issue. It's a speculative comment that violates the rule against asking for confirmation.
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:614
- Draft comment:
In the async wrapper, reasoning attributes are handled similarly. Ensure consistency with the sync version and consider similar fallback logic if needed. - Reason this comment was not posted:
Comment looked like it was already resolved.
4. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:790
- Draft comment:
The ResponseStream init assigns reasoning attributes from _request_kwargs. Confirm that these values (or their defaults) align with expectations across all code paths. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to confirm that the values align with expectations across all code paths, which is a form of asking for confirmation or verification. This violates the rule against asking the PR author to confirm their intention or ensure behavior is intended.
Workflow ID: wflow_PmONKpNByCMrnk5F
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
    SpanAttributes.LLM_USAGE_REASONING_TOKENS,
    reasoning_tokens or 0,
    f"{SpanAttributes.LLM_REQUEST_REASONING_SUMMARY}",
    traced_response.request_reasoning_summary or (),
Fallback values for reasoning attributes are set using or (). Consider using an empty string ('') to match the Optional[str] type.
Important
Looks good to me! 👍
Reviewed e5c8eb8 in 1 minute and 6 seconds. Click for details.
- Reviewed 56 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:211
- Draft comment:
Removed conditional checks for telemetry reasoning attributes. Please verify that all SpanAttributes (e.g. LLM_USAGE_REASONING_TOKENS) are guaranteed to exist in supported versions so that no AttributeError is raised. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to verify that certain attributes are guaranteed to exist, which falls under asking the author to ensure behavior is intended or to double-check things. This violates the rules.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:221
- Draft comment:
The use of f-string formatting for attribute keys (e.g. f"{SpanAttributes.LLM_REQUEST_REASONING_SUMMARY}") may be unnecessary if the constants are already strings. Consider using the constant directly to simplify the code. - Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%. None
Workflow ID: wflow_pu8eNVvZdbFErkSG
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed c1cfeb9 in 1 minute and 0 seconds. Click for details.
- Reviewed 21 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 1 draft comment. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/conftest.py:155
- Draft comment:
Removal of the instrument_legacy_with_content fixture looks intentional to consolidate legacy configurations into instrument_with_content. Ensure no tests depend on it and that legacy content tracing remains covered by existing fixtures. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None
Workflow ID: wflow_6jWYRKeuwh7Ki3Q4
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed 1b19abd in 47 seconds. Click for details.
- Reviewed 37 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 3 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:210
- Draft comment:
Minor formatting improvement: splitting the isinstance assert into multiple lines enhances readability. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None
2. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:282
- Draft comment:
Removal of the unused 'instrument_legacy_with_content' fixture improves parameter consistency. Make sure this change doesn't affect tests expecting content tracing instrumentation. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:344
- Draft comment:
Reformatting the fixture parameters for async context-manager test enhances readability. The functional behavior remains intact. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None
Workflow ID: wflow_01oRhqY1wRnciOCa
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (4)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
- SpanAttributes (64-245)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)
- _set_span_attribute (30-37)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (2)
- _ensure_cleanup (786-821)
- _process_complete_response (740-783)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (2)
- _handle_exception (151-158)
- dont_throw (132-160)
🪛 Ruff (0.14.4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
928-928: Do not catch blind exception: Exception
(BLE001)
965-965: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.12)
...lemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
Important
Looks good to me! 👍
Reviewed f480d19 in 57 seconds. Click for details.
- Reviewed 41 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 3 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:280
- Draft comment:
Adding the 'instrument_legacy' fixture to the function signature (line 10) ensures that the instrumentation is correctly set up for streaming tests. Verify that this fixture is properly configured in the test environment. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to verify that the fixture is properly configured, which violates the rule against asking for confirmation or verification. It is not making a specific code suggestion or pointing out a specific issue that needs to be addressed.
2. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:337
- Draft comment:
The synchronous context manager test now includes assertions for both prompt and completion attributes (content, role). This thoroughly verifies that telemetry captures streaming details. Ensure that these attributes are consistently set by the instrumentation. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the author to ensure consistency in setting attributes, which is a form of asking for confirmation or verification. This violates the rule against asking the author to ensure or verify things. The comment does not provide a specific suggestion or point out a specific issue, making it not useful according to the guidelines.
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:369
- Draft comment:
The asynchronous context manager test now also asserts that prompt and completion telemetry attributes match the streaming output. This change covers the async streaming instrumentation and aligns its behavior with the sync variant. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_fIN56P6L5dkjiVkS
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed 6bf70a1 in 51 seconds. Click for details.
- Reviewed 27 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:253
- Draft comment:
Good use of a variable (input_text) to avoid duplicating the prompt string. This improves maintainability. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None
2. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:276
- Draft comment:
New assertions for telemetry attributes (prompt and completion data) ensure that streaming spans capture expected values. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None
Workflow ID: wflow_YPIXw1QLtu541FIS
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed 272e106 in 1 minute and 7 seconds. Click for details.
- Reviewed 15 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 1 draft comment. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:247
- Draft comment:
Good addition! These assertions verify that streaming responses capture both prompt and completion telemetry correctly. Consider adding custom assertion messages for clarity if a test fails. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None
Workflow ID: wflow_EFD5bHnues0kkUBN
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed 8d4d8dc in 1 minute and 16 seconds. Click for details.
- Reviewed 25 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:838
- Draft comment:
The new close() method correctly ensures cleanup by calling _ensure_cleanup() before delegating to the wrapped object's close. Consider adding a docstring to clarify its purpose and note that any exceptions from _ensure_cleanup() are intentionally suppressed. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:847
- Draft comment:
The async aclose() method properly awaits the underlying object's aclose() after calling _ensure_cleanup(). A brief docstring could help future maintainers understand its role in ensuring async cleanup of telemetry spans. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None
Workflow ID: wflow_8x40e1am2wb3MOa0
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
219-233: Use empty string fallback for type consistency. Lines 222, 227, and 232 use or () as fallback values for reasoning attributes. Since these fields are typed as Optional[str] in TracedData, using an empty string '' instead of an empty tuple would maintain type consistency. Apply this diff:

 _set_span_attribute(
     span,
     f"{SpanAttributes.LLM_REQUEST_REASONING_SUMMARY}",
-    traced_response.request_reasoning_summary or (),
+    traced_response.request_reasoning_summary or '',
 )
 _set_span_attribute(
     span,
     f"{SpanAttributes.LLM_REQUEST_REASONING_EFFORT}",
-    traced_response.request_reasoning_effort or (),
+    traced_response.request_reasoning_effort or '',
 )
 _set_span_attribute(
     span,
     f"{SpanAttributes.LLM_RESPONSE_REASONING_EFFORT}",
-    traced_response.response_reasoning_effort or (),
+    traced_response.response_reasoning_effort or '',
 )
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
891-906: Consider using list accumulation for better performance. Lines 896 and 902 perform repeated string concatenation (+=) in a loop when processing streaming chunks. For streams with many small chunks, this creates intermediate string objects and can impact performance. Consider accumulating deltas in a list and joining once at the end. Apply this diff:

 def __init__(
     self,
     span,
     response,
     start_time=None,
     request_kwargs=None,
     tracer=None,
     traced_data=None,
 ):
     # ... existing init code ...
     self._complete_response_data = None
-    self._output_text = ""
+    self._output_text_parts = []
     self._cleanup_completed = False
     self._cleanup_lock = threading.Lock()

 def _process_chunk(self, chunk):
     """Process a streaming chunk"""
     if hasattr(chunk, "type"):
         if chunk.type == "response.output_text.delta":
             if hasattr(chunk, "delta") and chunk.delta:
-                self._output_text += chunk.delta
+                self._output_text_parts.append(chunk.delta)
         elif chunk.type == "response.completed" and hasattr(chunk, "response"):
             self._complete_response_data = chunk.response
     if hasattr(chunk, "delta"):
         if hasattr(chunk.delta, "text") and chunk.delta.text:
-            self._output_text += chunk.delta.text
+            self._output_text_parts.append(chunk.delta.text)
     if hasattr(chunk, "response") and chunk.response:
         self._complete_response_data = chunk.response

 @dont_throw
 def _process_complete_response(self):
     """Process the complete response and emit span"""
     with self._cleanup_lock:
         if self._cleanup_completed:
             return
         try:
             if self._complete_response_data:
                 parsed_response = parse_response(self._complete_response_data)
                 self._traced_data.response_id = parsed_response.id
                 self._traced_data.response_model = parsed_response.model
-                self._traced_data.output_text = self._output_text
+                self._traced_data.output_text = ''.join(self._output_text_parts)
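For illustration, a minimal standalone version of the accumulate-then-join pattern this diff suggests; the function, class, and variable names below are illustrative, not the actual instrumentation code, and only the delta attribute shape mirrors the handling above.

# Sketch: collect streamed text deltas in a list and join once at the end,
# instead of growing a string with += on every chunk.
def collect_output_text(chunks):
    parts = []
    for chunk in chunks:
        delta = getattr(chunk, "delta", None)
        if delta:
            parts.append(delta)
    return "".join(parts)


class FakeChunk:
    def __init__(self, delta):
        self.delta = delta


print(collect_output_text([FakeChunk("Hel"), FakeChunk("lo")]))  # prints "Hello"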
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (3)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
- SpanAttributes (64-245)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (2)
- _ensure_cleanup (786-821)
- _process_complete_response (740-783)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (2)
- _handle_exception (151-158)
- dont_throw (132-160)
🪛 Ruff (0.14.4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
846-846: return inside finally blocks cause exceptions to be silenced
(B012)
853-853: return inside finally blocks cause exceptions to be silenced
(B012)
937-937: Do not catch blind exception: Exception
(BLE001)
974-974: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Build Packages (3.11)
🔇 Additional comments (8)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (8)
4-4: LGTM: Essential imports for streaming support. The threading and wrapt.ObjectProxy imports are necessary for the ResponseStream implementation: threading provides locks for thread-safe cleanup, and ObjectProxy enables transparent stream wrapping. Also applies to: 8-8
197-197: LGTM: Correct attribute naming convention. The change from GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS to SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS aligns with the backward-compatible naming convention in the semantic conventions module.
431-444: LGTM: Streaming telemetry correctly implemented. The sync streaming path now starts a span before returning and wraps the Stream in ResponseStream, ensuring telemetry is captured throughout the streaming lifecycle. This correctly addresses the missing spans issue for streaming responses.
573-586: LGTM: Async streaming telemetry correctly implemented. The async streaming path mirrors the sync implementation, correctly handling both Stream and AsyncStream types and ensuring spans are started before returning the wrapped stream.
751-801: LGTM: Robust initialization and lifecycle management. The ResponseStream class initialization properly wraps the underlying stream, captures request context into TracedData, and establishes thread-safe cleanup tracking. The __del__ method provides a safety net for cleanup if the stream is garbage-collected without being fully consumed.
855-889: LGTM: Iterator methods correctly handle streaming lifecycle. The sync and async iterator implementations properly intercept iteration, process chunks, handle completion via StopIteration/StopAsyncIteration, and handle errors through the exception handler. The symmetric design ensures consistent behavior across sync and async paths.
908-942: LGTM: Robust completion processing with proper telemetry. The _process_complete_response method correctly handles thread-safe, idempotent completion processing. It parses the final response, updates TracedData with accumulated streaming content, stores it for potential reuse, and properly finalizes the span with all attributes. The broad exception handling is appropriate given that the @dont_throw decorator ensures telemetry failures don't break user code.
945-975: LGTM: Comprehensive exception handling and cleanup fallback. The _handle_exception and _ensure_cleanup methods provide robust error handling and cleanup guarantees. Both are thread-safe, idempotent, and decorated with @dont_throw to ensure telemetry failures never disrupt the user's streaming workflow. The _ensure_cleanup method serves as a fallback for edge cases where normal completion doesn't occur (e.g., early close, garbage collection).
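To make the lifecycle these comments describe concrete, here is a minimal sketch of the wrap-and-intercept pattern, assuming wrapt and an already-started OpenTelemetry span. The attribute key, helper names, and chunk shape are illustrative, not the exact implementation in this PR.

import threading

from wrapt import ObjectProxy


class StreamProxySketch(ObjectProxy):
    """Illustrative proxy: accumulates deltas and ends the span exactly once."""

    def __init__(self, wrapped, span):
        super().__init__(wrapped)
        self._self_span = span
        self._self_parts = []
        self._self_done = False
        self._self_lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        try:
            chunk = self.__wrapped__.__next__()
        except StopIteration:
            self._self_finish()
            raise
        except Exception as exc:
            self._self_finish(error=exc)
            raise
        delta = getattr(chunk, "delta", None)
        if delta:
            self._self_parts.append(delta)
        return chunk

    def _self_finish(self, error=None):
        # Thread-safe and idempotent: only the first caller finalizes the span.
        with self._self_lock:
            if self._self_done:
                return
            self._self_done = True
        if error is not None:
            self._self_span.record_exception(error)
        self._self_span.set_attribute("gen_ai.completion", "".join(self._self_parts))
        self._self_span.end()

In wrapt, attribute names prefixed with _self_ are stored on the proxy itself rather than forwarded to the wrapped stream, which is what keeps a proxy like this transparent for every other attribute access; the real ResponseStream may organize its state differently.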
def close(self):
    try:
        self._ensure_cleanup()
    finally:
        if hasattr(self.__wrapped__, "close"):
            return self.__wrapped__.close()

async def aclose(self):
    try:
        self._ensure_cleanup()
    finally:
        if hasattr(self.__wrapped__, "aclose"):
            return await self.__wrapped__.aclose()
Fix return statements in finally blocks.
Lines 846 and 853 use return statements inside finally blocks, which silence any exceptions raised by _ensure_cleanup(). While _ensure_cleanup is decorated with @dont_throw, this pattern is still problematic because it masks the control flow.
Apply this diff:
 def close(self):
+    result = None
     try:
         self._ensure_cleanup()
     finally:
         if hasattr(self.__wrapped__, "close"):
-            return self.__wrapped__.close()
+            result = self.__wrapped__.close()
+    return result

 async def aclose(self):
+    result = None
     try:
         self._ensure_cleanup()
     finally:
         if hasattr(self.__wrapped__, "aclose"):
-            return await self.__wrapped__.aclose()
+            result = await self.__wrapped__.aclose()
+    return result

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def close(self):
    result = None
    try:
        self._ensure_cleanup()
    finally:
        if hasattr(self.__wrapped__, "close"):
            result = self.__wrapped__.close()
    return result

async def aclose(self):
    result = None
    try:
        self._ensure_cleanup()
    finally:
        if hasattr(self.__wrapped__, "aclose"):
            result = await self.__wrapped__.aclose()
    return result
🧰 Tools
🪛 Ruff (0.14.4)
846-846: return inside finally blocks cause exceptions to be silenced
(B012)
853-853: return inside finally blocks cause exceptions to be silenced
(B012)
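For readers unfamiliar with B012, here is a small self-contained demonstration of why a return inside finally is risky; the functions are hypothetical and only illustrate the Python semantics the rule guards against.

def risky():
    raise RuntimeError("boom")


def swallowed():
    try:
        risky()
    finally:
        return "done"  # B012: the RuntimeError is silently discarded


def propagated():
    result = None
    try:
        risky()
    finally:
        result = "done"  # no return here, so the exception survives
    return result


print(swallowed())  # prints "done"; the error never surfaces
# propagated() raises RuntimeError("boom"), as expected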
Implements streaming instrumentation for OpenAI's responses.create() endpoint.
Previously, streaming responses were returned without any telemetry spans.
Continuing work of #3396, fixes #3395
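For context, a rough sketch of the user-facing flow this PR instruments, assuming the OpenAI Python SDK and this package's instrumentor are installed; the model name and prompt are placeholders, and exact setup may differ.

from openai import OpenAI
from opentelemetry.instrumentation.openai import OpenAIInstrumentor

OpenAIInstrumentor().instrument()  # patches responses.create(), among other entry points

client = OpenAI()  # expects OPENAI_API_KEY in the environment
stream = client.responses.create(
    model="gpt-4o-mini",
    input="Write a haiku about telemetry.",
    stream=True,
)

for event in stream:
    # Before this fix, consuming the stream produced no span; now the wrapped
    # stream accumulates output_text deltas and ends the span when the stream completes.
    if getattr(event, "type", "") == "response.output_text.delta":
        print(event.delta, end="")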
Important
Adds streaming support for OpenAI's responses.create() API with telemetry capture for both sync and async responses, including comprehensive tests.
- Adds a ResponseStream class in responses_wrappers.py to capture telemetry for streaming responses.
- Updates responses_get_or_create_wrapper() and async_responses_get_or_create_wrapper().
- Processes response.output_text.delta events to accumulate output text.
- Adds streaming tests in test_responses.py.
- Changes GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS to SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS.

This description was created by for 8d4d8dc. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
New Features
Tests
resolves TLP-1087