fix(openai): add streaming support for responses.create() api #3437
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Walkthrough: Adds a ResponseStream proxy that wraps OpenAI streaming responses to start and manage telemetry spans, accumulates streaming chunks into final output, parses final response data and updates traced attributes; updates a span-attribute key in the shared module and adds multiple streaming test cassettes and tests.
Sequence Diagram(s)
sequenceDiagram
participant User
participant Wrapper as Responses Wrapper
participant OpenAI as OpenAI API
participant RespStream as ResponseStream (proxy)
participant Tracer
User->>Wrapper: responses.create(stream=True)
Wrapper->>OpenAI: POST /v1/responses (stream)
OpenAI-->>Wrapper: Stream / AsyncStream
Wrapper->>RespStream: wrap stream + attach tracer/span
Wrapper-->>User: return RespStream proxy
loop user iterates stream
User->>RespStream: next() / async for chunk
RespStream->>RespStream: accumulate chunk, update output_text
end
RespStream->>RespStream: parse final response envelope
RespStream->>Tracer: set span attributes, record usage/reasoning
RespStream->>Tracer: end span (finish)
RespStream-->>User: iteration completes / raises on error
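For readers skimming the walkthrough, here is a minimal sketch of the proxy pattern it describes, assuming a wrapt.ObjectProxy-based wrapper around a synchronous stream; the class name, attribute names, and span attribute key are illustrative and not copied from the PR.

import threading
from wrapt import ObjectProxy

class StreamProxySketch(ObjectProxy):
    def __init__(self, wrapped_stream, span):
        super().__init__(wrapped_stream)
        self._self_span = span               # wrapt stores "_self_"-prefixed attributes on the proxy
        self._self_chunks = []               # accumulated text deltas
        self._self_lock = threading.Lock()
        self._self_done = False

    def __iter__(self):
        return self

    def __next__(self):
        try:
            chunk = self.__wrapped__.__next__()
        except StopIteration:
            self._finish()                   # stream exhausted: finalize telemetry
            raise
        except Exception as exc:
            self._self_span.record_exception(exc)
            self._finish()
            raise
        delta = getattr(chunk, "delta", None)
        if isinstance(delta, str):
            self._self_chunks.append(delta)  # collect text for the completion attribute
        return chunk

    def _finish(self):
        with self._self_lock:                # idempotent, thread-safe cleanup
            if self._self_done:
                return
            self._self_done = True
        self._self_span.set_attribute("gen_ai.completion", "".join(self._self_chunks))
        self._self_span.end()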
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Important
Looks good to me! 👍
Reviewed 4bcffb8 in 58 seconds. Click for details.
- Reviewed 58 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 1 draft comment. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. test_streaming_fix.py:1
- Draft comment:
The entire test file has been removed. This test provided comprehensive streaming telemetry validation. Ensure that equivalent tests exist in the project before removal. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_3AWi1Wq3T3sguyIB
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Caution
Changes requested ❌
Reviewed everything up to 9ba78df in 3 minutes and 0 seconds. Click for details.
- Reviewed 1262 lines of code in 5 files
- Skipped 0 files when reviewing
- Skipped posting 3 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:864
- Draft comment:
Potential duplicate accumulation: In _process_chunk, the code appends 'chunk.delta' if it exists and then separately checks if 'chunk.delta' has a 'text' attribute and appends it as well. If both conditions are met, the delta text might be added twice. Consider restructuring the conditionals to ensure each response chunk is added only once. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50%. The comment raises a concern about duplicate accumulation. Looking at line 870, it does self._output_text += chunk.delta - this would be appending the delta object itself to a string, which seems wrong. Then line 878 does self._output_text += chunk.delta.text. If a chunk has type "response.output_text.delta" AND has a delta attribute with a text attribute, then: 1. Line 870 would append chunk.delta (the object) to the string. 2. Line 878 would append chunk.delta.text (the text value). This does seem like it could cause issues. However, I need to consider: line 870 appends chunk.delta directly - if delta is a string, this would work. If delta is an object, appending it to a string would convert it to a string representation. Line 878 specifically accesses chunk.delta.text. The comment identifies a real potential issue: if chunk.type is "response.output_text.delta" and chunk.delta exists and chunk.delta.text exists, both blocks would execute. This is a valid concern about the code logic. I might be missing the actual structure of the chunk objects from the OpenAI API. Perhaps chunk.delta is always a string when chunk.type is "response.output_text.delta", and chunk.delta.text only exists in other cases. The "Fallback" comment on line 875 suggests these are meant to handle different scenarios, not overlap. While the fallback comment suggests different scenarios, the code structure doesn't prevent both from executing. Without seeing the actual OpenAI API response structure, I cannot definitively say whether this is a bug or intentional. The comment raises a valid concern about the code structure that could lead to duplicate accumulation if both conditions are met. The comment identifies a legitimate code quality concern about potential duplicate accumulation. However, without access to the OpenAI API documentation or response structure, I cannot be certain this is actually a bug versus intentional handling of different response formats. The comment is speculative ("might be added twice") rather than definitive.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:810
- Draft comment:
Using del for cleanup may be unreliable due to the unpredictable timing of garbage collection. Consider favoring explicit context management (e.g. using the with-statement) to ensure the span cleanup is executed promptly. - Reason this comment was not posted:
Comment looked like it was already resolved.
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:136
- Draft comment:
The global 'responses' dictionary is used to store traced data without explicit synchronization. In high concurrency scenarios, this might lead to race conditions. Consider protecting this shared resource with a lock or using a thread‐safe construct. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
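If that shared mapping ever does need guarding, the lock-based approach the draft comment hints at would look roughly like this (module and variable names are hypothetical, not the instrumentation's actual code):

import threading

_responses_lock = threading.Lock()
_responses = {}  # response_id -> traced data (illustrative only)

def record_traced_data(response_id, traced_data):
    # Serialize writers so concurrent requests cannot interleave updates.
    with _responses_lock:
        _responses[response_id] = traced_data

def pop_traced_data(response_id):
    # Atomically remove and return the entry, if any.
    with _responses_lock:
        return _responses.pop(response_id, None)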
Workflow ID: wflow_AfHt9ivMYqW0XpW3
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
# Handle response events based on type
if hasattr(chunk, "type"):
    # Handle text delta events
    if chunk.type == "response.output_text.delta":
Consider accumulating streaming text into a list and then joining the list once (e.g. using ''.join(list)) rather than performing repeated string concatenation. This can improve performance when processing many small chunks.
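For illustration only (not the PR's code), the two accumulation strategies compare roughly like this:

def accumulate_with_join(deltas):
    parts = []
    for delta in deltas:
        parts.append(delta)        # O(1) amortized per chunk
    return "".join(parts)          # single pass to build the final string

def accumulate_with_concat(deltas):
    text = ""
    for delta in deltas:
        text += delta              # may re-copy the accumulated prefix each time
    return text                    # worst case quadratic in total output length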
Important
Looks good to me! 👍
Reviewed 9e7db44 in 1 minute and 41 seconds. Click for details.
- Reviewed 287 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 3 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:435
- Draft comment:
Cosmetic change: Inline comments (e.g. 'Create a span for the streaming response') were removed. Consider retaining a brief high‑level comment to clarify the purpose of this wrapper for future maintainability. - Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%. None.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:868
- Draft comment:
The _process_chunk method checks for both 'delta' and 'delta.text'. This may cause duplicate text appending if both conditions are met. Consider clarifying (or refactoring) the logic to ensure it handles differing chunk structures without unintended duplication. - Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%. None.
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:9
- Draft comment:
Test function definitions have been reformatted into multi‑line parameter lists, which improves readability. No issues identified. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None.
Workflow ID: wflow_BHQoK302a0MZynlO
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 1
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
802-871: Accumulate deltas in a list before joining. String-appending every token chunk is O(n²) on long streams. Let’s collect chunks in a list and join once during cleanup to keep streaming performant. This echoes ellipsis-dev’s earlier suggestion.
Apply this diff:
- self._complete_response_data = None
- self._output_text = ""
+ self._complete_response_data = None
+ self._output_chunks: list[str] = []
@@
- if hasattr(chunk, "delta") and chunk.delta:
-     self._output_text += chunk.delta
+ if hasattr(chunk, "delta") and chunk.delta:
+     self._output_chunks.append(chunk.delta)
@@
- if hasattr(chunk.delta, "text") and chunk.delta.text:
-     self._output_text += chunk.delta.text
+ if hasattr(chunk.delta, "text") and chunk.delta.text:
+     self._output_chunks.append(chunk.delta.text)
@@
- self._traced_data.output_text = self._output_text
+ self._traced_data.output_text = "".join(self._output_chunks)
@@
- if self._span and self._span.is_recording():
+ if self._output_chunks and not self._traced_data.output_text:
+     self._traced_data.output_text = "".join(self._output_chunks)
+ if self._span and self._span.is_recording():
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (4 hunks)
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming.yaml (1 hunks)
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async.yaml (1 hunks)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (8 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
**/cassettes/**/*.{yaml,yml,json}
📄 CodeRabbit inference engine (CLAUDE.md)
Never commit secrets or PII in VCR cassettes; scrub sensitive data
Files:
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async.yaml
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming.yaml
🧬 Code graph analysis (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (2)
_ensure_cleanup (786-821)
_process_complete_response (740-783)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (2)
_handle_exception (151-158)
dont_throw (132-160)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (2)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (2)
InMemorySpanExporter (22-61)
get_finished_spans (40-43)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (1)
is_reasoning_supported(25-30)
🪛 Ruff (0.14.3)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
905-905: Do not catch blind exception: Exception
(BLE001)
942-942: Do not catch blind exception: Exception
(BLE001)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
10-10: Unused function argument: instrument_legacy
(ARG001)
31-31: Unused function argument: instrument_legacy
(ARG001)
86-86: Unused function argument: instrument_legacy
(ARG001)
158-158: Unused function argument: instrument_legacy
(ARG001)
186-186: Unused function argument: instrument_legacy
(ARG001)
215-215: Unused function argument: instrument_legacy
(ARG001)
247-247: Unused function argument: instrument_legacy
(ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.10)
- GitHub Check: Build Packages (3.11)
...lemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
Important
Looks good to me! 👍
Reviewed 31ca1c5 in 54 seconds. Click for details.
- Reviewed 84 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 3 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:195
- Draft comment:
Good use of hasattr for checking both new and fallback attribute names for usage tokens. This ensures backward compatibility with older semconv versions. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply praises the use of hasattr without offering any constructive criticism or questions.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:788
- Draft comment:
Refactored use of self._request_kwargs in initializing TracedData enhances consistency by ensuring input, instructions, tools, and model values are correctly sourced. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:242
- Draft comment:
Enhanced streaming test assertions now verify span attributes for response model, prompt, and completion details. This improves test coverage and ensures telemetry data is captured as expected. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative, describing what the change does without providing any actionable feedback or suggestions. It doesn't ask for any specific confirmation or suggest improvements.
Workflow ID: wflow_01U9G1ZFBsCcJgkv
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed 9930056 in 1 minute and 0 seconds. Click for details.
- Reviewed 178 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_with_content.yaml:40
- Draft comment:
Consider using a YAML block literal (using |) for the multi‐line 'string' content to improve readability. - Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%. None.
2. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_with_content.yaml:61
- Draft comment:
Review the whitespace and newlines in the 'delta' event values to ensure they are intentional and correctly concatenated. - Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%. None.
Workflow ID: wflow_Amji3SfqnYzUjX5d
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed a17bdbb in 41 seconds. Click for details.
- Reviewed 44 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:194
- Draft comment:
Removed inline comments in the token usage block. Ensure context for fallback attributes remains clear. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:586
- Draft comment:
Removed redundant inline comments in the async streaming block; code clarity is maintained. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None.
Workflow ID: wflow_NgggD683fVnefbcc
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed 05d8fab in 2 minutes and 13 seconds. Click for details.
- Reviewed 320 lines of code in 4 files
- Skipped 0 files when reviewing
- Skipped posting 4 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:221
- Draft comment:
Consider using the SpanAttributes constants directly rather than wrapping them in an f-string (e.g. use SpanAttributes.LLM_REQUEST_REASONING_SUMMARY instead of f"{SpanAttributes.LLM_REQUEST_REASONING_SUMMARY}"). This avoids unnecessary string conversion. - Reason this comment was not posted:
Comment was on unchanged code.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:230
- Draft comment:
Review the defaulting of reasoning attributes: using 'traced_response.request_reasoning_summary or ()' may result in a tuple when no value is present. Since downstream tests expect a JSON–serialized string, consider consistently serializing these attributes (e.g. via json.dumps) to maintain type consistency. - Reason this comment was not posted:
Comment was on unchanged code.
3. packages/opentelemetry-instrumentation-openai/tests/conftest.py:175
- Draft comment:
The new 'instrument_legacy_with_content' fixture cleanly sets the TRACELOOP_TRACE_CONTENT environment variable. This is clear and follows best practices for test isolation. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply praises the code for following best practices, which violates the rule against making purely informative comments.
4. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:277
- Draft comment:
The new test 'test_responses_streaming_with_content' effectively verifies content tracing by asserting that both prompt and completion attributes are captured. For future enhancement, consider testing edge cases like empty streaming deltas or handling of large output segments (e.g. using list join for string concatenation) to optimize performance, though current usage appears acceptable. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. This comment has two parts: 1) suggesting additional edge case tests (empty deltas, large segments), and 2) suggesting a performance optimization (using list join instead of +=). The first part is asking for more test coverage, which is speculative and not pointing to a clear bug. The second part is a performance optimization suggestion, but this is a test file, not production code, so performance is not critical. The string concatenation pattern used here is standard in tests and perfectly acceptable. The comment doesn't identify any actual bug or issue with the current implementation - it's just suggesting "nice to have" improvements. According to the rules, comments should not be speculative ("Consider testing edge cases...") and should clearly identify a code change that's required. Could the performance optimization be important even in tests? Could there be a real issue with how empty deltas are handled that would cause the test to fail? Maybe the comment is trying to prevent a real bug. Even if performance could theoretically matter, this is a simple test with a short response ("What is 2+2?"), so the performance concern is not relevant here. The code already handles empty deltas correctly with the if item.delta: check on line 293. The comment is purely speculative about potential improvements, not identifying an actual issue. This comment should be deleted. It's making speculative suggestions about edge cases and performance optimizations rather than identifying a clear bug or required change. The code works correctly as written, and the suggestions are "nice to have" rather than necessary fixes.
Workflow ID: wflow_V6Mw3lCdmrdtELTD
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed 810f34e in 2 minutes and 6 seconds. Click for details.
- Reviewed 124 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 6 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:817
- Draft comment:
Consider returning the result of wrapped.enter if it differs from self. In some context managers, enter returns an object different from self; if that behavior is desired, adjust accordingly. - Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%. None.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:822
- Draft comment:
Good use of a try-finally block in exit to ensure the underlying exit method is always called. This helps propagate suppression decisions from the wrapped object. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative, as it praises the use of a try-finally block without suggesting any changes or improvements. It doesn't align with the rules for useful comments in a pull request review.
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:835
- Draft comment:
Nice addition of async context manager support. The async aenter correctly awaits the underlying aenter (if present). - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative, as it only praises the addition of async context manager support without providing any actionable feedback or suggestions. It does not align with the rules for useful comments.
4. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:841
- Draft comment:
Async aexit properly awaits the underlying aexit and returns its suppression status. This ensures consistent async cleanup. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states that something is done correctly without offering any guidance or questions.
5. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:312
- Draft comment:
The test 'test_responses_streaming_with_context_manager' correctly uses the context manager to consume the streaming response and verifies telemetry spans. Consider adding tests to cover exception scenarios within the context. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment is suggesting that the PR author add more tests to cover exception scenarios. This is not pointing out a bug or issue with the code that was added - it's suggesting additional test coverage. According to the rules, comments should only be kept if there is "clearly a code change required" and we should "Do NOT comment unless there is clearly a code change required." This is a suggestion for enhancement/additional coverage, not a required fix. The test as written appears to be valid and functional. The comment is essentially asking the author to do more work beyond what they've already done, which doesn't meet the bar for keeping the comment. Could this be pointing out that the context manager pattern specifically requires exception testing to ensure proper cleanup? Perhaps there's a real concern about resource leaks if exceptions occur within the context manager block, which would be a legitimate issue to address. While exception handling in context managers is important, the comment doesn't identify a specific bug or missing functionality in the code that was added. It's a general suggestion for additional test coverage. If there was a specific exception handling bug in the implementation being tested, that would be different, but this is just asking for more comprehensive testing without evidence of a problem. The existing test validates the happy path, which is appropriate for a new feature test. This comment should be deleted. It's suggesting additional test coverage rather than identifying a specific issue with the code changes. The comment doesn't point to a bug or required fix - it's asking the author to add more tests, which violates the rule that comments should only be made when there is clearly a code change required.
6. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:340
- Draft comment:
The async context manager test 'test_responses_streaming_async_with_context_manager' is well implemented and confirms that telemetry spans are emitted. Testing error paths in async scenarios might further improve coverage. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment is suggesting that the test could be improved by adding error path testing. However, this is not pointing out a bug or issue with the code that was added. It's a suggestion for additional test coverage that could be added in the future. According to the rules, comments should only be kept if they clearly identify a code change that is required. This is more of a "nice to have" suggestion rather than identifying a problem with the current implementation. The test as written appears to be valid and tests the happy path for async context manager usage. Suggesting additional test coverage is not the same as identifying a bug or required change. Could this be pointing out that the test is incomplete or insufficient? Perhaps error handling is critical for async context managers and the test is inadequate without it. However, I don't see evidence that the existing test is wrong or that error testing is mandatory for this PR. While error testing might be valuable, the comment doesn't identify any specific issue with the code that was added. It's a general suggestion for improvement rather than identifying a required change. The test appears to be complete for its stated purpose of testing the happy path of async streaming with context managers. This comment should be deleted. It's a suggestion for additional test coverage rather than identifying a required code change. The rules explicitly state not to make comments unless there is clearly a code change required, and this is more of an optional enhancement suggestion.
Workflow ID: wflow_9nYABQnTpq4epFUs
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed 3a594b7 in 2 minutes and 18 seconds. Click for details.
- Reviewed 416 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 3 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async_with_context_manager.yaml:21
- Draft comment:
The x-stainless-async header here is set to '- async:asyncio', which is non-standard compared to the typical boolean string seen in the sync cassette. Verify that the instrumentation correctly interprets this array value for async behavior. - Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%. None.
2. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async_with_context_manager.yaml:160
- Draft comment:
The cassette includes several dynamic HTTP headers (e.g., CF-RAY and Set-Cookie). Ensure these values are normalized or sanitized in tests to avoid potential flakiness. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment is asking the author to "ensure" something is done, which falls into the category of verification requests that the rules say to avoid. The comment doesn't point to a specific bug or issue - it's speculative about "potential flakiness". Cassette files are typically recorded fixtures that capture real HTTP responses, and the dynamic headers are part of the actual response. The comment doesn't provide evidence that these headers are actually causing problems, nor does it suggest a specific code change. It's more of a general best practice suggestion rather than identifying a concrete issue with the change. According to the rules, speculative comments like "potential flakiness" should be removed unless there's definite evidence of an issue. Could this actually be a legitimate concern? If these cassette files are used for exact matching in tests, dynamic values could indeed cause issues. However, I don't see evidence in the diff that this is causing a problem, and cassette libraries typically handle this automatically or the test framework would have configuration for it elsewhere. While the concern about dynamic headers could be valid in theory, the comment is speculative ("potential flakiness") and asks the author to "ensure" something rather than pointing to a concrete issue. The rules explicitly state not to make speculative comments or ask authors to verify/ensure things. If this were a real problem, it would likely be caught during test execution. This comment should be deleted. It's speculative about "potential flakiness" without evidence of an actual problem, and it asks the author to "ensure" something is done, which violates the rule against verification requests. Cassette files are typically generated fixtures, and if dynamic headers were an issue, it would be caught by the test framework.
3. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_with_context_manager.yaml:137
- Draft comment:
The aggregated output text (in response.output_text.done) contains newline characters and multiline formatting. Ensure that the streaming telemetry correctly concatenates these delta chunks as expected. - Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%. None.
Workflow ID: wflow_iIKBL7njLTDBPP4b
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 0
🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (2)
228-246: Simplify unnecessary f-string formatting. The f-string formatting on lines 232, 238, and 244 is redundant since SpanAttributes.LLM_REQUEST_REASONING_SUMMARY and similar attributes are already strings. You can pass them directly to _set_span_attribute.
Apply this diff:
- f"{SpanAttributes.LLM_REQUEST_REASONING_SUMMARY}",
+ SpanAttributes.LLM_REQUEST_REASONING_SUMMARY,
- f"{SpanAttributes.LLM_REQUEST_REASONING_EFFORT}",
+ SpanAttributes.LLM_REQUEST_REASONING_EFFORT,
- f"{SpanAttributes.LLM_RESPONSE_REASONING_EFFORT}",
+ SpanAttributes.LLM_RESPONSE_REASONING_EFFORT,
890-905: Consider list accumulation for better performance. Repeated string concatenation on lines 895 and 901 creates new string objects for each chunk. For many small chunks, accumulating into a list and joining once would be more efficient.
Apply this diff:
+ # Initialize list for text accumulation in __init__
+ self._output_text_chunks = []

  def _process_chunk(self, chunk):
      """Process a streaming chunk"""
      if hasattr(chunk, "type"):
          if chunk.type == "response.output_text.delta":
              if hasattr(chunk, "delta") and chunk.delta:
-                 self._output_text += chunk.delta
+                 self._output_text_chunks.append(chunk.delta)
          elif chunk.type == "response.completed" and hasattr(chunk, "response"):
              self._complete_response_data = chunk.response
      if hasattr(chunk, "delta"):
          if hasattr(chunk.delta, "text") and chunk.delta.text:
-             self._output_text += chunk.delta.text
+             self._output_text_chunks.append(chunk.delta.text)
      if hasattr(chunk, "response") and chunk.response:
          self._complete_response_data = chunk.response

Then in _process_complete_response (line 919), replace:
- self._traced_data.output_text = self._output_text
+ self._traced_data.output_text = ''.join(self._output_text_chunks)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (6 hunks)
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async_with_context_manager.yaml (1 hunks)
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_with_context_manager.yaml (1 hunks)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (7 hunks)
✅ Files skipped from review due to trivial changes (2)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async_with_context_manager.yaml
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_with_context_manager.yaml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
🧬 Code graph analysis (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (3)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes (64-245)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (2)
_ensure_cleanup (786-821)
_process_complete_response (740-783)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (2)
_handle_exception (151-158)
dont_throw (132-160)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (3)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (1)
is_reasoning_supported (25-30)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (3)
export (45-51)
InMemorySpanExporter (22-61)
get_finished_spans (40-43)
packages/opentelemetry-instrumentation-openai/tests/conftest.py (4)
instrument_legacy (136-151)
openai_client (42-43)
async_openai_client (82-83)
instrument_legacy_with_content (175-181)
🪛 Ruff (0.14.3)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
936-936: Do not catch blind exception: Exception
(BLE001)
973-973: Do not catch blind exception: Exception
(BLE001)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
10-10: Unused function argument: instrument_legacy
(ARG001)
31-31: Unused function argument: instrument_legacy
(ARG001)
86-86: Unused function argument: instrument_legacy
(ARG001)
158-158: Unused function argument: instrument_legacy
(ARG001)
186-186: Unused function argument: instrument_legacy
(ARG001)
215-215: Unused function argument: instrument_legacy
(ARG001)
249-249: Unused function argument: instrument_legacy
(ARG001)
279-279: Unused function argument: instrument_legacy_with_content
(ARG001)
314-314: Unused function argument: instrument_legacy
(ARG001)
342-342: Unused function argument: instrument_legacy
(ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (11)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (6)
4-4: LGTM! The threading and wrapt.ObjectProxy imports are appropriate for implementing thread-safe cleanup and transparent proxying of streaming responses.
Also applies to: 8-8
195-206: LGTM! The conditional attribute setting provides proper backward compatibility by checking for the newer GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS attribute first before falling back to the legacy GEN_AI_USAGE_INPUT_TOKENS_CACHED.
444-457: LGTM! The wrappers correctly start a span and wrap streaming responses in ResponseStream for both synchronous and asynchronous cases, delegating span lifecycle management to the proxy.
Also applies to: 586-599
764-809: LGTM! The ResponseStream class properly extends ObjectProxy to transparently proxy streaming responses while capturing telemetry. The initialization correctly sets up state including thread-safe cleanup tracking.
811-888: LGTM! The context manager and iterator implementations properly delegate to the wrapped stream while ensuring telemetry cleanup runs. Both synchronous and asynchronous variants correctly handle normal completion, early exit, and error scenarios. The previous concern about context-manager semantics has been addressed.
906-974: LGTM! The cleanup methods implement proper thread-safe, idempotent cleanup with appropriate guards. The broad exception handling on lines 936 and 973 is intentional and correct for cleanup paths, ensuring spans are always finalized even when errors occur during telemetry processing.
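As a rough sketch only, the thread-safe, idempotent cleanup guard described above could look like this; the names (_ensure_cleanup, _cleanup_completed) follow the review discussion, but the body is illustrative, not the PR's implementation.

import logging
import threading

logger = logging.getLogger(__name__)

class CleanupGuardSketch:
    def __init__(self, span):
        self._span = span
        self._cleanup_lock = threading.Lock()
        self._cleanup_completed = False

    def _ensure_cleanup(self):
        with self._cleanup_lock:
            if self._cleanup_completed:   # already finalized by another code path
                return
            self._cleanup_completed = True
        try:
            if self._span and self._span.is_recording():
                self._span.end()          # finalize the span exactly once
        except Exception:                 # never let telemetry cleanup break user code
            logger.debug("failed to finalize streaming span", exc_info=True)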
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (5)
3-3: LGTM! The AsyncOpenAI import is necessary for the new async streaming tests added in this PR.
152-211: LGTM! The reasoning tests properly use conditional skipping and validate that complex reasoning data is serialized as JSON strings, addressing issue #3350 about invalid type warnings.
213-244: LGTM! The sync streaming test correctly validates that streaming responses emit telemetry spans with proper attributes and that content is successfully streamed and accumulated.
246-275: LGTM! The async streaming tests correctly use async/await patterns and validate that both regular async streaming and async context-manager usage properly emit telemetry spans.
Also applies to: 339-366
277-337: LGTM! The content tracing and context manager tests validate important scenarios: that prompt/completion content is properly captured during streaming when content tracing is enabled, and that context manager usage (both sync and async) correctly finalizes spans.
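A hypothetical shape for one of these streaming tests, assuming the fixtures mentioned in the review (instrument_legacy, openai_client) plus an in-memory exporter fixture; the span name, model, and attribute key below are illustrative assumptions, not values taken from the repository.

import pytest

@pytest.mark.vcr
def test_responses_streaming_emits_span(instrument_legacy, openai_client, span_exporter):
    stream = openai_client.responses.create(
        model="gpt-4o-mini",      # model name is illustrative
        input="What is 2+2?",
        stream=True,
    )
    chunks = list(stream)         # fully consuming the stream should finalize the span
    assert chunks

    spans = span_exporter.get_finished_spans()
    response_spans = [s for s in spans if "response" in s.name]  # span name is an assumption
    assert len(response_spans) == 1
    assert response_spans[0].attributes.get("gen_ai.response.model")  # attribute key illustrative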
GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
usage.input_tokens_details.cached_tokens,
)
if hasattr(GenAIAttributes, 'GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS'):
why do you need it?
)

# Only set reasoning tokens if the attribute exists
if hasattr(SpanAttributes, 'LLM_USAGE_REASONING_TOKENS'):
same
else getattr(tokens_details, "reasoning_tokens", None)
)

# Only set reasoning tokens if the attribute exists
redundant comment
| f"{SpanAttributes.LLM_REQUEST_REASONING_SUMMARY}", | ||
| traced_response.request_reasoning_summary or (), | ||
| ) | ||
| if hasattr(SpanAttributes, 'LLM_REQUEST_REASONING_EFFORT'): |
same
instrumentor.uninstrument()


@pytest.fixture(scope="function")
I don't think you need this
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
Important
Looks good to me! 👍
Reviewed 974ae91 in 1 minute and 15 seconds. Click for details.
- Reviewed 227 lines of code in 3 files
- Skipped 0 files when reviewing
- Skipped posting 4 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py:232
- Draft comment:
The attribute change from GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS to SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS looks intentional. Please double‐check that this naming is aligned with the rest of the telemetry attributes. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:392
- Draft comment:
In the sync wrapper, streaming responses are checked with isinstance(response, Stream). Confirm that this check fully covers the intended types. (AsyncStream is handled only in the async wrapper.) - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:785
- Draft comment:
In ResponseStream._process_chunk, the code checks for a delta in two separate blocks (one under a type check and another unconditional block). This may lead to duplication if chunk.delta is both truthy and has a 'text' attribute. Consider consolidating the logic to avoid appending the same text twice. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:800
- Draft comment:
The cleanup logic in ResponseStream (methods _ensure_cleanup and _handle_exception) uses a threading.Lock and _cleanup_completed flag to prevent double cleanup. It would be helpful to add inline comments clarifying the idempotency of cleanup operations for future maintainability. - Reason this comment was not posted:
Confidence changes required: 30% <= threshold 50%. None.
Workflow ID: wflow_lmRWaqU2unaicQaz
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed df7ca6e in 2 minutes and 18 seconds. Click for details.
- Reviewed 48 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py:289
- Draft comment:
Adding reasoning request attributes here. Consider verifying that using an empty tuple as a default (via reasoning_effort or ()) is the intended behavior and type—should it be a string/None instead? - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. This comment is explicitly asking the author to "verify that... is the intended behavior" which directly violates the rule: "Do NOT ask the PR author to confirm their intention, to explain, to double-check things, to ensure the behavior is intended". The comment doesn't state that there IS a problem - it's speculative and asks for verification. Without seeing the definition of _set_span_attribute or SpanAttributes.LLM_REQUEST_REASONING_EFFORT, I cannot determine if using an empty tuple is actually wrong. The comment doesn't provide strong evidence that this is incorrect - it's just questioning the choice. Maybe the empty tuple is actually incorrect and would cause a runtime error or type mismatch in _set_span_attribute. Perhaps the reviewer has knowledge about what type this attribute should be, and an empty tuple would be problematic. Even if the empty tuple might be wrong, the comment doesn't state definitively that it IS wrong - it asks the author to verify. Without seeing clear evidence that this will cause an error or is definitively incorrect, and given that the comment is phrased as a verification request, it should be deleted per the rules. This comment should be deleted because it asks the author to "verify" their intention rather than stating a clear code issue. It's speculative ("should it be...?") rather than definitive, and there's no strong evidence provided that the empty tuple is incorrect.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py:327
- Draft comment:
In the _handle_response function, reasoning usage attributes are extracted from the response. Ensure that defaulting to 0 using (reasoning_tokens or 0) aligns with expected numeric types and that the extraction logic handles cases where 'completion_tokens_details' might not include a 'reasoning_tokens' field. - Reason this comment was not posted:
Comment looked like it was already resolved.
Workflow ID: wflow_uFQqg8kl8bsTVEtP
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Caution
Changes requested ❌
Reviewed b576554 in 2 minutes and 46 seconds. Click for details.
- Reviewed 150 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 4 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:127
- Draft comment:
Added new reasoning attributes to TracedData. Ensure these fields are documented and that their expected type (string) is maintained downstream. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment is asking the PR author to "ensure" documentation exists and type consistency is maintained. This is exactly the kind of comment the rules say to avoid - it starts with "Ensure that..." and doesn't point to a specific code issue. The comment doesn't identify any actual problem with the code - the types are already declared as Optional[str] in the pydantic model, and the code appears to handle them consistently. The comment is more of a reminder or suggestion to verify things rather than pointing out a concrete issue. According to the rules, comments that ask the author to "ensure" things or verify/confirm are not useful and should be deleted. Could there be a legitimate concern about documentation or type safety that I'm missing? Perhaps the comment is trying to highlight that these new fields need docstrings or that there's a type inconsistency somewhere in how they're used downstream? Even if documentation would be nice to have, the comment doesn't point to a specific missing docstring or documentation requirement. The types are clearly declared in the pydantic model, and I can see from the code that they're used consistently as strings throughout. The comment is asking the author to "ensure" things rather than identifying a concrete problem, which violates the rules. This comment should be deleted. It asks the PR author to "ensure" documentation and type consistency without identifying any specific code issue. This is exactly the type of non-actionable, verification-requesting comment that the rules say to avoid.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:476
- Draft comment:
Both request_reasoning_effort and response_reasoning_effort are sourced from the same 'effort' key in kwargs. Verify that this is intended, as they represent distinct attributes. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%. The comment is asking the PR author to "verify" something, which violates the rule: "Do NOT ask the PR author to confirm their intention, to explain, to double-check things, to ensure the behavior is intended, to make sure their change is tested, or similar." The comment doesn't definitively state there's a bug - it's speculative. Looking at the code context, this is in an exception handler where kwargs contains the request parameters. It's plausible that in this error path, the response reasoning effort hasn't been set yet, so it falls back to the request's effort value. Without seeing the OpenAI API documentation or understanding the full context of when this code path is hit, I cannot definitively say this is wrong. The comment is asking for verification rather than stating a clear issue. However, it does seem suspicious that both request and response attributes would use the same source value. This could be a legitimate bug where response_reasoning_effort should come from a different source (like existing_data or the actual response object). The pattern is repeated consistently across multiple locations, which could indicate either intentional design or a systematic error. While it does seem suspicious, the comment explicitly asks to "verify that this is intended" rather than stating "this is incorrect." This is a verification request, not a definitive bug report. The rules clearly state not to ask for confirmation. Additionally, this is in an exception handler path (line 450 shows except Exception as e:), so the behavior might be intentional - when an error occurs, there may not be a response yet, so using the request's effort value as a fallback could be correct. Without stronger evidence that this is definitely wrong, I should delete this comment. This comment should be deleted because it asks the PR author to "verify" the intention rather than identifying a definite issue. It's a speculative comment that violates the rule against asking for confirmation.
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:614
- Draft comment:
In the async wrapper, reasoning attributes are handled similarly. Ensure consistency with the sync version and consider similar fallback logic if needed. - Reason this comment was not posted:
Comment looked like it was already resolved.
4. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:790
- Draft comment:
The ResponseStream init assigns reasoning attributes from _request_kwargs. Confirm that these values (or their defaults) align with expectations across all code paths. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to confirm that the values align with expectations across all code paths, which is a form of asking for confirmation or verification. This violates the rule against asking the PR author to confirm their intention or ensure behavior is intended.
Workflow ID: wflow_PmONKpNByCMrnk5F
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
    SpanAttributes.LLM_USAGE_REASONING_TOKENS,
    reasoning_tokens or 0,
    f"{SpanAttributes.LLM_REQUEST_REASONING_SUMMARY}",
    traced_response.request_reasoning_summary or (),
Fallback values for reasoning attributes are set using or (). Consider using an empty string ('') to match the Optional[str] type.
Important
Looks good to me! 👍
Reviewed e5c8eb8 in 1 minute and 6 seconds. Click for details.
- Reviewed 56 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:211
- Draft comment:
Removed conditional checks for telemetry reasoning attributes. Please verify that all SpanAttributes (e.g. LLM_USAGE_REASONING_TOKENS) are guaranteed to exist in supported versions so that no AttributeError is raised. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to verify that certain attributes are guaranteed to exist, which falls under asking the author to ensure behavior is intended or to double-check things. This violates the rules.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:221
- Draft comment:
The use of f-string formatting for attribute keys (e.g. f"{SpanAttributes.LLM_REQUEST_REASONING_SUMMARY}") may be unnecessary if the constants are already strings. Consider using the constant directly to simplify the code. - Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%. None
Workflow ID: wflow_pu8eNVvZdbFErkSG
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed c1cfeb9 in 1 minute and 0 seconds. Click for details.
- Reviewed 21 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 1 draft comment. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/conftest.py:155
- Draft comment:
Removal of the instrument_legacy_with_content fixture looks intentional to consolidate legacy configurations into instrument_with_content. Ensure no tests depend on it and that legacy content tracing remains covered by existing fixtures. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None
Workflow ID: wflow_6jWYRKeuwh7Ki3Q4
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed 1b19abd in 47 seconds. Click for details.
- Reviewed 37 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 3 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:210
- Draft comment:
Minor formatting improvement: splitting the isinstance assert into multiple lines enhances readability. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None
2. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:282
- Draft comment:
Removal of the unused 'instrument_legacy_with_content' fixture improves parameter consistency. Make sure this change doesn't affect tests expecting content tracing instrumentation. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:344
- Draft comment:
Reformatting the fixture parameters for async context-manager test enhances readability. The functional behavior remains intact. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None
Workflow ID: wflow_01oRhqY1wRnciOCa
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (4)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
- SpanAttributes (64-245)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)
- _set_span_attribute (30-37)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (2)
- _ensure_cleanup (786-821)
- _process_complete_response (740-783)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (2)
- _handle_exception (151-158)
- dont_throw (132-160)
🪛 Ruff (0.14.4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
928-928: Do not catch blind exception: Exception
(BLE001)
965-965: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.12)
...lemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
Important
Looks good to me! 👍
Reviewed f480d19 in 57 seconds. Click for details.
- Reviewed 41 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 3 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:280
- Draft comment:
Adding the 'instrument_legacy' fixture to the function signature (line 10) ensures that the instrumentation is correctly set up for streaming tests. Verify that this fixture is properly configured in the test environment. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to verify that the fixture is properly configured, which violates the rule against asking for confirmation or verification. It is not making a specific code suggestion or pointing out a specific issue that needs to be addressed.
2. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:337
- Draft comment:
The synchronous context manager test now includes assertions for both prompt and completion attributes (content, role). This thoroughly verifies that telemetry captures streaming details. Ensure that these attributes are consistently set by the instrumentation. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the author to ensure consistency in setting attributes, which is a form of asking for confirmation or verification. This violates the rule against asking the author to ensure or verify things. The comment does not provide a specific suggestion or point out a specific issue, making it not useful according to the guidelines.
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:369
- Draft comment:
The asynchronous context manager test now also asserts that prompt and completion telemetry attributes match the streaming output. This change covers the async streaming instrumentation and aligns its behavior with the sync variant. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_fIN56P6L5dkjiVkS
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed 6bf70a1 in 51 seconds. Click for details.
- Reviewed 27 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:253
- Draft comment:
Good use of a variable (input_text) to avoid duplicating the prompt string. This improves maintainability. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None
2. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:276
- Draft comment:
New assertions for telemetry attributes (prompt and completion data) ensure that streaming spans capture expected values. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None
Workflow ID: wflow_YPIXw1QLtu541FIS
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed 272e106 in 1 minute and 7 seconds. Click for details.
- Reviewed 15 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 1 draft comment. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:247
- Draft comment:
Good addition! These assertions verify that streaming responses capture both prompt and completion telemetry correctly. Consider adding custom assertion messages for clarity if a test fails. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None
Workflow ID: wflow_EFD5bHnues0kkUBN
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed 8d4d8dc in 1 minute and 16 seconds. Click for details.
- Reviewed 25 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:838
- Draft comment:
The new close() method correctly ensures cleanup by calling _ensure_cleanup() before delegating to the wrapped object's close. Consider adding a docstring to clarify its purpose and note that any exceptions from _ensure_cleanup() are intentionally suppressed. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:847
- Draft comment:
The async aclose() method properly awaits the underlying object's aclose() after calling _ensure_cleanup(). A brief docstring could help future maintainers understand its role in ensuring async cleanup of telemetry spans. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%. None
Workflow ID: wflow_8x40e1am2wb3MOa0
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
219-233: Use empty string fallback for type consistency. Lines 222, 227, and 232 use or () as fallback values for reasoning attributes. Since these fields are typed as Optional[str] in TracedData, using an empty string '' instead of an empty tuple would maintain type consistency. Apply this diff:

 _set_span_attribute(
     span,
     f"{SpanAttributes.LLM_REQUEST_REASONING_SUMMARY}",
-    traced_response.request_reasoning_summary or (),
+    traced_response.request_reasoning_summary or '',
 )
 _set_span_attribute(
     span,
     f"{SpanAttributes.LLM_REQUEST_REASONING_EFFORT}",
-    traced_response.request_reasoning_effort or (),
+    traced_response.request_reasoning_effort or '',
 )
 _set_span_attribute(
     span,
     f"{SpanAttributes.LLM_RESPONSE_REASONING_EFFORT}",
-    traced_response.response_reasoning_effort or (),
+    traced_response.response_reasoning_effort or '',
 )
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
891-906: Consider using list accumulation for better performance. Lines 896 and 902 perform repeated string concatenation (+=) in a loop when processing streaming chunks. For streams with many small chunks, this creates intermediate string objects and can impact performance. Consider accumulating deltas in a list and joining once at the end. Apply this diff:

 def __init__(
     self,
     span,
     response,
     start_time=None,
     request_kwargs=None,
     tracer=None,
     traced_data=None,
 ):
     # ... existing init code ...
     self._complete_response_data = None
-    self._output_text = ""
+    self._output_text_parts = []
     self._cleanup_completed = False
     self._cleanup_lock = threading.Lock()

 def _process_chunk(self, chunk):
     """Process a streaming chunk"""
     if hasattr(chunk, "type"):
         if chunk.type == "response.output_text.delta":
             if hasattr(chunk, "delta") and chunk.delta:
-                self._output_text += chunk.delta
+                self._output_text_parts.append(chunk.delta)
         elif chunk.type == "response.completed" and hasattr(chunk, "response"):
             self._complete_response_data = chunk.response
     if hasattr(chunk, "delta"):
         if hasattr(chunk.delta, "text") and chunk.delta.text:
-            self._output_text += chunk.delta.text
+            self._output_text_parts.append(chunk.delta.text)
     if hasattr(chunk, "response") and chunk.response:
         self._complete_response_data = chunk.response

 @dont_throw
 def _process_complete_response(self):
     """Process the complete response and emit span"""
     with self._cleanup_lock:
         if self._cleanup_completed:
             return
         try:
             if self._complete_response_data:
                 parsed_response = parse_response(self._complete_response_data)
                 self._traced_data.response_id = parsed_response.id
                 self._traced_data.response_model = parsed_response.model
-                self._traced_data.output_text = self._output_text
+                self._traced_data.output_text = ''.join(self._output_text_parts)
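For illustration, a minimal standalone version of the accumulate-then-join pattern this diff suggests; the function, class, and variable names below are illustrative, not the actual instrumentation code, and only the delta attribute shape mirrors the handling above.

# Sketch: collect streamed text deltas in a list and join once at the end,
# instead of growing a string with += on every chunk.
def collect_output_text(chunks):
    parts = []
    for chunk in chunks:
        delta = getattr(chunk, "delta", None)
        if delta:
            parts.append(delta)
    return "".join(parts)


class FakeChunk:
    def __init__(self, delta):
        self.delta = delta


print(collect_output_text([FakeChunk("Hel"), FakeChunk("lo")]))  # prints "Hello"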
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (3)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
- SpanAttributes (64-245)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (2)
- _ensure_cleanup (786-821)
- _process_complete_response (740-783)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (2)
- _handle_exception (151-158)
- dont_throw (132-160)
🪛 Ruff (0.14.4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
846-846: return inside finally blocks cause exceptions to be silenced
(B012)
853-853: return inside finally blocks cause exceptions to be silenced
(B012)
937-937: Do not catch blind exception: Exception
(BLE001)
974-974: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Build Packages (3.11)
🔇 Additional comments (8)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (8)
4-4: LGTM: Essential imports for streaming support. The threading and wrapt.ObjectProxy imports are necessary for the ResponseStream implementation: threading provides locks for thread-safe cleanup, and ObjectProxy enables transparent stream wrapping. Also applies to: 8-8
197-197: LGTM: Correct attribute naming convention. The change from GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS to SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS aligns with the backward-compatible naming convention in the semantic conventions module.
431-444: LGTM: Streaming telemetry correctly implemented. The sync streaming path now starts a span before returning and wraps the Stream in ResponseStream, ensuring telemetry is captured throughout the streaming lifecycle. This correctly addresses the missing spans issue for streaming responses.
573-586: LGTM: Async streaming telemetry correctly implemented. The async streaming path mirrors the sync implementation, correctly handling both Stream and AsyncStream types and ensuring spans are started before returning the wrapped stream.
751-801: LGTM: Robust initialization and lifecycle management. The ResponseStream class initialization properly wraps the underlying stream, captures request context into TracedData, and establishes thread-safe cleanup tracking. The __del__ method provides a safety net for cleanup if the stream is garbage-collected without being fully consumed.
855-889: LGTM: Iterator methods correctly handle streaming lifecycle. The sync and async iterator implementations properly intercept iteration, process chunks, handle completion via StopIteration/StopAsyncIteration, and handle errors through the exception handler. The symmetric design ensures consistent behavior across sync and async paths.
908-942: LGTM: Robust completion processing with proper telemetry. The _process_complete_response method correctly handles thread-safe, idempotent completion processing. It parses the final response, updates TracedData with accumulated streaming content, stores it for potential reuse, and properly finalizes the span with all attributes. The broad exception handling is appropriate given that the @dont_throw decorator ensures telemetry failures don't break user code.
945-975: LGTM: Comprehensive exception handling and cleanup fallback. The _handle_exception and _ensure_cleanup methods provide robust error handling and cleanup guarantees. Both are thread-safe, idempotent, and decorated with @dont_throw to ensure telemetry failures never disrupt the user's streaming workflow. The _ensure_cleanup method serves as a fallback for edge cases where normal completion doesn't occur (e.g., early close, garbage collection).
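To make the lifecycle these comments describe concrete, here is a minimal sketch of the wrap-and-intercept pattern, assuming wrapt and an already-started OpenTelemetry span. The attribute key, helper names, and chunk shape are illustrative, not the exact implementation in this PR.

import threading

from wrapt import ObjectProxy


class StreamProxySketch(ObjectProxy):
    """Illustrative proxy: accumulates deltas and ends the span exactly once."""

    def __init__(self, wrapped, span):
        super().__init__(wrapped)
        self._self_span = span
        self._self_parts = []
        self._self_done = False
        self._self_lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        try:
            chunk = self.__wrapped__.__next__()
        except StopIteration:
            self._self_finish()
            raise
        except Exception as exc:
            self._self_finish(error=exc)
            raise
        delta = getattr(chunk, "delta", None)
        if delta:
            self._self_parts.append(delta)
        return chunk

    def _self_finish(self, error=None):
        # Thread-safe and idempotent: only the first caller finalizes the span.
        with self._self_lock:
            if self._self_done:
                return
            self._self_done = True
        if error is not None:
            self._self_span.record_exception(error)
        self._self_span.set_attribute("gen_ai.completion", "".join(self._self_parts))
        self._self_span.end()

In wrapt, attribute names prefixed with _self_ are stored on the proxy itself rather than forwarded to the wrapped stream, which is what keeps a proxy like this transparent for every other attribute access; the real ResponseStream may organize its state differently.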
def close(self):
    try:
        self._ensure_cleanup()
    finally:
        if hasattr(self.__wrapped__, "close"):
            return self.__wrapped__.close()

async def aclose(self):
    try:
        self._ensure_cleanup()
    finally:
        if hasattr(self.__wrapped__, "aclose"):
            return await self.__wrapped__.aclose()
Fix return statements in finally blocks.
Lines 846 and 853 use return statements inside finally blocks, which silence any exceptions raised by _ensure_cleanup(). While _ensure_cleanup is decorated with @dont_throw, this pattern is still problematic because it masks the control flow.
Apply this diff:
 def close(self):
+    result = None
     try:
         self._ensure_cleanup()
     finally:
         if hasattr(self.__wrapped__, "close"):
-            return self.__wrapped__.close()
+            result = self.__wrapped__.close()
+    return result

 async def aclose(self):
+    result = None
     try:
         self._ensure_cleanup()
     finally:
         if hasattr(self.__wrapped__, "aclose"):
-            return await self.__wrapped__.aclose()
+            result = await self.__wrapped__.aclose()
+    return result

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def close(self):
    result = None
    try:
        self._ensure_cleanup()
    finally:
        if hasattr(self.__wrapped__, "close"):
            result = self.__wrapped__.close()
    return result

async def aclose(self):
    result = None
    try:
        self._ensure_cleanup()
    finally:
        if hasattr(self.__wrapped__, "aclose"):
            result = await self.__wrapped__.aclose()
    return result
🧰 Tools
🪛 Ruff (0.14.4)
846-846: return inside finally blocks cause exceptions to be silenced
(B012)
853-853: return inside finally blocks cause exceptions to be silenced
(B012)
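For readers unfamiliar with B012, here is a small self-contained demonstration of why a return inside finally is risky; the functions are hypothetical and only illustrate the Python semantics the rule guards against.

def risky():
    raise RuntimeError("boom")


def swallowed():
    try:
        risky()
    finally:
        return "done"  # B012: the RuntimeError is silently discarded


def propagated():
    result = None
    try:
        risky()
    finally:
        result = "done"  # no return here, so the exception survives
    return result


print(swallowed())  # prints "done"; the error never surfaces
# propagated() raises RuntimeError("boom"), as expected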
Implements streaming instrumentation for OpenAI's responses.create() endpoint.
Previously, streaming responses were returned without any telemetry spans.
Continuing work of #3396, fixes #3395
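For context, a rough sketch of the user-facing flow this PR instruments, assuming the OpenAI Python SDK and this package's instrumentor are installed; the model name and prompt are placeholders, and exact setup may differ.

from openai import OpenAI
from opentelemetry.instrumentation.openai import OpenAIInstrumentor

OpenAIInstrumentor().instrument()  # patches responses.create(), among other entry points

client = OpenAI()  # expects OPENAI_API_KEY in the environment
stream = client.responses.create(
    model="gpt-4o-mini",
    input="Write a haiku about telemetry.",
    stream=True,
)

for event in stream:
    # Before this fix, consuming the stream produced no span; now the wrapped
    # stream accumulates output_text deltas and ends the span when the stream completes.
    if getattr(event, "type", "") == "response.output_text.delta":
        print(event.delta, end="")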
Important
Adds streaming support for OpenAI's responses.create() API with telemetry capture for both sync and async responses, including comprehensive tests.
- Adds a ResponseStream class in responses_wrappers.py to capture telemetry for streaming responses.
- Updates responses_get_or_create_wrapper() and async_responses_get_or_create_wrapper().
- Processes response.output_text.delta events to accumulate output text.
- Adds streaming tests in test_responses.py.
- Changes GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS to SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS.

This description was created by for 8d4d8dc. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
New Features
Tests
resolves TLP-1087