Skip to content

Conversation

@edgarrmondragon
Copy link
Collaborator

@edgarrmondragon edgarrmondragon commented Aug 20, 2025

Summary

Resolves #1606

This PR makes the pagination object available as an instance attribute (self.paginator) across all method scopes within the REST stream class, not just within request_records. This eliminates the need to duplicate pagination configuration values across different methods.

Key Changes

  • Added pagination instance attribute - _current_paginator with proper lifecycle management
  • Implemented context manager pattern - Clean creation and cleanup of paginator per request cycle
  • Enhanced method documentation - Updated get_url_params with usage examples
  • Comprehensive test coverage - 4 new test cases covering all aspects of the feature

Benefits

  • Reduces code duplication - Access pagination config from any method during request processing
  • Maintains backward compatibility - All existing implementations continue to work unchanged
  • Fresh paginator per context - Each request_records call gets a new paginator instance
  • Improved developer experience - Clear error messages when accessing paginator incorrectly

Usage Example

Before (with duplication):

class MyStream(RESTStream):
    PAGE_SIZE = 25

    def get_new_paginator(self):
        return BaseOffsetPaginator(start_value=0, page_size=self.PAGE_SIZE)

    def get_url_params(self, context, next_page_token):
        return {"limit": self.PAGE_SIZE}  # Duplicated constant

After (no duplication):

class MyStream(RESTStream):
    def get_new_paginator(self):
        return BaseOffsetPaginator(start_value=0, page_size=25)

    def get_url_params(self, context, next_page_token):
        # Access paginator configuration directly
        try:
            limit = self.paginator.page_size  # If accessible
        except AttributeError:
            limit = 25  # fallback
        return {"limit": limit}

Testing

  • All existing tests continue to pass
  • New comprehensive test suite validates:
    • Paginator accessibility across multiple methods
    • Fresh paginator creation per request
    • Proper lifecycle cleanup
    • Backward compatibility
    • Error handling when accessed incorrectly

🤖 Generated with Claude Code

Summary by Sourcery

Make the pagination object available across all REST stream methods by introducing a context-managed paginator property while preserving existing behaviors

New Features:

  • Expose self.paginator property during request_records lifecycle
  • Introduce _paginator_context context manager to handle paginator creation and cleanup

Enhancements:

  • Refactor request_records to use the paginator context manager
  • Update get_url_params documentation to reference the new paginator property

Documentation:

  • Enhance RESTStream docstrings to describe self.paginator availability and usage

Tests:

  • Add tests for paginator accessibility in multiple methods
  • Add tests for fresh paginator creation per request_records call
  • Add tests for paginator lifecycle cleanup
  • Add tests for fallback when get_new_paginator returns None
  • Add tests for backward compatibility with legacy pagination patterns

@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Aug 20, 2025

Reviewer's Guide

This PR refactors the RESTStream pagination implementation by introducing a private _current_paginator attribute and a _paginator_context manager to expose a fresh paginator instance (self.paginator) across all method scopes, replaces the inline paginator logic in request_records with this context manager, enhances documentation for pagination, and adds comprehensive tests covering paginator accessibility, lifecycle, freshness, fallback behavior, and backward compatibility.

Sequence diagram for paginator lifecycle in request_records

sequenceDiagram
    participant RESTStream
    participant Paginator
    participant Metrics
    participant HTTPRequest
    RESTStream->>RESTStream: enter _paginator_context()
    RESTStream->>Paginator: get_new_paginator()
    RESTStream->>RESTStream: set self._current_paginator
    RESTStream->>Metrics: start http_request_counter
    loop while not paginator.finished
        RESTStream->>HTTPRequest: prepare_request(context, next_page_token)
        RESTStream->>HTTPRequest: decorated_request(prepared_request, context)
        RESTStream->>Metrics: increment counter
        RESTStream->>RESTStream: update_sync_costs
        RESTStream->>RESTStream: parse_response(resp)
        RESTStream->>Paginator: advance(resp)
    end
    RESTStream->>RESTStream: exit _paginator_context(), cleanup self._current_paginator
Loading

Class diagram for updated RESTStream pagination lifecycle

classDiagram
    class RESTStream {
        - _current_paginator: BaseAPIPaginator | None
        + paginator: BaseAPIPaginator
        + _paginator_context(): Iterator[BaseAPIPaginator]
        + request_records(context: Context | None): Iterable[dict]
        + get_new_paginator()
        + get_url_params(context, next_page_token)
    }
    RESTStream --> BaseAPIPaginator
    RESTStream --> SinglePagePaginator
    RESTStream : uses _paginator_context for paginator lifecycle
    RESTStream : exposes paginator property during request_records
    RESTStream : get_url_params can access self.paginator
    class BaseAPIPaginator {
        + current_value
        + finished
        + advance(resp)
        + continue_if_empty(resp)
        + page_size
    }
    class SinglePagePaginator {
        + finished
    }
    BaseAPIPaginator <|-- SinglePagePaginator
Loading

Class diagram for paginator property and context manager

classDiagram
    class RESTStream {
        + paginator: BaseAPIPaginator
        + _paginator_context(): Iterator[BaseAPIPaginator]
    }
    RESTStream --> BaseAPIPaginator
    RESTStream : paginator only available during request_records
    RESTStream : _paginator_context manages lifecycle
Loading

File-Level Changes

Change Details Files
Introduce paginator attribute and lifecycle context manager
  • Initialize _current_paginator in RESTStream constructor
  • Add paginator property with runtime check for active context
  • Implement _paginator_context context manager for setup/cleanup
  • Wrap request_records logic in _paginator_context and move paginator advancement
singer_sdk/streams/rest.py
Update pagination-related documentation
  • Expand paginator property docstring to describe usage and errors
  • Enhance get_url_params doc with paginator availability note
singer_sdk/streams/rest.py
Add comprehensive pagination tests
  • Test paginator access within various methods and error outside context
  • Verify fresh paginator creation per request_records call
  • Ensure paginator cleanup after processing
  • Check fallback to SinglePagePaginator when none provided
  • Validate backward compatibility of legacy vs. modern usage
tests/core/rest/test_pagination.py

Assessment against linked issues

Issue Objective Addressed Explanation
#1606 Make the pagination object available as an instance attribute (e.g., self.paginator) in more scopes of the REST stream class, not just within request_records.
#1606 Ensure that the paginator instance is accessible from other methods such as get_url_params, allowing access to pagination configuration without duplicating logic.
#1606 Maintain proper lifecycle management so that the paginator is only available during active stream processing and is cleaned up after use.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Add a test for early generator close to ensure _current_paginator always resets even when iteration stops prematurely
  • Consider isolating paginator state using thread-local or stack-based storage to support concurrent or nested request_records calls without state collisions
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Add a test for early generator close to ensure `_current_paginator` always resets even when iteration stops prematurely
- Consider isolating paginator state using thread-local or stack-based storage to support concurrent or nested `request_records` calls without state collisions

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +578 to +583
if offset == 0:
r._content = json.dumps({"data": [{"id": 1}, {"id": 2}]}).encode()
elif offset == 2:
r._content = json.dumps({"data": [{"id": 3}, {"id": 4}]}).encode()
else:
r._content = json.dumps({"data": []}).encode()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): Avoid conditionals in tests. (no-conditionals-in-tests)

ExplanationAvoid complex code, like conditionals, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:

  • loops
  • conditionals

Some ways to fix this:

  • Use parametrized tests to get rid of the loop.
  • Move the complex logic into helpers.
  • Move the complex part into pytest fixtures.

Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / Don't Put Logic in Tests

Comment on lines +660 to +664
if offset == 0:
r._content = json.dumps({"data": [{"id": 1}]}).encode()
else:
# Return empty data to end pagination
r._content = json.dumps({"data": []}).encode()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): Avoid conditionals in tests. (no-conditionals-in-tests)

ExplanationAvoid complex code, like conditionals, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:

  • loops
  • conditionals

Some ways to fix this:

  • Use parametrized tests to get rid of the loop.
  • Move the complex logic into helpers.
  • Move the complex part into pytest fixtures.

Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / Don't Put Logic in Tests

Comment on lines +712 to +716
if offset == 0:
r._content = json.dumps({"data": [{"id": 1}]}).encode()
else:
# Return empty data to end pagination
r._content = json.dumps({"data": []}).encode()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): Avoid conditionals in tests. (no-conditionals-in-tests)

ExplanationAvoid complex code, like conditionals, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:

  • loops
  • conditionals

Some ways to fix this:

  • Use parametrized tests to get rid of the loop.
  • Move the complex logic into helpers.
  • Move the complex part into pytest fixtures.

Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / Don't Put Logic in Tests

Comment on lines +804 to +807
if skip == 0:
r._content = json.dumps({"data": [{"id": 1}, {"id": 2}]}).encode()
else:
r._content = json.dumps({"data": []}).encode()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): Avoid conditionals in tests. (no-conditionals-in-tests)

ExplanationAvoid complex code, like conditionals, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:

  • loops
  • conditionals

Some ways to fix this:

  • Use parametrized tests to get rid of the loop.
  • Move the complex logic into helpers.
  • Move the complex part into pytest fixtures.

Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / Don't Put Logic in Tests

Comment on lines +843 to +846
if skip == 0:
r._content = json.dumps({"data": [{"id": 1}, {"id": 2}]}).encode()
else:
r._content = json.dumps({"data": []}).encode()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): Avoid conditionals in tests. (no-conditionals-in-tests)

ExplanationAvoid complex code, like conditionals, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:

  • loops
  • conditionals

Some ways to fix this:

  • Use parametrized tests to get rid of the loop.
  • Move the complex logic into helpers.
  • Move the complex part into pytest fixtures.

Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / Don't Put Logic in Tests

}

def _request(self, prepared_request, context): # noqa: ARG002
r = Response()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): Extract duplicate code into method (extract-duplicate-method)

@codecov
Copy link

codecov bot commented Aug 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.89%. Comparing base (f9b8921) to head (1f8c098).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3245      +/-   ##
==========================================
+ Coverage   92.87%   92.89%   +0.01%     
==========================================
  Files          63       63              
  Lines        5458     5472      +14     
  Branches      682      683       +1     
==========================================
+ Hits         5069     5083      +14     
  Misses        282      282              
  Partials      107      107              
Flag Coverage Δ
core 79.18% <100.00%> (+0.05%) ⬆️
end-to-end 77.43% <72.97%> (-0.02%) ⬇️
optional-components 43.16% <13.51%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
@codspeed-hq
Copy link

codspeed-hq bot commented Aug 20, 2025

CodSpeed Performance Report

Merging #3245 will not alter performance

Comparing 1606-feat-make-the-pagination-object-available-in-more-scopes-of-the-rest-stream-class (1f8c098) with main (f9b8921)

Summary

✅ 8 untouched benchmarks

@read-the-docs-community
Copy link

Documentation build overview

📚 Meltano SDK | 🛠️ Build #29271856 | 📁 Comparing 1f8c098 against latest (f9b8921)


🔍 Preview build

Show files changed (3 files in total): 📝 3 modified | ➕ 0 added | ➖ 0 deleted
File Status
genindex.html 📝 modified
classes/singer_sdk.GraphQLStream.html 📝 modified
classes/singer_sdk.RESTStream.html 📝 modified

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Make the pagination object available in more scopes of the REST stream class

2 participants