Contributor

Copilot AI commented Aug 25, 2025

This PR implements automatic pagination functionality for the CopernicusDataSearcher class to handle large datasets that exceed the OData API's 1000 result limit per request.

Problem

Previously, the searcher could only retrieve a maximum of 1000 results per query, even when the total available results exceeded this limit. Users had no way to access the complete dataset when searches returned more than 1000 products.

Solution

Added automatic pagination that triggers when count=True and the total result count exceeds the top parameter. The implementation:

  • Detects large datasets: When @odata.count > top, automatically initiates pagination
  • Uses OData $skip parameter: Makes sequential requests, advancing $skip by the page size each time ($skip=1000, $skip=2000, etc. with the default top of 1000)
  • Combines all results: Merges data from all paginated requests into a single DataFrame
  • Maintains backward compatibility: Existing code continues to work unchanged
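The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `fetch_all`, `fetch_page`, and `fake_fetch` are hypothetical names, the HTTP call is replaced by an in-memory stand-in, and results are merged into a plain list rather than a DataFrame.

```python
from typing import Any, Callable, Dict, List

Page = Dict[str, Any]


def fetch_all(fetch_page: Callable[[int, int], Page], top: int = 1000) -> List[dict]:
    """Collect every record by advancing $skip in steps of `top`.

    `fetch_page(skip, top)` is a hypothetical callable standing in for one
    OData request; it returns the decoded JSON body of that page.
    """
    first = fetch_page(0, top)  # the first request is the one carrying $count=true
    records = list(first.get("value", []))
    total = first.get("@odata.count", len(records))

    skip = top
    while skip < total:  # only loops when @odata.count > top
        batch = fetch_page(skip, top).get("value", [])
        if not batch:  # defensive stop if the server returns an empty page
            break
        records.extend(batch)
        skip += top
    return records


def fake_fetch(skip: int, top: int, _total: int = 2500) -> Page:
    """In-memory stand-in for the API, serving 2500 fake products."""
    items = [{"Id": i} for i in range(skip, min(skip + top, _total))]
    return {"@odata.count": _total, "value": items}


all_records = fetch_all(fake_fetch, top=1000)
print(len(all_records))
```

With a DataFrame-based implementation, the `records.extend(...)` step would correspond to concatenating one frame per page.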

Usage

# Enable pagination for complete dataset retrieval
searcher = CopernicusDataSearcher()
searcher.query_by_filter(
    collection_name='SENTINEL-1',
    product_type='GRD',
    start_date='2022-05-03T00:00:00.000Z',
    end_date='2022-05-03T12:00:00.000Z',
    count=True  # Enables automatic pagination
)

# Automatically retrieves ALL results, not just first 1000
df = searcher.execute_query()  # Could return 2500+ results via multiple API calls

Implementation Details

The solution adds a _execute_paginated_query() helper method that:

  • Uses the existing top parameter as the page size (default: 1000)
  • Constructs paginated URLs with appropriate $skip values
  • Handles network errors gracefully by returning partial results
  • Preserves all existing query filters and sorting
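One way to read "returning partial results" is that a failed page request ends the loop and whatever was collected so far is returned. The sketch below illustrates that behaviour under assumptions: `collect_with_partial_results` and `flaky_fetch` are hypothetical names, and the built-in `ConnectionError` stands in for whatever exception the real HTTP client raises.

```python
from typing import Any, Callable, Dict, Iterable, List

Page = Dict[str, Any]


def collect_with_partial_results(
    fetch_page: Callable[[int, int], Page],
    skips: Iterable[int],
    top: int,
) -> List[dict]:
    """Fetch each page in turn; on a network error, keep what was collected."""
    records: List[dict] = []
    for skip in skips:
        try:
            page = fetch_page(skip, top)
        except ConnectionError as exc:  # assumption: the real code may catch a client-specific error
            print(f"Pagination stopped at $skip={skip}: {exc}")
            break
        records.extend(page.get("value", []))
    return records


def flaky_fetch(skip: int, top: int) -> Page:
    """Stand-in for the API that fails on the third page."""
    if skip >= 2000:
        raise ConnectionError("simulated network failure")
    return {"value": [{"Id": i} for i in range(skip, skip + top)]}


partial = collect_with_partial_results(flaky_fetch, skips=[0, 1000, 2000], top=1000)
print(len(partial))
```

A caller that needs the complete dataset would have to compare the number of returned rows against the reported count, since a partial result is otherwise indistinguishable from a complete one.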

Example paginated requests generated:

GET /Products?$filter=...&$top=1000&$count=true
GET /Products?$filter=...&$top=1000&$skip=1000
GET /Products?$filter=...&$top=1000&$skip=2000
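As an illustration of how such a request sequence could be derived from the total count, here is a small sketch. `build_page_urls` is a hypothetical helper, not a method from this PR, and the filter expression is passed through as an opaque string.

```python
import math
from typing import List


def build_page_urls(base_url: str, filter_expr: str, top: int, total: int) -> List[str]:
    """Return one URL per page; only the first request asks for $count."""
    pages = max(1, math.ceil(total / top))
    urls = [f"{base_url}?$filter={filter_expr}&$top={top}&$count=true"]
    for page in range(1, pages):
        # Each later page skips everything the earlier pages already returned.
        urls.append(f"{base_url}?$filter={filter_expr}&$top={top}&$skip={page * top}")
    return urls


urls = build_page_urls("/Products", "...", top=1000, total=2500)
for url in urls:
    print("GET", url)
```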

Testing

Added a test suite covering:

  • Pagination triggering logic
  • Correct $skip parameter usage
  • Backward compatibility preservation
  • Error handling during pagination
  • Large dataset scenarios (2500+ results)

All existing functionality remains unchanged when count=False (default behavior).



Co-authored-by: sirbastiano <71963566+sirbastiano@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Coming soon: Implement a count option for search results more than 1000. In case the count is more than 1000, you will have to lunch more queries with the pagin mechanism: paginating through the results like here:: https://catalogue.dataspace.coperni..." to "Implement automatic pagination for search results exceeding 1000 items" Aug 25, 2025
Copilot AI requested a review from sirbastiano August 25, 2025 14:03
… and modify the search parameters in the notebook
@sirbastiano sirbastiano marked this pull request as ready for review January 18, 2026 12:17
Collaborator

@sirbastiano sirbastiano left a comment

Approved

@sirbastiano sirbastiano merged commit 497a51a into main Jan 18, 2026
1 check passed
@sirbastiano sirbastiano deleted the copilot/fix-a5c4f09d-5aed-4188-8b33-d9ae22bd2177 branch January 18, 2026 12:21
2 participants