Contributor

@TheNamesRai TheNamesRai commented Oct 22, 2025

PHOENIX-7524: Fix IndexOutOfBoundsException in queries with OFFSET when rows are exhausted

Design doc - link

The bug occurs when a query with OFFSET exhausts the available rows, that is, when any of the following holds:

  • OFFSET exceeds the total number of rows in the table, OR
  • OFFSET exceeds the number of rows matched by the WHERE clause (even when the WHERE clause matches some rows), OR
  • The WHERE clause filters out all rows, OR
  • The table or region is empty

In each case, the server-side scanner encountered an empty tuple after exhausting all available rows and still called getOffsetKvWithLastScannedRowKey(), which internally calls tuple.getKey(). Accessing the empty tuple in MultiKeyValueTuple.getKey() then threw an IndexOutOfBoundsException.

Solution - Added a check in NonAggregateRegionScannerFactory.java that detects empty tuples before calling getOffsetKvWithLastScannedRowKey().

Modified Files:

  • NonAggregateRegionScannerFactory.java
    • Added empty tuple check before accessing tuple.getKey()
    • Implemented fallback logic to derive appropriate row key from scan boundaries
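
For readers unfamiliar with the failure mode, the guard described above can be sketched in isolation. This is a minimal illustration, not the code in this PR: `SimpleTuple` and `lastScannedRowKey` are hypothetical stand-ins for the Phoenix classes, and the fallback order shown (scan start row first, then region start key) is an assumption made only for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative stand-ins only; these are NOT the Phoenix or HBase classes. */
public class EmptyTupleGuardSketch {

    /** Hypothetical, simplified tuple: just the row keys of its cells. */
    static final class SimpleTuple {
        final List<byte[]> cellRowKeys;

        SimpleTuple(List<byte[]> cellRowKeys) {
            this.cellRowKeys = cellRowKeys;
        }

        boolean isEmpty() {
            return cellRowKeys.isEmpty();
        }

        /** Reads the key from the first cell; throws IndexOutOfBoundsException if there is none. */
        byte[] getKey() {
            return cellRowKeys.get(0);
        }
    }

    /**
     * Chooses the row key to report once the scanner has run out of rows.
     * Assumed fallback order for this sketch: scan start row, then region start key.
     */
    static byte[] lastScannedRowKey(SimpleTuple tuple, byte[] scanStartRow, byte[] regionStartKey) {
        if (!tuple.isEmpty()) {
            // Normal path: the tuple still holds the last scanned row.
            return tuple.getKey();
        }
        // Guarded path: never call getKey() on an empty tuple.
        if (scanStartRow.length > 0) {
            return scanStartRow;
        }
        return regionStartKey;
    }

    public static void main(String[] args) {
        SimpleTuple empty = new SimpleTuple(Collections.emptyList());
        byte[] key = lastScannedRowKey(empty, new byte[0], new byte[] {0x10});
        System.out.println(Arrays.toString(key)); // prints [16]
    }
}
```

Without the `isEmpty()` check, the call to `getKey()` on the empty tuple is what raises the exception; with it, the caller always receives some usable boundary key.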

Test Files:

  • QueryWithOffsetIT.java
    • Added 5 comprehensive integration tests covering various scenarios where OFFSET exceeds available rows
  • CDCQueryIT.java
    • Added CDC-specific test for OFFSET exceeding available rows

@virajjasani virajjasani self-requested a review October 22, 2025 16:51
@virajjasani virajjasani requested a review from palashc October 23, 2025 04:53
Contributor Author

TheNamesRai commented Oct 23, 2025

* @param region The region being scanned
* @return A valid row key derived from scan or region boundaries
*/
public static byte[] deriveRowKeyFromScanOrRegionBoundaries(Scan scan, Region region) {
Contributor Author

The naming of this method follows the getScanStartRowKeyFromScanOrRegionBoundaries method in ServerUtil.java.

Contributor

Can we use getScanStartRowKeyFromScanOrRegionBoundaries() instead of this new method?

Contributor

@palashc palashc left a comment

LGTM +1, thank you!

Comment on lines +314 to +315
} else if (scan.includeStopRow()) {
rowKey = endKey;
Contributor

I wonder whether this generates the correct rowkey: if the scan start rowkey is not inclusive, don't we need to find the shortest possible next rowkey?
I think we already have this logic elsewhere, @TheNamesRai could you please check once?
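
To make the "shortest possible next rowkey" concrete: under unsigned lexicographic byte ordering, the smallest key that sorts strictly after a given key is that key with a single 0x00 byte appended. The sketch below shows only that general property; `closestNextRowKey` is a hypothetical name used for illustration, and nothing here is taken from the PR or from an existing Phoenix utility. It needs Java 9 or later for `Arrays.compareUnsigned`.

```java
import java.util.Arrays;

/** Illustrative only; the method name is hypothetical and not a Phoenix or HBase API. */
public class NextRowKeySketch {

    /** Returns the smallest key that sorts strictly after {@code key} in unsigned byte order. */
    static byte[] closestNextRowKey(byte[] key) {
        // Arrays.copyOf pads the extra position with 0x00, which is the smallest possible byte.
        return Arrays.copyOf(key, key.length + 1);
    }

    public static void main(String[] args) {
        byte[] exclusiveStart = {0x61, 0x62};
        byte[] next = closestNextRowKey(exclusiveStart);
        System.out.println(Arrays.toString(next)); // prints [97, 98, 0]
        // The original key is a strict prefix of the new one, so it sorts first.
        System.out.println(Arrays.compareUnsigned(exclusiveStart, next) < 0); // prints true
    }
}
```

No key can fall between the original and the extended key, because any key greater than the original either extends it (and is then at least the original plus 0x00) or differs at an earlier byte (and is then greater than every extension of the original).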
