
Conversation

@nikhilsinhaparseable (Contributor) commented on Nov 11, 2025

Instead of using file_size from the manifest, which is the size of the uncompressed JSON, we should use ingestion_size, which is the compressed size.

Summary by CodeRabbit

  • Bug Fixes
    • Updated file size metric source in table operations for more accurate byte count reporting.

coderabbitai bot (Contributor) commented on Nov 11, 2025

Walkthrough

The StandardTableProvider::partitioned_files method now uses ingestion_size instead of file_size for byte counting. The field destructuring and total file size aggregation have been updated to reference the new field, with no other logic changes.

Changes

Field source update (src/query/stream_schema_provider.rs): Changed the File destructuring and the total_file_size aggregation in StandardTableProvider::partitioned_files to use the ingestion_size field instead of file_size.
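
For context, a minimal sketch of the change described above: the field names file_size and ingestion_size come from this PR, while the struct layout, loop, and function names below are illustrative assumptions rather than Parseable's actual code.

```rust
// Hypothetical manifest entry; only the two size fields are taken
// from the PR discussion, the rest is assumed for illustration.
struct File {
    file_path: String,
    file_size: u64,      // size of the uncompressed JSON payload
    ingestion_size: u64, // compressed (parquet) size actually stored
}

fn total_scanned_bytes(files: &[File]) -> u64 {
    let mut total_file_size: u64 = 0;
    for File { ingestion_size, .. } in files {
        // Before this PR the sum used `file_size`, over-reporting
        // bytes scanned; `ingestion_size` is the compressed size.
        total_file_size += ingestion_size;
    }
    total_file_size
}

fn main() {
    let files = vec![
        File {
            file_path: "date=2025-11-11/file1.parquet".into(),
            file_size: 10_000,
            ingestion_size: 2_500,
        },
        File {
            file_path: "date=2025-11-11/file2.parquet".into(),
            file_size: 8_000,
            ingestion_size: 1_900,
        },
    ];
    // Reports 4400 (compressed) rather than 18000 (uncompressed JSON).
    println!("bytes scanned: {}", total_scanned_bytes(&files));
}
```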

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

  • Verify that ingestion_size is the correct field for byte counting in this context (ensure semantic correctness)
  • Confirm no other references to file_size in related code paths need updating
  • Check if any documentation or comments reference the old behavior

Poem

A rabbit hops through field and stream,
Swapping sizes in the schema's dream,
From file_size to ingestion's call,
Counting bytes more fairly for all! 🐰📊

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Description check ⚠️ Warning
Explanation: The description explains the change rationale but is missing required template sections: the issue reference, a complete description structure, and the testing/documentation checkboxes.
Resolution: Add an issue reference (Fixes #XXXX), expand the description with the solution rationale, and include the template checklist for testing and documentation.

Docstring Coverage ⚠️ Warning
Explanation: Docstring coverage is 0.00%, which is below the required threshold of 80.00%.
Resolution: Run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Title check ✅ Passed
Explanation: The title "fix: bytes scanned in query" directly matches the main change: correcting the byte-counting metric used for query scanning.
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR, between commits 8cfced3 and 31f0062.

📒 Files selected for processing (1)
  • src/query/stream_schema_provider.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-08-18T19:10:11.941Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/ingest.rs:163-164
Timestamp: 2025-08-18T19:10:11.941Z
Learning: Field statistics calculation in src/storage/field_stats.rs uses None for the time_partition parameter when calling flatten_and_push_logs(), as field stats generation does not require time partition functionality.

Applied to files:

  • src/query/stream_schema_provider.rs
📚 Learning: 2025-09-18T09:59:20.177Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:700-756
Timestamp: 2025-09-18T09:59:20.177Z
Learning: In src/event/mod.rs, the parsed_timestamp used in increment_events_ingested_by_date() is correctly UTC-normalized: for dynamic streams it remains Utc::now(), and for streams with time partition enabled it uses the time partition value. Both cases result in proper UTC date strings for metrics labeling, preventing double-counting issues.

Applied to files:

  • src/query/stream_schema_provider.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.

Applied to files:

  • src/query/stream_schema_provider.rs
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.

Applied to files:

  • src/query/stream_schema_provider.rs
📚 Learning: 2025-09-18T09:52:07.554Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/storage/object_storage.rs:173-177
Timestamp: 2025-09-18T09:52:07.554Z
Learning: In Parseable's upload system (src/storage/object_storage.rs), the update_storage_metrics function can safely use path.metadata().map_err() to fail on local file metadata read failures because parquet validation (validate_uploaded_parquet_file) ensures file integrity before this step, and the system guarantees local staging files remain accessible throughout the upload flow.

Applied to files:

  • src/query/stream_schema_provider.rs
🧬 Code graph analysis (1)
src/query/stream_schema_provider.rs (1)
src/catalog/mod.rs (2)
  • ingestion_size (58-58)
  • ingestion_size (70-72)
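
The two ingestion_size hits in src/catalog/mod.rs suggest a struct field plus an accessor. A speculative reconstruction of that shape, assuming everything beyond the ingestion_size name and the file path (the struct names and method body are guessed for illustration):

```rust
// src/catalog/mod.rs (hypothetical reconstruction, not the real code)
pub struct File {
    pub file_path: String,
    pub file_size: u64,      // size of the manifest's JSON representation
    pub ingestion_size: u64, // compressed size of the ingested data
}

pub struct Manifest {
    pub files: Vec<File>,
}

impl Manifest {
    // Assumed accessor matching the second `ingestion_size` hit (70-72):
    // aggregates compressed sizes across all files in the manifest.
    pub fn ingestion_size(&self) -> u64 {
        self.files.iter().map(|f| f.ingestion_size).sum()
    }
}
```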
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
🔇 Additional comments (1)
src/query/stream_schema_provider.rs (1)

342-342: Approve the metric tracking changes.

The updates to use ingestion_size for billing metrics correctly implement the PR objective of tracking compressed parquet file sizes instead of uncompressed JSON sizes.

Also applies to: 348-348
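
As a rough illustration of the billing-metrics side mentioned above, a sketch using the prometheus crate; the metric name, labels, and helper below are invented for illustration and are not Parseable's actual definitions.

```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_counter_vec, IntCounterVec};

// Hypothetical counter; Parseable's real metric name and labels may differ.
static QUERY_BYTES_SCANNED: Lazy<IntCounterVec> = Lazy::new(|| {
    register_int_counter_vec!(
        "query_bytes_scanned_total",
        "Compressed bytes scanned while answering queries",
        &["stream"]
    )
    .expect("metric can be registered")
});

fn record_scanned_bytes(stream: &str, ingestion_size: u64) {
    // Billing should count the compressed (ingestion) size, not the
    // uncompressed JSON size recorded in the manifest's file_size.
    QUERY_BYTES_SCANNED
        .with_label_values(&[stream])
        .inc_by(ingestion_size);
}

fn main() {
    record_scanned_bytes("demo-stream", 2_500);
}
```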


Comment @coderabbitai help to get the list of available commands and usage tips.

