
@thomasrebele (Contributor)

What changes were proposed in this pull request?

I've updated the metastore dump with an export from a TPC-DS 30TB database that includes histogram statistics. I've also regenerated the q.out files with org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver.

Why are the changes needed?

The PR makes it possible to test the planning phase with histograms.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Executed TestTezTPCDS30TBPerfCliDriver locally.

@sonarqubecloud

sonarqubecloud bot commented Nov 5, 2025

@deniskuzZ (Member)

deniskuzZ commented Nov 5, 2025

omg 👍, should we even bother reviewing the q.out changes?

@thomasrebele (Contributor, PR author)

I've investigated the changes to the q.out file for ql/src/test/queries/clientpositive/perf/query96.q. The NUM_DISTINCT values vary slightly between the old and the new metastore dump (due to the randomness of the HLL algorithm). This changes the estimated selectivity of certain predicates, which leads to different row-count estimates. Sometimes the estimates shift enough to change the join order.
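To illustrate that chain with made-up numbers: the sketch below assumes the textbook equality selectivity of 1/NDV and a toy rule that picks the input with the smaller estimate as the hash-join build side. Neither is claimed to be Hive's exact estimator or CBO cost model; the class, method names, and all numbers are hypothetical.

```java
// Hypothetical sketch: how a small NDV difference can flip a join decision.
// Assumes selectivity(col = const) = 1 / NDV and a toy "smaller input is the
// build side" rule; Hive's real estimator and cost model are more involved.
final class NdvSelectivitySketch {

    /** Estimated rows passing `col = constant`, assuming uniformly spread values. */
    static long equalityEstimate(long rowCount, long ndv) {
        if (ndv <= 0) {
            throw new IllegalArgumentException("ndv must be positive");
        }
        return Math.round((double) rowCount / ndv);
    }

    /** Toy rule: the input with the smaller estimate becomes the hash-join build side. */
    static String buildSide(long estimateA, long estimateB) {
        return estimateA <= estimateB ? "A" : "B";
    }

    public static void main(String[] args) {
        long rowsB = 1_000_000L;  // hypothetical row count of input B
        long estimateA = 10_000L; // hypothetical filtered estimate of input A

        // Hypothetical NDVs for the same column, as two HLL runs might report them.
        long ndvOld = 95L;
        long ndvNew = 105L;

        long estimateBOld = equalityEstimate(rowsB, ndvOld); // 10526
        long estimateBNew = equalityEstimate(rowsB, ndvNew); // 9524

        System.out.println("old dump: B=" + estimateBOld + ", build side " + buildSide(estimateA, estimateBOld));
        System.out.println("new dump: B=" + estimateBNew + ", build side " + buildSide(estimateA, estimateBNew));
    }
}
```

A roughly 10% NDV difference flips the choice here only because the two estimates are close; when they are far apart, the same NDV noise leaves the plan unchanged.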

The new metastore dump is an improvement: besides adding histogram statistics, it fixes the NUM_NULLS statistics, which are very often 0 in the old dump.
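Why a NUM_NULLS of 0 matters: a minimal sketch, assuming NULL rows are subtracted before a predicate's selectivity is applied (NULLs never satisfy `IS NOT NULL` or `col = constant`). The rules and names below are illustrative, not Hive's exact code, and the 20% NULL fraction is a made-up example.

```java
// Hypothetical sketch of how NUM_NULLS feeds into row-count estimates.
// Assumes NULLs are removed first, then 1 / NDV is applied to the rest.
final class NumNullsSketch {

    /** Rows estimated to pass `col IS NOT NULL`. */
    static long isNotNullEstimate(long rowCount, long numNulls) {
        return Math.max(0L, rowCount - numNulls);
    }

    /** Rows estimated to pass `col = constant`: only non-null rows can match. */
    static long equalityEstimate(long rowCount, long numNulls, long ndv) {
        if (ndv <= 0) {
            throw new IllegalArgumentException("ndv must be positive");
        }
        return Math.round((double) isNotNullEstimate(rowCount, numNulls) / ndv);
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;    // hypothetical table size
        long ndv = 100L;           // hypothetical NDV
        long realNulls = 200_000L; // hypothetical: 20% of the column is NULL

        // NUM_NULLS recorded as 0 overestimates both filters.
        System.out.println(isNotNullEstimate(rows, 0L));            // 1000000
        System.out.println(isNotNullEstimate(rows, realNulls));     // 800000
        System.out.println(equalityEstimate(rows, 0L, ndv));        // 10000
        System.out.println(equalityEstimate(rows, realNulls, ndv)); // 8000
    }
}
```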
