-
Notifications
You must be signed in to change notification settings - Fork 76
Add tests for the count function
#1530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
acfbdcf to
804822e
Compare
|
|
||
| @Test | ||
| fun `count on empty grouped dataframe`() { | ||
| emptyDf.groupBy("group").count().count() shouldBe 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do emptyDf.groupBy("group").count(), it returns a dataframe without the column count. Is it expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! We should at least get an empty column named count. This causes runtime exceptions with the compiler plugin, but I think it's an issue deep inside aggregation itself... I'll make an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! #1531
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe explain with a comment that it's empty for now and link to the issue like "Issue #1531". Makes it easier to find our discussion later on :)
804822e to
06b9e20
Compare
|
|
||
| class CountTests { | ||
|
|
||
| // Test data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These fields with test data are reinitialized for each test method, which creates some overhead (some fields are not used by some methods). But I think it should not be significant, it is still safe, and it saves from a lot of duplication. Is it reasonable to leave it as it is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just the tests, so it doesn't matter that much. In some cases we have TestBase so we can reuse these dataframes, but dataframes are not very heavy, so it most cases it's fine to have them duplicated. It does make it easier to see what kind of dataframe you're dealing with in the tests
| val groupedWithNulls = dfWithNulls.groupBy("group") | ||
| val pivotWithNulls = dfWithNulls.pivot("group") | ||
|
|
||
| // DataColumn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh btw, do you know region blocks?
If you write
// region DataColumn
...
// endregionthe regions are collapsible and they show up in the structure overview of the file
| fun `count on DataRow`() { | ||
| val row = df[0] | ||
| row.count() shouldBe 3 | ||
| (row.count { it is Number }) shouldBe 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary parentheses
Jolanrensen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice and extensive tests :)
Fixes #1496