

@Allex-Nik (Collaborator) commented Oct 29, 2025

Fixes #1496


@Test
fun `count on empty grouped dataframe`() {
    emptyDf.groupBy("group").count().count() shouldBe 0
}
@Allex-Nik (Collaborator, Author), Oct 29, 2025:

If we call emptyDf.groupBy("group").count(), it returns a dataframe without the "count" column. Is that expected?

Collaborator:

Good catch! We should at least get an empty column named "count". This causes runtime exceptions with the compiler plugin, but I think the problem lies deep inside the aggregation logic itself. I'll open an issue.

Collaborator:

Thanks! #1531

Collaborator:

Maybe add a comment explaining that it's empty for now, with a link to the issue like "Issue #1531". That makes it easier to find our discussion later on :)
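To make the expected behavior concrete, here is a stdlib-only sketch of a group-and-count whose output schema ("group" plus "count") is fixed up front, so an empty input still yields an empty "count" column. The column-oriented Map representation and the groupCount name are hypothetical; this only models the output shape the reviewers ask for and is not how DataFrame's aggregation is implemented.

```kotlin
// Hypothetical column-oriented table: column name -> values (all columns have equal length).
fun groupCount(table: Map<String, List<Any?>>, by: String): Map<String, List<Any?>> {
    // groupingBy/eachCount counts occurrences of each distinct key.
    val counts: Map<Any?, Int> = table.getValue(by).groupingBy { it }.eachCount()
    // The output columns are declared unconditionally, so "count" exists even with zero rows.
    return linkedMapOf(
        by to counts.keys.toList(),
        "count" to counts.values.toList(),
    )
}
```

With an empty input, the result still has both columns, each holding zero values, which is what a test like `count on empty grouped dataframe` would expect once the issue is fixed.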


class CountTests {

// Test data
Collaborator (Author):

These test-data fields are reinitialized for each test method, which adds some overhead (some methods don't use all of the fields). But I think the overhead is insignificant, the setup is still safe, and it avoids a lot of duplication. Is it reasonable to leave it as is?

Collaborator:

It's just tests, so it doesn't matter that much. In some cases we have TestBase so we can reuse these dataframes, but dataframes are not very heavy, so in most cases it's fine to duplicate them. Duplication also makes it easier to see what kind of dataframe you're dealing with in each test.
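The trade-off can be illustrated with a plain-Kotlin simulation. It assumes a per-method test lifecycle (JUnit's default, where a new test-class instance is created for every test method), modeled here by constructing one SampleTests per "test". SharedFixtures plays roughly the role the reviewer describes for TestBase, a fixture built once and reused. All class names, fields, and the FixtureLog bookkeeping are hypothetical and not taken from the PR.

```kotlin
// Hypothetical bookkeeping: counts how often each kind of fixture is built.
class FixtureLog {
    var instanceBuilds = 0
    var sharedBuilds = 0
}

// Built once and handed to every test instance (a stand-in for a shared base fixture).
class SharedFixtures(log: FixtureLog) {
    val bigDf: List<Int> = (1..1_000).toList().also { log.sharedBuilds++ }
}

class SampleTests(log: FixtureLog, private val shared: SharedFixtures) {
    // Instance property: rebuilt for every instance, i.e. for every test method.
    val smallDf: List<Int> = listOf(1, 2, 3).also { log.instanceBuilds++ }

    fun testA(): Int = smallDf.size
    fun testB(): Int = shared.bigDf.count { it > 500 }
}

// Mimics a per-method lifecycle: one fresh SampleTests instance per test method.
fun runPerMethod(log: FixtureLog): List<Int> {
    val shared = SharedFixtures(log)
    return listOf(
        SampleTests(log, shared).testA(),
        SampleTests(log, shared).testB(),
    )
}
```

Running the two "tests" builds the instance fixture twice but the shared one only once; for small dataframes the repeated builds are cheap, which is why keeping them local to the test class is usually acceptable.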

val groupedWithNulls = dfWithNulls.groupBy("group")
val pivotWithNulls = dfWithNulls.pivot("group")

// DataColumn
Collaborator:

Oh btw, do you know about region blocks?

If you write

// region DataColumn
...
// endregion

the regions become collapsible, and they show up in the file's structure overview.
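A small sketch of how a test class could be split this way. The class and method names are hypothetical, and plain Kotlin lists stand in for the real DataColumn and DataRow fixtures so the snippet compiles without the library. The markers are ordinary comments, so they have no effect on compilation; IntelliJ IDEA folds each marked pair.

```kotlin
// Hypothetical layout: stdlib lists stand in for the real dataframe fixtures.
class CountTestsLayout {

    // region DataColumn

    fun countOnColumn(): Int = listOf(1, 2, 3).count()

    fun countOnColumnWithPredicate(): Int = listOf(1, 2, 3).count { it > 1 }

    // endregion

    // region DataRow

    fun countOnRow(): Int = listOf<Any?>("a", 1, 2.5).count { it is Number }

    // endregion
}
```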

fun `count on DataRow`() {
    val row = df[0]
    row.count() shouldBe 3
    (row.count { it is Number }) shouldBe 2
}
Collaborator:

unnecessary parentheses
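Why the parentheses can go: in Kotlin, a call with a trailing lambda is a postfix expression and binds tighter than any named infix function, so row.count { it is Number } shouldBe 2 already parses as (row.count { ... }) shouldBe 2. A stdlib-only sketch, assuming shouldBe here is Kotest's infix matcher; the version below is a minimal hypothetical stand-in that only checks equality, and rowValues is a hypothetical list in place of df[0].

```kotlin
// Minimal stand-in for Kotest's `shouldBe` (assumption: equality check only, no diff output).
infix fun <T> T.shouldBe(expected: T) {
    check(this == expected) { "expected <$expected> but was <$this>" }
}

// Hypothetical stand-in for the values of `df[0]`: one string and two numbers.
val rowValues: List<Any?> = listOf("a", 1, 2.5)

fun checkRowCounts() {
    rowValues.count() shouldBe 3
    // The trailing-lambda call is evaluated first, then `shouldBe` is applied to its result.
    rowValues.count { it is Number } shouldBe 2
}
```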

@Jolanrensen (Collaborator) left a comment:

nice and extensive tests :)


Labels: None yet
Projects: None yet
Development: successfully merging this pull request may close the issue "Add unit tests for count function".
3 participants