@LSturtew (Contributor) commented Apr 8, 2021

ref #1929

Implement DataFrame.cov, which computes the pairwise covariance of the DataFrame's numeric columns.

>>> kdf = ks.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
...                    columns=['dogs', 'cats'])
>>> kdf.cov()
          dogs      cats
dogs  0.666667 -1.000000
cats -1.000000  1.666667
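To make the expected numbers easy to check, here is a minimal plain-Python sketch of the sample covariance (normalized by n - 1, as pandas' cov does by default) for the example data. The helper name sample_cov is hypothetical and not part of Koalas; it also assumes there are no missing values, which a real implementation would have to handle.

```python
# Minimal sketch (hypothetical helper, not the Koalas implementation):
# sample covariance, normalized by n - 1, for every pair of columns.
from itertools import product


def sample_cov(columns):
    """Map (col_a, col_b) to the sample covariance of the two columns.

    `columns` maps a column name to a list of numbers of equal length.
    Missing values are NOT handled in this sketch.
    """
    n = len(next(iter(columns.values())))
    means = {name: sum(vals) / n for name, vals in columns.items()}
    result = {}
    for a, b in product(columns, repeat=2):
        # Sum of products of deviations from each column's mean.
        dev = sum(
            (x - means[a]) * (y - means[b])
            for x, y in zip(columns[a], columns[b])
        )
        result[(a, b)] = dev / (n - 1)  # unbiased estimator: divide by n - 1
    return result


cov = sample_cov({"dogs": [1, 0, 2, 1], "cats": [2, 3, 0, 1]})
for row in ("dogs", "cats"):
    print(row, [round(cov[(row, col)], 6) for col in ("dogs", "cats")])
# dogs [0.666667, -1.0]
# cats [-1.0, 1.666667]
```

The diagonal entries are the column variances (2/3 for dogs, 5/3 for cats), and the off-diagonal entry is -3 / 3 = -1, matching the kdf.cov() output above.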

@codecov-io

Codecov Report

Merging #2142 (7987192) into master (d7f6e88) will decrease coverage by 2.23%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##           master    #2142      +/-   ##
==========================================
- Coverage   95.37%   93.14%   -2.24%     
==========================================
  Files          60       60              
  Lines       13694    13601      -93     
==========================================
- Hits        13060    12668     -392     
- Misses        634      933     +299     
Impacted Files                                   Coverage Δ
databricks/koalas/missing/frame.py               74.57% <ø> (-25.43%) ⬇️
databricks/koalas/frame.py                       95.65% <85.71%> (-0.84%) ⬇️
databricks/koalas/usage_logging/__init__.py      28.20% <0.00%> (-64.36%) ⬇️
databricks/koalas/usage_logging/usage_logger.py  47.82% <0.00%> (-52.18%) ⬇️
databricks/koalas/missing/series.py              60.56% <0.00%> (-39.44%) ⬇️
databricks/koalas/__init__.py                    80.26% <0.00%> (-11.85%) ⬇️
databricks/koalas/missing/indexes.py             88.63% <0.00%> (-11.37%) ⬇️
databricks/conftest.py                           89.09% <0.00%> (-10.91%) ⬇️
databricks/koalas/typedef/typehints.py           86.22% <0.00%> (-9.19%) ⬇️
... and 30 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d7f6e88...7987192.

]
kdf = self[num_cols]
names = [name for t in num_cols for name in t]
mat = kdf.to_pandas().to_numpy(dtype=float, copy=False)
Collaborator

I'm afraid using to_pandas() without any restriction is not a good idea. It will cause an OOM error if the data size doesn't fit in the driver's memory.
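One way to avoid collecting all rows is to reduce the data to a few aggregates first: the row count, the per-column sums, and the pairwise sums of products. Each partition can compute these independently and the partial results can be summed, so only O(k²) numbers for k columns reach the driver. The sketch below simulates that in plain Python; the function names (partition_stats, merge, cov_from_stats) are hypothetical and this is not the approach the PR ended up using. In PySpark, the same sums could be expressed as DataFrame aggregations, and pyspark.sql.functions.covar_samp already computes the sample covariance for a single column pair.

```python
# Sketch (hypothetical helpers): sample covariance matrix from aggregates
# that can be computed per partition and merged, instead of collecting rows.
from typing import Iterable, List, Tuple

# (row count, per-column sums, pairwise sums of products)
Stats = Tuple[int, List[float], List[List[float]]]


def partition_stats(rows: Iterable[List[float]], k: int) -> Stats:
    """Aggregate one partition of rows with k numeric columns."""
    n = 0
    s = [0.0] * k
    sp = [[0.0] * k for _ in range(k)]
    for row in rows:
        n += 1
        for i in range(k):
            s[i] += row[i]
            for j in range(k):
                sp[i][j] += row[i] * row[j]
    return n, s, sp


def merge(a: Stats, b: Stats) -> Stats:
    """Combine two partial aggregates, as a reduce step would."""
    n = a[0] + b[0]
    s = [x + y for x, y in zip(a[1], b[1])]
    sp = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[2], b[2])]
    return n, s, sp


def cov_from_stats(stats: Stats) -> List[List[float]]:
    """cov(i, j) = (sum(x_i * x_j) - sum(x_i) * sum(x_j) / n) / (n - 1)."""
    n, s, sp = stats
    k = len(s)
    return [
        [(sp[i][j] - s[i] * s[j] / n) / (n - 1) for j in range(k)]
        for i in range(k)
    ]


# Rows from the example (columns: dogs, cats), split into two "partitions".
p1 = partition_stats([[1, 2], [0, 3]], k=2)
p2 = partition_stats([[2, 0], [1, 1]], k=2)
matrix = cov_from_stats(merge(p1, p2))
print(matrix)
```

This sum-of-products formula is simple but can lose precision through cancellation when the column means are large relative to the spread; merging per-partition (count, mean, co-moment) triples is a numerically safer variant of the same idea.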

@xinrong-meng (Contributor) commented Aug 3, 2021

Hi @LSturtew, since Koalas has been ported to Spark as pandas API on Spark, would you like to migrate this PR to the Spark repository? Here is the ticket https://issues.apache.org/jira/browse/SPARK-36396. Otherwise, I may do that for you next week.
