Supports list-like Python objects for Series comparison. #2022
base: master
| Found a bug:
>>> pser = pd.Series([1,2,3], index=[10,20,30])
>>> pser == [3, 2, 1]
10    False
20     True
30    False
dtype: bool
whereas:
>>> kser = ks.Series([1,2,3], index=[10,20,30])
>>> kser == [3, 2, 1]
0     False
1     False
10    False
2     False
30    False
20    False
dtype: bool |
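The two outputs differ in how the list is matched against the Series. pandas compares position by position and keeps the Series' own labels, while the koalas output looks like the list was given a default 0..n-1 index and then aligned by label, which yields the union of labels with every row `False`. A minimal pure-Python model of both behaviours, assuming a Series can be represented as parallel lists of labels and values (`positional_eq` and `label_aligned_eq` are hypothetical helpers, not pandas or koalas APIs):

```python
def positional_eq(index, values, other):
    """pandas-style: compare element by element and keep the Series' own labels."""
    # pandas also rejects a list whose length differs from the Series.
    if len(values) != len(other):
        raise ValueError("Lengths must match to compare")
    return dict(zip(index, (v == o for v, o in zip(values, other))))


def label_aligned_eq(index, values, other):
    """Model of the reported koalas output: the list gets a default 0..n-1 index,
    both sides are outer-joined on labels, and a label missing on either side is False."""
    left = dict(zip(index, values))
    right = dict(enumerate(other))
    labels = left.keys() | right.keys()
    return {
        label: label in left and label in right and left[label] == right[label]
        for label in labels
    }


print(positional_eq([10, 20, 30], [1, 2, 3], [3, 2, 1]))
# {10: False, 20: True, 30: False}
```

Because labels 10, 20, 30 never overlap with 0, 1, 2, the label-aligned model returns six labels, all `False`, matching the shape of the koalas output above.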
| Codecov Report
 @@            Coverage Diff             @@
##           master    #2022      +/-   ##
==========================================
- Coverage   94.70%   93.18%   -1.53%     
==========================================
  Files          54       54              
  Lines       11480    11393      -87     
==========================================
- Hits        10872    10616     -256     
- Misses        608      777     +169     
 | 
| Btw, should we also support binary operations with list-like Python objects? cc @HyukjinKwon
>>> pser + [3, 2, 1]
10    4
20    4
30    4
dtype: int64
>>> pser - [3, 2, 1]
10   -2
20    0
30    2
dtype: int64
>>> [3, 2, 1] + pser
10    4
20    4
30    4
dtype: int64 | 
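For the binary operations proposed above, a minimal pure-Python sketch shows the positional semantics seen in the pandas output, including the reflected `[3, 2, 1] + pser` case, where Python falls back to the right operand's `__radd__`. `MiniSeries` is a hypothetical stand-in, not koalas or pandas code:

```python
class MiniSeries:
    """Hypothetical stand-in for a Series: labels plus values, no real pandas/koalas code."""

    def __init__(self, values, index):
        self.values = list(values)
        self.index = list(index)

    def _other_values(self, other):
        # list/tuple operands are matched by position, like `pser + [3, 2, 1]` in pandas.
        if isinstance(other, (list, tuple)):
            if len(other) != len(self.values):
                raise ValueError("Lengths must match")
            return list(other)
        # Scalars are broadcast to every position.
        return [other] * len(self.values)

    def _apply(self, other, op):
        result = [op(v, o) for v, o in zip(self.values, self._other_values(other))]
        # The result keeps this Series' labels.
        return MiniSeries(result, self.index)

    def __add__(self, other):
        return self._apply(other, lambda a, b: a + b)

    def __radd__(self, other):
        # Called for `[3, 2, 1] + s`: the list is the left operand.
        return self._apply(other, lambda a, b: b + a)

    def __sub__(self, other):
        return self._apply(other, lambda a, b: a - b)

    def __rsub__(self, other):
        # Called for `[3, 2, 1] - s`: order matters for subtraction.
        return self._apply(other, lambda a, b: b - a)


s = MiniSeries([1, 2, 3], index=[10, 20, 30])
print((s + [3, 2, 1]).values)  # [4, 4, 4]
print((s - [3, 2, 1]).values)  # [-2, 0, 2]
print(([3, 2, 1] + s).values)  # [4, 4, 4]
```

The reflected methods matter because a plain `list` does not know how to add a Series-like object, so without `__radd__`/`__rsub__` only the `pser + [...]` order would work.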
| FYI: it seems pandas behaves inconsistently here, as shown below:
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> a.eq(b)
a     True
b    False
c    False
d    False
e    False
dtype: bool
>>> a == b
Traceback (most recent call last):
...
ValueError: Can only compare identically-labeled Series objects
However, in their API doc for …
I posted a question to the pandas repo and will share the answer if they respond. |
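The inconsistency comes down to alignment: the `eq` method aligns both Series on the union of their labels, whereas the `==` operator first requires the two Series to have identical labels in the same order. A pure-Python model of the two behaviours above, assuming a Series is represented as a label-to-value dict (`flex_eq` and `strict_eq` are hypothetical names, not pandas functions):

```python
def flex_eq(left, right):
    """Model of `a.eq(b)`: outer-join on labels; a label missing on either side,
    or a NaN on either side, compares as False."""
    labels = list(left) + [label for label in right if label not in left]
    return {
        label: label in left and label in right and left[label] == right[label]
        for label in labels
    }


def strict_eq(left, right):
    """Model of `a == b`: refuse to compare unless both have the same labels in the same order."""
    if list(left) != list(right):
        raise ValueError("Can only compare identically-labeled Series objects")
    return {label: left[label] == right[label] for label in left}


nan = float("nan")
a = {"a": 1, "b": 1, "c": 1, "d": nan}
b = {"a": 1, "b": nan, "d": 1, "e": nan}
print(flex_eq(a, b))  # {'a': True, 'b': False, 'c': False, 'd': False, 'e': False}
```

NaN never equals anything, including another NaN, which is why label `d` is `False` even though both sides have a value there.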
| 
 Let me do this in a separate PR, since there may be inconsistent cases like … | 
| The  | 
        
          
databricks/koalas/series.py
def __eq__(self, other):
    if isinstance(other, (list, tuple)):
        other = ks.Index(other, name=self.name)
    # pandas always returns False for all items with dict and set.
I wonder why pandas behaves like this...
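One plausible explanation for the all-`False` result with `dict` and `set` (hedged; not confirmed in this thread) is that pandas only unpacks sequence-like operands such as lists and tuples. A `dict` or `set` is then treated as a single object, so every element is compared with the whole container, and `1 == {1, 2}` is simply `False`. A pure-Python sketch of that rule (`elementwise_eq` is a hypothetical helper):

```python
def elementwise_eq(values, other):
    """Hypothetical model of comparing Series values with a Python container."""
    if isinstance(other, (list, tuple)):
        # Sequences are unpacked and matched position by position.
        if len(other) != len(values):
            raise ValueError("Lengths must match to compare")
        return [v == o for v, o in zip(values, other)]
    # Anything else, including dict and set, is compared as one object per element.
    return [v == other for v in values]


print(elementwise_eq([1, 2, 3], (3, 2, 1)))  # [False, True, False]
print(elementwise_eq([1, 2, 3], {1, 2, 3}))  # [False, False, False]
print(elementwise_eq([1, 2, 3], {1: "x"}))   # [False, False, False]
```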
        
          
databricks/koalas/series.py
equals = eq

def __eq__(self, other):
Does Index support this case too? If so, it might be best to move this logic to base.py.
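The reviewer's suggestion could look roughly like this: put the list-like comparison in one shared base class so both the Series-like and Index-like types inherit it. This is a pure-Python sketch with hypothetical class names; it assumes, as the comment implies, that base.py is where logic shared by `Series` and `Index` would live, and it does not use Spark or the real koalas classes:

```python
class ListLikeComparisonMixin:
    """Hypothetical shared base: subclasses only need to provide `_values`."""

    def __eq__(self, other):
        if isinstance(other, (list, tuple)):
            if len(other) != len(self._values):
                raise ValueError("Lengths must match to compare")
            return [v == o for v, o in zip(self._values, other)]
        if isinstance(other, (dict, set)):
            # Mirror the rule noted in the reviewed diff: all items compare as False.
            return [False] * len(self._values)
        return [v == other for v in self._values]

    def __ne__(self, other):
        return [not r for r in self == other]


class ToySeries(ListLikeComparisonMixin):
    def __init__(self, values, index):
        self._values = list(values)
        self.index = list(index)


class ToyIndex(ListLikeComparisonMixin):
    def __init__(self, values):
        self._values = list(values)


print(ToySeries([1, 2, 3], index=[10, 20, 30]) == [3, 2, 1])  # [False, True, False]
print(ToyIndex([1, 2, 3]) == (1, 2, 3))                       # [True, True, True]
```

Keeping the rule in one place means `Series` and `Index` cannot drift apart when, for example, the `dict`/`set` handling changes later.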
…parison

### What changes were proposed in this pull request?

This PR proposes to implement `Series` comparison with list-like Python objects.

Currently `Series` doesn't support comparison with list-like Python objects such as `list`, `tuple`, `dict`, `set`.

**Before**

```python
>>> psser
0    1
1    2
2    3
dtype: int64
>>> psser == [3, 2, 1]
Traceback (most recent call last):
...
TypeError: The operation can not be applied to list.
...
```

**After**

```python
>>> psser
0    1
1    2
2    3
dtype: int64
>>> psser == [3, 2, 1]
0    False
1     True
2    False
dtype: bool
```

This was originally proposed in databricks/koalas#2022, and all review comments in the original PR have been resolved.

### Why are the changes needed?

To follow pandas' behavior.

### Does this PR introduce _any_ user-facing change?

Yes, `Series` comparison with list-like Python objects is now possible.

### How was this patch tested?

Unit tests.

Closes #34114 from itholic/SPARK-36438.

Authored-by: itholic
Signed-off-by: Hyukjin Kwon
Currently Series doesn't support comparison with list-like Python objects such as list, tuple, dict, and set. This PR proposes supporting them for Series comparison as well.
This should resolve #2018.