Add chunked_filter (#344) #346
base: master
Overall this is looking good; it would also be good to see an extra test or two for edge cases in test_iterutils.py. Thanks again for this!
boltons/iterutils.py
Outdated
return

def chunked_filter(iterable, predicate, chunk_size):
Great to see a docstring here. Looking around at other functions, I noticed that most refer to the predicate as key. Do you mind updating this argument to be consistent?
I see various names, and since this isn't accessing a property of the passed objects, I'd find key confusing in this context. builtins.filter uses iterable and function; actually, it's filter(function, iterable), in the reverse order. Since this is a variation of that, how about using those names and that order?
chunked_filter(function, iterable, size)
Interestingly, it's itertools.filterfalse(predicate, iterable).
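For comparison, both standard-library filters take the callable first and the iterable second, which is the order proposed above for chunked_filter. A quick check (is_even and nums are illustrative names only):

```python
from itertools import filterfalse


def is_even(n):
    return n % 2 == 0


nums = [1, 2, 3, 4, 5, 6]

# builtins.filter(function, iterable): keep items where function(item) is truthy
evens = list(filter(is_even, nums))        # [2, 4, 6]

# itertools.filterfalse(predicate, iterable): keep items where predicate(item) is falsy
odds = list(filterfalse(is_even, nums))    # [1, 3, 5]
```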
Ah, I think the prime reference for this should be the sorted() function, where key is used. min() and max() also use key, I think. So, tl;dr: key should be preferred here. :)
> Ah, I think the prime reference for this should be the sorted() function, where key is used.
There's a good reason why sorted() and min()/max() use key while filter() and filterfalse() don't. When you compare two objects, you compare either the value itself or a part of it, hence key: an attribute, only shorter. When you filter a collection, the criterion can be arbitrary and external to the object, so you use function (...to be called on an item) or, more specifically, predicate.
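To make the distinction concrete, a small sketch using only builtins: key maps each item to the value that gets compared, while a filter predicate is a yes/no test that can depend on something outside the item (here a hypothetical blocked set).

```python
words = ["banana", "Apple", "cherry"]

# key: maps each item to the value being compared; the items themselves come back.
by_name = sorted(words, key=str.lower)   # ['Apple', 'banana', 'cherry']
longest = max(words, key=len)            # 'banana' (first of the two 6-letter words)

# predicate: an arbitrary yes/no test, which may depend on state outside the item.
blocked = {"cherry"}
visible = list(filter(lambda w: w not in blocked, words))   # ['banana', 'Apple']
```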
Also, since it's an extension of builtins.filter, what's the reason for following the naming pattern of any other function?
builtins.filter allows None as the function, in which case the identity function is used, so only truthy items are kept. Here it doesn't make much sense from a use-case perspective, but perhaps it would from a usability perspective and for consistency?
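For reference, this is the builtin behavior being described: with None, filter keeps the truthy items. The bulk_identity helper is a hypothetical illustration of what a None key could reduce to for a chunked variant, not part of the PR.

```python
mixed = [0, 1, "", "a", None, [], [2]]

# filter(None, iterable) keeps only truthy items, same as filter(lambda x: x, iterable).
truthy = list(filter(None, mixed))             # [1, 'a', [2]]
same = list(filter(lambda x: x, mixed))        # [1, 'a', [2]]


# Hypothetical bulk analogue: the chunk itself serves as its own allow-list.
def bulk_identity(chunk):
    return chunk


kept = [item for item, ok in zip(mixed, bulk_identity(mixed)) if ok]   # [1, 'a', [2]]
```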
Good progress! I like all the edge-case handling. I agree that there's no harm in allowing key to be None, except that you may then have to come up with a default for size, depending on the argument ordering. I'm fine either way, so I'll let you make the call. :)
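The ordering concern comes from Python's rule that a parameter without a default cannot follow one with a default. A quick way to check is to compile a few hypothetical signatures as strings:

```python
# Hypothetical signatures for chunked_filter, compiled as source strings.
bad = "def chunked_filter(iterable, key=None, size): pass"
defaulted_last = "def chunked_filter(iterable, size, key=None): pass"
filter_like = "def chunked_filter(key, iterable, size): pass"   # call as chunked_filter(None, items, 3)

try:
    compile(bad, "<sketch>", "exec")
except SyntaxError as exc:
    bad_error = exc      # size has no default but follows key=None
else:
    bad_error = None

# Both of these compile: either the defaulted parameter comes last, or None is
# passed explicitly (as with filter(None, iterable)) and no default is needed.
compile(defaulted_last, "<sketch>", "exec")
compile(filter_like, "<sketch>", "exec")
```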
def chunked_filter(iterable, predicate, chunk_size):
"""A version of :func:`filter` which will call predicate with a chunk of the iterable.
def chunked_filter(iterable, predicate, size):
Oops, based on the docstring, I think you meant to rename predicate to key here.
Not sure; see previous comments.
iterable (Iterable): Items to filter
predicate (Callable): Predicate function
chunk_size (int): The maximum size of chunks that will be passed the
predicate (Callable): Bulk predicate function that accepts a list of items
predicate -> key to match description above?
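A minimal sketch of the bulk-predicate behavior the docstring describes, assuming the (iterable, key, size) order and allowing key=None as discussed above. This is an illustration, not the PR's implementation; the helper name _chunked_filter_iter and bulk_is_even are hypothetical.

```python
from itertools import islice


def chunked_filter(iterable, key, size):
    """Hypothetical sketch: call key once per chunk of up to `size` items.

    key must return an iterable with one value per item; an item is kept when
    its matching value is truthy. If key is None, each item's own truthiness
    is used, mirroring filter(None, iterable).
    """
    if size < 1:   # validated eagerly, before any iteration starts
        raise ValueError(f"chunked_filter expected size to be at least 1, not {size!r}")
    return _chunked_filter_iter(iter(iterable), key, size)


def _chunked_filter_iter(it, key, size):
    while True:
        chunk = list(islice(it, size))   # next chunk of up to `size` items
        if not chunk:                    # source exhausted
            return
        allow_list = chunk if key is None else list(key(chunk))
        if len(allow_list) != len(chunk):
            raise ValueError(
                "chunked_filter expected key func to return an iterable of "
                f"length {len(chunk)}, got {len(allow_list)}"
            )
        for item, allowed in zip(chunk, allow_list):
            if allowed:
                yield item


def bulk_is_even(chunk):
    # Stands in for one batched call, e.g. a single lookup for many items.
    return [n % 2 == 0 for n in chunk]


evens = list(chunked_filter(range(10), bulk_is_even, 3))   # [0, 2, 4, 6, 8]
truthy = list(chunked_filter([0, 1, "", "a"], None, 2))    # [1, 'a']
```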
allow_list = list(allow_iter)
if len(allow_list) != len(src_):
    raise ValueError('expected the iterable from key(src) has the same length as the passed chunk of items')
"has the same" -> "to have the same"
I think it's great you're thinking about this. For exceptions, these days, I usually recommend the following format to maximize debuggability:
raise ValueError(f"chunked_filter expected key func to return an iterable of length {src_len}, got {allow_list_len}")
Similar changes could be made to the other exception messages above, just by adding ", not {actually_received_type}".
No description provided.