@rafalkrupinski
No description provided.

Owner

@mahmoud mahmoud left a comment

Overall this is looking good. It would also be good to see an extra test or two for edge cases in test_iterutils.py. Thanks again for this!

return


def chunked_filter(iterable, predicate, chunk_size):
Owner

Great to see a docstring here. Looking around at other functions, I noticed that most refer to the predicate as key. Do you mind updating this argument to be consistent?

Author

@rafalkrupinski rafalkrupinski Jul 25, 2023

I see various names in use, and since this function isn't accessing a property of the passed objects, I'd find key confusing in this context. builtins.filter uses function and iterable, in that order: filter(function, iterable). Since this is a variation of that, how about using those names and that order?

chunked_filter(function, iterable, size)
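
For illustration, here is a minimal sketch of what the proposed signature could look like. It is not the code from this pull request: the body is an assumption pieced together from the docstring fragments and the length check quoted elsewhere in this review, and are_even is a hypothetical bulk predicate.

```python
from itertools import islice


def chunked_filter(function, iterable, size):
    """Yield items of *iterable* that the bulk *function* accepts.

    Hypothetical sketch: *function* receives a list of at most *size*
    items and must return one truthy or falsy verdict per item.
    """
    if size <= 0:
        raise ValueError(f"size must be a positive integer, not {size!r}")
    it = iter(iterable)
    while True:
        # Pull the next chunk; an empty chunk means the input is exhausted.
        chunk = list(islice(it, size))
        if not chunk:
            return
        verdicts = list(function(chunk))
        if len(verdicts) != len(chunk):
            raise ValueError(
                "chunked_filter expected function to return an iterable"
                f" of length {len(chunk)}, got {len(verdicts)}"
            )
        for item, keep in zip(chunk, verdicts):
            if keep:
                yield item


def are_even(numbers):
    # Hypothetical bulk predicate: one boolean per input item.
    return [n % 2 == 0 for n in numbers]


evens = list(chunked_filter(are_even, range(10), 3))
```

The argument order mirrors filter(function, iterable), with size appended as the extra parameter.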

Owner

Ah, I think the prime reference for this should be the sorted() function, where key is used. min() and max() also use key, I think. So, tl;dr: key should be preferred here. :)

Author

Ah, I think the prime reference for this should be the sorted() function, where key is used.

There's a good reason why sorted and min/max use key while filter and filterfalse don't. When you compare two objects, you compare either the value itself or a part of it, hence key: like attribute, only shorter. When you filter a collection, the criterion can be arbitrary and external to the object, so you use function (to be called on an item) or, more specifically, predicate.
Also, since this is an extension of builtins.filter, what's the reason for following the naming pattern of any other function?
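
A small sketch of the distinction drawn above, using only the standard library. The data and the is_allowed helper are made up for illustration.

```python
from itertools import filterfalse

words = ["pear", "fig", "banana"]

# key: derives the value that items are compared by.
by_length = sorted(words, key=len)
shortest = min(words, key=len)

# function / predicate: an arbitrary yes-or-no criterion, which may depend
# on state external to the item (here, a hypothetical blocklist).
blocked = {"fig"}


def is_allowed(word):
    return word not in blocked


allowed = list(filter(is_allowed, words))
rejected = list(filterfalse(is_allowed, words))
```

In the first group the callable's return value is used for comparison; in the second, only its truthiness matters.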

@rafalkrupinski
Author

builtins.filter allows the function to be None, in which case an identity function is used. Here that doesn't make much sense from the use-case perspective, but perhaps it would from the usability perspective and for consistency?
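
For reference, a short sketch of the built-in behaviour being described, followed by one hypothetical way a bulk variant could mirror it. resolve_bulk_function is an assumed helper name, not something from this pull request.

```python
values = [0, 1, "", "a", None, [], [0]]

# With None as the function, filter keeps the items that are truthy.
kept = list(filter(None, values))
same = list(filter(bool, values))


def resolve_bulk_function(function):
    """Hypothetical helper: map None to an identity-like bulk function.

    Returning the chunk itself means each item's own truthiness
    serves as its verdict, matching filter(None, ...).
    """
    if function is None:
        return list
    return function


verdicts = resolve_bulk_function(None)([0, 2, "", "x"])
```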

Copy link
Owner

@mahmoud mahmoud left a comment

Good progress! I like all the edge case handling. I agree that there's no harm in allowing key to be None, except that then you may have to come up with a default for size, based on the ordering. I'm fine either way, so I'll let you make the call. :)


def chunked_filter(iterable, predicate, chunk_size):
"""A version of :func:`filter` which will call predicate with a chunk of the iterable.
def chunked_filter(iterable, predicate, size):
Owner

Oops, based on the docstring, I think you meant to rename predicate here to key.

Author

not sure, see previous comments

iterable (Iterable): Items to filter
predicate (Callable): Predicate function
chunk_size (int): The maximum size of chunks that will be passed the
predicate (Callable): Bulk predicate function that accepts a list of items
Owner

predicate -> key to match description above?


allow_list = list(allow_iter)
if len(allow_list) != len(src_):
raise ValueError('expected the iterable from key(src) has the same length as the passed chunk of items')
Owner

"has the same" -> "to have the same"

I think it's great you're thinking about this. For exceptions, these days, I usually recommend the following format to maximize debuggability:

raise ValueError("chunked_filter expected key func to return an iterable of length {src_len}, got {allow_list_len}")

Similar changes could be made to other exception messages above just by adding ", not {actually_received_type}"
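
A rough sketch of how the suggested format might look in practice. The names src_ and allow_iter come from the diff quoted in this review; the two wrapper functions and the TypeError wording are assumptions for illustration only.

```python
def check_verdict_length(src_, allow_iter):
    """Hypothetical helper: validate the key's result for one chunk."""
    allow_list = list(allow_iter)
    if len(allow_list) != len(src_):
        # State the function, the expectation, and what was actually received.
        raise ValueError(
            "chunked_filter expected key func to return an iterable"
            f" of length {len(src_)}, got {len(allow_list)}"
        )
    return allow_list


def check_callable(key):
    """Hypothetical helper showing the ", not {type}" suffix."""
    if not callable(key):
        raise TypeError(
            f"chunked_filter expected key to be callable, not {type(key).__name__}"
        )
```

Including both the expected and the received values means the message alone is often enough to locate the problem.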
