Skip to content

Conversation

@iamrohitrc
Copy link
Collaborator

No description provided.

Copy link
Collaborator

@venkatesh-sivaraman venkatesh-sivaraman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned before, looks great! No rush on this but here are a few suggestions for how to clean this up and merge:

  1. I think there might be a small typo - in the last cell, line 11, I got an error until I changed it to max_score_slice = max_score_slice & (discrete_df[features[index]] == combination[index])
  2. The recursion you wrote is correct, but it might create memory problems later on when we run it on data with hundreds of features. Could you look into refactoring this to use Python iterators instead of returning lists? For example, you can use the yield keyword to return items one by one from a function, and the yield from keyword to yield items from another iterator. Then you can simply loop over the results of the function without casting it to a list, and you'll never have to store the entire set of combinations in memory. I see you've used itertools.product, which conveniently also returns an iterator (but you'd have to remove the list() casting to take advantage of it).
  3. On the same vein, it might be good to only save the top K results instead of saving all of them. I have a function that lets you add scored items to a ranked list, which you can add to your implementation:
def add_to_ranking(ranked_list, item, k, score_idx=-1):
    """
    Adds an item to a ranked list with a maximum of k
    elements. Mutates the original list object.
    
    Args:
        ranked_list: A list of tuples containing ranked
            items.
        item: A tuple representing a new item to add, 
            where the score for the ranking is at index 
            score_idx.
        k: The maximum number of items in the ranked list.
        score_idx: The index of the score value in the tuples
            in the existing ranked_list and the new item.
            
    Returns: the updated ranked list.
    """
    added = False
    for i, comp_item in enumerate(ranked_list):
        if item[score_idx] > comp_item[score_idx]:
            ranked_list.insert(i, item)
            added = True
            break
    if not added and len(ranked_list) < k: ranked_list.append(item)
    return ranked_list[:k]
  1. Once you've done all of those things and confirmed that the code still works, could you move the reusable slice finding code to a Python module (named something like recursive.py)? For the code that's at the top level of the script (where you call populate_slices and then sort), you could wrap that in a new function and then write a docstring for it in the same style as I did with the function above. And you can put any code that might be useful across implementations, like the add_to_ranking and Slice definitions, in a utils.py module so we can easily reuse it later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants