
Pandas Dataframe Selecting Groups With Minimal Cardinality

I have a problem where I need to take groups of rows from a data frame where the number of items in a group exceeds a certain number (cutoff). For those groups, I need to take some

Solution 1:

Use groupby/filter:

>>> df.groupby('id').filter(lambda x: len(x) > cutoff)

This returns only the rows of your dataframe that belong to groups larger than your cutoff. It should also perform quite a bit better. I timed filter on a dataframe with 30,039 'id' groups and a little over 4 million observations:

In [9]: %timeit df.groupby('id').filter(lambda x: len(x) > 12)
1 loops, best of 3: 12.6 s per loop
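
To make the behaviour concrete, here is a minimal sketch on a small, made-up dataframe. The data, the `value` column, and the `cutoff` of 2 are assumptions for illustration only; the `'id'` column name follows the answer above. The second approach, using `transform('size')` to build a boolean mask, is not part of the original answer. It is a commonly used alternative that avoids calling a Python lambda once per group, so it may be worth timing on large data, though it was not benchmarked here.

```python
import pandas as pd

# Hypothetical toy data; only the 'id' column name comes from the answer.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3],
    "value": [10, 11, 12, 20, 21, 30],
})
cutoff = 2  # illustrative threshold

# Approach from the answer: keep the rows of every group whose size
# is strictly greater than the cutoff.
filtered = df.groupby("id").filter(lambda x: len(x) > cutoff)

# Alternative (an assumption, not from the original answer): broadcast
# each group's size back onto its rows and use it as a boolean mask.
sizes = df.groupby("id")["id"].transform("size")
masked = df[sizes > cutoff]

print(filtered)
```

Only group 1 has more than two rows, so both `filtered` and `masked` contain just its three rows, with the original index and row order preserved.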
