
Pandas: Find Most Common String Per Person

I would like to find the most common string value in animal when aggregating the data by id. If two values have the same count, I would like to pick the last value of animal.

Solution 1:

Group by the id and animal columns, and for each pair compute the count and the last date on which it appeared.

Then sort the resulting data frame by id, count and last, and drop duplicate id values, keeping the last row. Because of this ordering, the row kept for each id holds the most common animal; if two animals are tied on count, it holds the one whose last appearance has the later date. Finally, drop the extra columns count and last.

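columns = ['id', 'animal']

# count each (id, animal) pair and take the last date on which it appeared
df2 = df.groupby(columns)['date'].agg(['count', 'last']).reset_index()
# within each id, the most common animal (ties: later last date) sorts to the end
df3 = df2.sort_values(['id', 'count', 'last'])
# keep that final row per id and drop the helper columns
df3.drop_duplicates('id', keep='last')[columns]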

# outputs:
   id animal
1   1    dog
2   2    cat
3   3    dog
4   4   fish
5   5    cat
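The same sort-and-deduplicate idea can also be sketched without a date column by using each row's position as the tie-breaker. This is a minimal sketch under that assumption; the sample frame and the pos helper column are hypothetical, made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data; there is no date column, so row order defines "last".
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 2],
    "animal": ["dog", "cat", "dog", "cat", "fish", "fish", "cat"],
})

# Hypothetical helper column: each row's position in the frame.
with_pos = df.assign(pos=range(len(df)))

# Count each (id, animal) pair and keep the position of its last occurrence.
stats = (
    with_pos.groupby(["id", "animal"])["pos"]
    .agg(["count", "max"])
    .reset_index()
)

# Sort by count, then by last position, so the winner ends up last within each id.
result = (
    stats.sort_values(["id", "count", "max"])
    .drop_duplicates("id", keep="last")
    .set_index("id")["animal"]
)
print(result)
```

In id 2 of this sample, cat and fish both appear twice, but cat's last occurrence is further down the frame, so cat is kept. Using the maximum position instead of the last date means the tie-break depends only on row order, not on how dates are stored.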

Solution 2:

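You can define a custom rule and aggregate with it. The rule counts the animals within each id and, when several animals share the highest count, returns the one that appears last: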

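from collections import Counter

def rule(a):
    # a holds the animal values for one id, in row order
    m = Counter(a)
    max_val = max(m.values())
    # all animals that reach the highest count
    tied = {k for k, v in m.items() if v == max_val}
    # scan from the end so a tie goes to the animal seen last
    return next(x for x in reversed(a.tolist()) if x in tied)

df.groupby("id")[["animal"]].aggregate(rule)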

Output:

   animal
id  
1   dog
2   cat
3   dog
4   fish
5   cat
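To see how the tie-break behaves, here is a self-contained sketch of a counting rule like the one above, run on a small hypothetical frame (the function name most_common_last and the data are made up for illustration). In id 2, cat and dog are tied with two rows each, while the very last row is fish, which is not among the tied animals; the rule should return dog there, whereas simply taking the group's last value would give fish.

```python
from collections import Counter

import pandas as pd


def most_common_last(values):
    """Return the most frequent value; ties go to whichever tied value appears last."""
    counts = Counter(values)
    top = max(counts.values())
    tied = {v for v, c in counts.items() if c == top}
    # Walk backwards through the group so the most recent tied value is found first.
    return next(v for v in reversed(list(values)) if v in tied)


# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 2, 2],
    "animal": ["dog", "cat", "dog", "cat", "dog", "cat", "dog", "fish"],
})

# Selecting [["animal"]] keeps a one-column DataFrame indexed by id.
result = df.groupby("id")[["animal"]].agg(most_common_last)
print(result)
```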
