
Pandas Groupby Date Range Depending On Each Row

I would like to do the following: for a data frame that looks like this: df = pd.DataFrame({'ID':['A', 'A', 'C' ,'B', 'B'], 'date':['06/24/2014','06/25/2014','06/23/2014','07/02/19

Solution 1:

You could sort the rows by date, then take the difference between consecutive dates. Flag every row whose difference from the previous row is greater than 2 days; each flag marks the start of a new group. Taking the cumulative sum of those flags assigns the desired group numbers:

import pandas as pd
df = pd.DataFrame({
    "ID": ["A", "A", "C", "B", "B"],
    "date": ["06/24/2014", "06/25/2014", "06/23/2014", "07/02/1999", "07/02/1999"],
    "value": ["3", "5", "1", "7", "8"],
})
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(by='date')
# A gap of more than 2 days starts a new group; cumsum turns the flags into group numbers
df['group'] = (df['date'].diff() > pd.Timedelta(days=2)).cumsum()
print(df)

yields

  ID       date value  group
3  B 1999-07-02     7      0
4  B 1999-07-02     8      0
2  C 2014-06-23     1      1
0  A 2014-06-24     3      1
1  A 2014-06-25     5      1
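Once the group column exists, it can be passed to groupby like any other key. The following is only a sketch of one possible follow-up, not part of the original answer: the names `summary`, `start`, `end` and `total` are made up for illustration, converting `value` to integers is an assumption about what the data is meant to hold, and the stable `mergesort` is chosen here just to keep rows with equal dates in their original order.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "A", "C", "B", "B"],
    "date": ["06/24/2014", "06/25/2014", "06/23/2014", "07/02/1999", "07/02/1999"],
    "value": ["3", "5", "1", "7", "8"],
})
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
# Assumption: the string values are meant to be numeric
df["value"] = df["value"].astype(int)
# A stable sort keeps rows with identical dates in their original order
df = df.sort_values(by="date", kind="mergesort")

# The first diff is NaT, which compares as False, so the first row starts group 0
df["group"] = (df["date"].diff() > pd.Timedelta(days=2)).cumsum()

# Hypothetical summary: date span and total value per group
summary = df.groupby("group").agg(
    start=("date", "min"),
    end=("date", "max"),
    total=("value", "sum"),
)
print(summary)
```

With the sample data, group 0 holds the two 1999 rows and group 1 holds the three June 2014 rows. Note that the 2-day threshold is applied between neighbouring rows, so a long chain of closely spaced dates ends up in a single group even if its first and last dates are far apart.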
