How To Summarise Data Over Several Years Into One Dataframe
Solution 1:
You need to call the .agg method on the groupby object. .agg stands for aggregate: you are essentially aggregating each group's data into one single observation. You can pass a dictionary of functions to .agg that tells it what to do with each column. So imagine your data frame looked like this:
import pandas as pd
import random

# Two years of records for four businesses; profit is random sample data
df = pd.DataFrame({'business': ['business_1', 'business_2', 'business_3', 'business_4', 'business_1', 'business_2', 'business_3', 'business_4'],
                   'years': [2013, 2013, 2013, 2013, 2014, 2014, 2014, 2014],
                   'zip_code': ['101', '102', '103', '104', '101', '102', '103', '104'],
                   'profit': [random.randint(1000, 2000) for x in range(8)]})
Now 'business' is like your id variable, zip_code is your data that does not change, and profit is the thing you want to sum.
You already know which function to use to get the sum: it's sum. But you need to write a function that takes the single unique value of zip code. You can do something like this:
def take_single(series):
    # Return the first distinct value found in the series
    return series.unique()[0]
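As a quick check of what take_single returns, here is a minimal sketch on a hand-made Series. The values are made up for illustration and stand in for one business's zip code repeated across two years:

```python
import pandas as pd

def take_single(series):
    # Return the first distinct value found in the series
    return series.unique()[0]

# Hypothetical data: the same zip code recorded in two different years
zip_codes = pd.Series(['101', '101'])

single = take_single(zip_codes)
print(single)
```

Note that if a column did contain more than one distinct value, this function would silently keep the first one it finds.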
Now create your groupby object, create a dictionary of functions to be executed on each column, and pass that dictionary to the .agg method (aggregate) like so:
df_grouped = df.groupby('business')
function_dict = {'business' : take_single, 'zip_code' : take_single, 'profit' : sum}
df_grouped.agg(function_dict)
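To make the outcome easier to inspect, here is a self-contained sketch of the same steps with fixed profit figures instead of random ones. The numbers are invented for illustration. It also differs from the snippet above in two small ways, both assumptions on my part: it leaves 'business' out of the dictionary, since the grouping column already becomes the index of the result, and it passes the string 'sum', which pandas also accepts, in place of the built-in sum.

```python
import pandas as pd

def take_single(series):
    # Return the first distinct value found in the series
    return series.unique()[0]

# Made-up data: two businesses observed over two years
df = pd.DataFrame({
    'business': ['business_1', 'business_2', 'business_1', 'business_2'],
    'years': [2013, 2013, 2014, 2014],
    'zip_code': ['101', '102', '101', '102'],
    'profit': [1000, 1500, 1200, 1800],
})

# One aggregation function per column we want to keep
function_dict = {'zip_code': take_single, 'profit': 'sum'}

# One row per business, indexed by 'business'
result = df.groupby('business').agg(function_dict)
print(result)
```

Columns that are not listed in the dictionary, such as 'years' here, are simply dropped from the result.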
This gets the result you want I think.
One thing to note is that the series of data being aggregated is automatically passed as the first argument of the aggregation function. That is why the take_single function has an argument called series. Because this argument is passed automatically when .agg is called, there is no need to specify it within the function dictionary.
sum is a built-in function, so there is no need to write that one separately.
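If you want to see the automatic argument passing for yourself, a small sketch like the following records what .agg hands to the function. The received list and the sample data are only there for illustration:

```python
import pandas as pd

received = []  # collects a flag for each call made by .agg

def take_single(series):
    # Note whether the object passed in is a pandas Series
    received.append(isinstance(series, pd.Series))
    return series.unique()[0]

# Made-up data: two businesses, two rows each
df = pd.DataFrame({
    'business': ['business_1', 'business_1', 'business_2', 'business_2'],
    'zip_code': ['101', '101', '102', '102'],
})

# Only the function itself goes in the dictionary, with no arguments
result = df.groupby('business').agg({'zip_code': take_single})
print(all(received))
```

Each call receives the zip_code values of one group as a Series, which is why take_single can call .unique() on its argument.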
To replicate this on your own data, simply create the dictionary with columns B-Q as keys mapped to take_single, and then R and Z as keys mapped to sum. Does that make sense?
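With that many columns it may be easier to build the dictionary in code than to type it out. The sketch below shows one way to do that. The column names and the two lists constant_cols and sum_cols are hypothetical stand-ins for your own columns, and the string 'sum' is used in place of the built-in sum:

```python
import pandas as pd

def take_single(series):
    # Return the first distinct value found in the series
    return series.unique()[0]

# Made-up data with two unchanging columns and two columns to total
df = pd.DataFrame({
    'business': ['business_1', 'business_1', 'business_2', 'business_2'],
    'zip_code': ['101', '101', '102', '102'],
    'city': ['Leeds', 'Leeds', 'York', 'York'],
    'profit': [10, 20, 30, 40],
    'costs': [1, 2, 3, 4],
})

constant_cols = ['zip_code', 'city']  # hypothetical: columns that do not change
sum_cols = ['profit', 'costs']        # hypothetical: columns to add up

# Map every unchanging column to take_single and every numeric one to 'sum'
function_dict = {col: take_single for col in constant_cols}
function_dict.update({col: 'sum' for col in sum_cols})

result = df.groupby('business').agg(function_dict)
print(result)
```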
It's not easy to understand groupby (for me anyway), but it is very useful.
Rory