Skip to content Skip to sidebar Skip to footer

Pandas: Grouping A Column On A Value And Creating New Column Headings

I am trying to re-arrange a pandas dataframe. Presently I have: id mins param 1 10 0.15 1 11 0.16 1 12 0.17 2 10 0.20 2 11 0.21 2 12 0.22 But I wo

Solution 1:

Use pivot or unstack with set_index and last add_prefix:

df = df.pivot(index='id', columns='mins', values='param').add_prefix('param_')
print (df)
mins  param_10  param_11  param_12
id                                
1         0.15      0.16      0.17
2         0.20      0.21      0.22

df = df.set_index(['id','mins'])['param'].unstack().add_prefix('param_')
print (df)
mins  param_10  param_11  param_12
id                                
1         0.15      0.16      0.17
2         0.20      0.21      0.22

If error:

ValueError: Index contains duplicate entries, cannot reshape

it means duplicates, need aggregation. Use pivot_table or groupby with function like mean ( sum, median, ...) and last unstack:

print (df)
   id  mins  param
0   1    10   0.15 <- id mins dupe
1   1    10   0.50 <- id mins dupe
2   1    11   0.16
3   1    12   0.17
4   2    10   0.20
5   2    11   0.21
6   2    12   0.22

df = df.pivot_table(index='id', columns='mins', values='param', aggfunc='mean')
       .add_prefix('param_')
print (df)
mins  param_10  param_11  param_12
id                                
1        0.325      0.16      0.17 <- (0.15+0.5)/2 = 0.325
2        0.200      0.21      0.22

df = df.groupby(['id','mins'])['param'].mean().unstack().add_prefix('param_')
print (df)
mins  param_10  param_11  param_12
id                                
1        0.325      0.16      0.17 <- (0.15+0.5)/2 = 0.325
2        0.200      0.21      0.22

For cleaning set columns name to None by rename_axis and use reset_index for column id:

df = df.rename_axis(None, axis=1).reset_index()
print (df)
   id  param_10  param_11  param_12
0   1     0.325      0.16      0.17
1   2     0.200      0.21      0.22

EDIT:

Solution with multiple columns:

df['param1'] = df['param'] / 4
print (df)
   id  mins  param  param1
0   1    10   0.15  0.0375
1   1    10   0.50  0.1250
2   1    11   0.16  0.0400
3   1    12   0.17  0.0425
4   2    10   0.20  0.0500
5   2    11   0.21  0.0525
6   2    12   0.22  0.0550

df = df.pivot_table(index='id', columns='mins', values=['param', 'param1'], aggfunc='mean')
df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
print (df)
    param_10  param_11  param_12  param1_10  param1_11  param1_12
id                                                               
1      0.325      0.16      0.17    0.08125     0.0400     0.0425
2      0.200      0.21      0.22    0.05000     0.0525     0.0550

Post a Comment for "Pandas: Grouping A Column On A Value And Creating New Column Headings"