Skip to content Skip to sidebar Skip to footer

Converting Long Table To Wide And Creating Columns According To The Rows

I have a data frame that look like this: Customer_ID Category Products 1 Veg A 2 Veg B 3 Fruit A 3

Solution 1:

Use groupby with unstack, but if duplicates rows data are concanecate together:

df = df.groupby(['Customer_ID','Category','Products'])['Products'].sum().unstack()
df.columns = ['Pro_{}'.format(x) for x inrange(1, len(df.columns)+1)]
df = df.reset_index()
print (df)
   Customer_ID Category Pro_1 Pro_2 Pro_3
01    Fruit     A  None     C
11      Veg     A  NoneNone22      Veg  None     B  None33    Fruit     A     B  None43      Veg  None     B     C

Another solution with helper column, triples has to be unique:

#if not unique triples remove duplicates
df = df.drop_duplicates(['Customer_ID','Category','Products'])

df['a'] = df['Products']
df = df.set_index(['Customer_ID','Category','Products'])['a'].unstack()
df.columns = ['Pro_{}'.format(x) for x inrange(1, len(df.columns)+1)]
df = df.reset_index()
print (df)
   Customer_ID Category Pro_1 Pro_2 Pro_3
01    Fruit     A  None     C
11      Veg     A  NoneNone22      Veg  None     B  None33    Fruit     A     B  None43      Veg  None     B     C

Solution 2:

Another option using crosstab :

pd.crosstab([df['Customer_ID'],df['Category']], df['Products'])

output:

Products              AB  C
Customer_ID Category         
1           Fruit     101
            Veg       1002           Veg       0103           Fruit     110
            Veg       011

Afterwards you can reset the index for a similar solution to what you wanted.

df = df.reset_index()
Products  Customer_ID Category  A  B  C
0                   1    Fruit  1  0  1
1                   1      Veg  1  0  0
2                   2      Veg  0  1  0
3                   3    Fruit  1  1  0
4                   3      Veg  0  1  1

Solution 3:

Try this: (don't mind the IO thing it is just for simple copy/paste)

import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO("""
Customer_ID     Category   Products 
  1               Veg         A
  2               Veg         B
  3              Fruit        A   
  3              Fruit        B
  3               Veg         B 
  1              Fruit        A
  3               Veg         C 
  1              Fruit        C"""), sep='\s+')
df = df.join(pd.get_dummies(df['Products']))
g = df.groupby(['Customer_ID', 'Category']).sum()
print(g)

output:

AB  C
Customer_ID Category         
1           Fruit     101
            Veg       1002           Veg       0103           Fruit     110
            Veg       011

Post a Comment for "Converting Long Table To Wide And Creating Columns According To The Rows"