Converting Long Table To Wide And Creating Columns According To The Rows
I have a data frame that look like this: Customer_ID Category Products 1 Veg A 2 Veg B 3 Fruit A 3
Solution 1:
Use groupby
with unstack
, but if duplicates rows data are concanecate together:
df = df.groupby(['Customer_ID','Category','Products'])['Products'].sum().unstack()
df.columns = ['Pro_{}'.format(x) for x inrange(1, len(df.columns)+1)]
df = df.reset_index()
print (df)
Customer_ID Category Pro_1 Pro_2 Pro_3
01 Fruit A None C
11 Veg A NoneNone22 Veg None B None33 Fruit A B None43 Veg None B C
Another solution with helper column, triples has to be unique:
#if not unique triples remove duplicates
df = df.drop_duplicates(['Customer_ID','Category','Products'])
df['a'] = df['Products']
df = df.set_index(['Customer_ID','Category','Products'])['a'].unstack()
df.columns = ['Pro_{}'.format(x) for x inrange(1, len(df.columns)+1)]
df = df.reset_index()
print (df)
Customer_ID Category Pro_1 Pro_2 Pro_3
01 Fruit A None C
11 Veg A NoneNone22 Veg None B None33 Fruit A B None43 Veg None B C
Solution 2:
Another option using crosstab
:
pd.crosstab([df['Customer_ID'],df['Category']], df['Products'])
output:
Products AB C
Customer_ID Category
1 Fruit 101
Veg 1002 Veg 0103 Fruit 110
Veg 011
Afterwards you can reset the index for a similar solution to what you wanted.
df = df.reset_index()
Products Customer_ID Category A B C
0 1 Fruit 1 0 1
1 1 Veg 1 0 0
2 2 Veg 0 1 0
3 3 Fruit 1 1 0
4 3 Veg 0 1 1
Solution 3:
Try this: (don't mind the IO thing it is just for simple copy/paste)
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO("""
Customer_ID Category Products
1 Veg A
2 Veg B
3 Fruit A
3 Fruit B
3 Veg B
1 Fruit A
3 Veg C
1 Fruit C"""), sep='\s+')
df = df.join(pd.get_dummies(df['Products']))
g = df.groupby(['Customer_ID', 'Category']).sum()
print(g)
output:
AB C
Customer_ID Category
1 Fruit 101
Veg 1002 Veg 0103 Fruit 110
Veg 011
Post a Comment for "Converting Long Table To Wide And Creating Columns According To The Rows"