
How To Take Items In An Index As Columns In Pandas

I have a pandas DataFrame in which the values in the index are the headers for each column. This one is tricky because each row has data that the other rows may not have. I am tryin

Solution 1:

You could create a grouping variable that increments at every address row, then reshape using pivot:

df.assign(grp=df.iloc[:,0].str.contains('address').cumsum()).pivot(index='grp', columns='INDEX', values='INFO')
Out: 
INDEX              address          name     phone      type    website
grp                                                                    
1      2. 123 APPLE STREET   APPLE STORE  555-5555  BUSINESS  APPLE.COM
2            456 peach ave   PEACH STORE  777-7777  BUSINESS  PEACH.COM
3            789 banana rd  banana store  999-9999  BUSINESS        NaN

What your df looks like:

     INDEX                 INFO
0   address  2. 123 APPLE STREET
1     phone             555-5555
2      name          APPLE STORE
3   website            APPLE.COM
4      type             BUSINESS
5   address        456 peach ave
6     phone             777-7777
7      name          PEACH STORE
8   website            PEACH.COM
9      type             BUSINESS
10  address        789 banana rd
11    phone             999-9999
12     name         banana store
13     type             BUSINESS
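A self-contained sketch of Solution 1, with the sample data above typed in by hand and INDEX treated as an ordinary column (an assumption, since the question is ambiguous on this). It matches 'address' exactly with eq() instead of str.contains(), and passes pivot's arguments by keyword, which pandas 2.0 and later require.

```python
import pandas as pd

# Sample data transcribed from the table above; INDEX is a regular column here.
df = pd.DataFrame({
    "INDEX": ["address", "phone", "name", "website", "type",
              "address", "phone", "name", "website", "type",
              "address", "phone", "name", "type"],
    "INFO": ["2. 123 APPLE STREET", "555-5555", "APPLE STORE", "APPLE.COM", "BUSINESS",
             "456 peach ave", "777-7777", "PEACH STORE", "PEACH.COM", "BUSINESS",
             "789 banana rd", "999-9999", "banana store", "BUSINESS"],
})

# Every 'address' row starts a new record, so a running count of them labels each record.
grp = df["INDEX"].eq("address").cumsum()

# One row per record, one column per field; fields a record lacks become NaN.
wide = df.assign(grp=grp).pivot(index="grp", columns="INDEX", values="INFO")
print(wide)
```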

Solution 2:

This is a pivot-table problem, so I use unstack. Based on your comment, I gather INDEX isn't a column but the index of the DataFrame, so I changed the code accordingly.

s = df.groupby('INDEX').cumcount()
df_out = df.set_index(s, append=True).INFO.unstack(0, fill_value='None')

Out[111]:
INDEX           address          name     phone      type    website
0      123 APPLE STREET   APPLE STORE  555-5555  BUSINESS  APPLE.COM
1         456 peach ave   PEACH STORE  777-7777  BUSINESS  PEACH.COM
2         789 banana rd  banana store  999-9999  BUSINESS       None
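A runnable version of Solution 2, assuming (as the answer does) that INDEX is the DataFrame's index rather than a column. The data is transcribed from the question's example; the variable names fields and values are only for illustration.

```python
import pandas as pd

# Question's data, with INDEX as the (non-unique) index of the DataFrame.
fields = ["address", "phone", "name", "website", "type",
          "address", "phone", "name", "website", "type",
          "address", "phone", "name", "type"]
values = ["123 APPLE STREET", "555-5555", "APPLE STORE", "APPLE.COM", "BUSINESS",
          "456 peach ave", "777-7777", "PEACH STORE", "PEACH.COM", "BUSINESS",
          "789 banana rd", "999-9999", "banana store", "BUSINESS"]
df = pd.DataFrame({"INFO": values}, index=pd.Index(fields, name="INDEX"))

# n-th occurrence of each field name: 0 for the first store, 1 for the second, ...
s = df.groupby("INDEX").cumcount()

# Append the counter as a second index level, then move level 0 (field names) to columns.
df_out = df.set_index(s, append=True)["INFO"].unstack(0, fill_value="None")
print(df_out)
```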

Note: since you want None, I fill the missing values with the string 'None'. If you want Python's None, you can simply leave them as NaN, since pandas treats NaN and None alike as missing values. If you want NaN, remove the option fill_value='None'.
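To illustrate the note above: pandas' missing-value checks such as isna() report both NaN and Python None as missing, while the string 'None' is an ordinary value. The Series below is a hypothetical example.

```python
import pandas as pd

# One real value plus the three ways of marking "no website" mentioned in the note.
s = pd.Series(["APPLE.COM", float("nan"), None, "None"])

# isna() flags NaN and None as missing; the string "None" is just text.
missing = s.isna()
print(missing.tolist())

# count() skips NaN and None but still counts the string "None".
print(s.count())
```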


Solution 3:

This should do the trick:


    import pandas as pd

    # Rebuild the question's data with INDEX as the DataFrame's index
    INDEX = ['address', 'phone', 'name', 'website', 'type', 'address', 'phone', 'name', 'website', 'type', 'address', 'phone', 'name', 'type']
    INFO = ['123 APPLE STREET', '555-5555', 'APPLE STORE', 'APPLE.COM', 'BUSINESS', '456 peach ave', '777-7777', 'PEACH STORE', 'PEACH.COM', 'BUSINESS', '789 banana rd', '999-9999', 'banana store', 'BUSINESS']
    df = pd.DataFrame(index=INDEX, data=INFO, columns=['INFO'])
    df.index.name = 'INDEX'
    # Collect all values of each field into a list (groups come out sorted by field name)
    df2 = df.groupby('INDEX').agg(INFO=('INFO', list))
    # One row per field, shorter lists padded with None; transpose so the fields become columns
    pd.DataFrame(df2['INFO'].to_list(), index=df2.index).transpose()

Here's the output:


    Out[132]: 
    INDEX           address          name     phone      type    website
    0      123 APPLE STREET   APPLE STORE  555-5555  BUSINESS  APPLE.COM
    1         456 peach ave   PEACH STORE  777-7777  BUSINESS  PEACH.COM
    2         789 banana rd  banana store  999-9999  BUSINESS       None
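One caveat worth checking: Solutions 2 and 3 line values up by their order of appearance within each field, so they give correct rows only when a record's missing fields would come last, as with the banana store here. The sketch below uses hypothetical data in which the middle (peach) store has no website; grouping on the 'address' rows, as in Solution 1, keeps each value with its own store.

```python
import pandas as pd

# Hypothetical variant of the question's data: the PEACH store has no website.
df = pd.DataFrame({
    "INDEX": ["address", "name", "website",
              "address", "name",
              "address", "name", "website"],
    "INFO": ["123 APPLE STREET", "APPLE STORE", "APPLE.COM",
             "456 peach ave", "PEACH STORE",
             "789 banana rd", "banana store", "BANANA.COM"],
})

# Order-based (as in Solutions 2 and 3): the n-th website goes to the n-th row.
n = df.groupby("INDEX").cumcount()
by_order = df.set_index(["INDEX", n])["INFO"].unstack(0)

# Record-based (as in Solution 1): each 'address' row starts a new record.
grp = df["INDEX"].eq("address").cumsum()
by_record = df.assign(grp=grp).pivot(index="grp", columns="INDEX", values="INFO")

print(by_order)   # BANANA.COM ends up on the PEACH STORE row
print(by_record)  # PEACH STORE's website is NaN, BANANA.COM stays with banana store
```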


Solution 4:

I figured out the issue. Most of the answers can accomplish this task; however, there was a problem with the DataFrame itself. It kept giving me an error about a list no matter what I did, so I tried something unorthodox: I saved the PDF as an Excel sheet and read it back into a pandas DataFrame. Once I did that, the traceback disappeared. Weird, huh? The bigger question is how to prevent it from happening. But thank you for all your responses.

