Different Sequence Of Names With Pandas

I have a dataframe:

   used_at  common users    pair of websites
0     2014          1364  avito.ru and e1.ru
1     2014          1716

Solution 1:

Maybe before pivoting, try splitting on " and " and then sorting, so every pair lists its websites in the same order:

df['pair of websites'] = df['pair of websites'].str.split(' and ')
df['pair of websites'] = df['pair of websites'].apply(lambda x: frozenset(sorted(x)))

Seems like that should work as long as there's the same amount of whitespace in the " and " part for each entry. If not, you may have to use str.strip() as well.
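To make the whitespace caveat concrete, here is a minimal sketch of a helper that tolerates uneven spacing around "and" and returns a plain string instead of a frozenset. The name `normalize_pair` and the sample strings are made up for illustration and are not part of the original answer.

```python
import re


def normalize_pair(pair):
    """Return the pair with its site names in alphabetical order."""
    # Split on "and" surrounded by any amount of whitespace,
    # after trimming leading/trailing whitespace from the whole entry.
    names = re.split(r"\s+and\s+", pair.strip())
    return " and ".join(sorted(names))


# Hypothetical sample entries, including one with uneven whitespace
pairs = ["e1.ru and avito.ru", "avito.ru and e1.ru", "drom.ru  and  avito.ru "]
print([normalize_pair(p) for p in pairs])
```

Assuming the column holds strings, something like df['pair of websites'].map(normalize_pair) should apply it to the whole column before pivoting.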

Solution 2:

After testing, I added the inverted combinations (c_invert), because some values were missing after the pivot. Now all combinations are present and the pivot works well:

import itertools

import pandas as pd

df = pd.read_csv("avito_trend.csv",
                 parse_dates=[2])


def f(df):
    dfs = []
    for x in [list(x) for x in itertools.combinations(df['address'].unique(), 2)]:

        c1 = df.loc[df['address'].isin([x[0]]), 'ID']
        c2 = df.loc[df['address'].isin([x[1]]), 'ID']
        c = pd.Series(list(set(c1).intersection(set(c2))))
        #add inverted intersection c2 vs c1
        c_invert = pd.Series(list(set(c2).intersection(set(c1))))
        dfs.append(pd.DataFrame({'common users':len(c), 'pair of websites':' and '.join(x)}, index=[0]))
        #swap values in x
        x[1],x[0] = x[0],x[1]
        dfs.append(pd.DataFrame({'common users':len(c_invert), 'pair of websites':' and '.join(x)}, index=[0]))
    return pd.concat(dfs)

common_users = df.groupby([df['used_at'].dt.year]).apply(f).reset_index(drop=True, level=1).reset_index()
print(common_users.pivot(index='pair of websites', columns='used_at', values='common users'))
used_at                              2014  2015
pair of websites                               
am.ru and auto.ru                     408   224
am.ru and avito.ru                    579   262
am.ru and avtomarket.ru               133    72
am.ru and cars.mail.ru/sale           166    73
am.ru and drom.ru                     394   187
am.ru and e1.ru                       224    99
am.ru and irr.ru/cars                 223   102
auto.ru and am.ru                     408   224
auto.ru and avito.ru                 1602  1473
auto.ru and avtomarket.ru             243   162
auto.ru and cars.mail.ru/sale         330   195
auto.ru and drom.ru                   874   799
auto.ru and e1.ru                     475   451
auto.ru and irr.ru/cars               409   288
avito.ru and am.ru                    579   262
avito.ru and auto.ru                 1602  1473
avito.ru and avtomarket.ru            299   205
avito.ru and cars.mail.ru/sale        424   256
avito.ru and drom.ru                 1716  1491
avito.ru and e1.ru                   1364  1153
avito.ru and irr.ru/cars              602   403
avtomarket.ru and am.ru               133    72
avtomarket.ru and auto.ru             243   162
avtomarket.ru and avito.ru            299   205
avtomarket.ru and cars.mail.ru/sale   105    48
avtomarket.ru and drom.ru             247   175
avtomarket.ru and e1.ru               139   105
avtomarket.ru and irr.ru/cars         139    73
cars.mail.ru/sale and am.ru           166    73
cars.mail.ru/sale and auto.ru         330   195
cars.mail.ru/sale and avito.ru        424   256
cars.mail.ru/sale and avtomarket.ru   105    48
cars.mail.ru/sale and drom.ru         292   189
cars.mail.ru/sale and e1.ru           154   105
cars.mail.ru/sale and irr.ru/cars     197    94
drom.ru and am.ru                     394   187
drom.ru and auto.ru                   874   799
drom.ru and avito.ru                 1716  1491
drom.ru and avtomarket.ru             247   175
drom.ru and cars.mail.ru/sale         292   189
drom.ru and e1.ru                     634   539
drom.ru and irr.ru/cars               423   277
e1.ru and am.ru                       224    99
e1.ru and auto.ru                     475   451
e1.ru and avito.ru                   1364  1153
e1.ru and avtomarket.ru               139   105
e1.ru and cars.mail.ru/sale           154   105
e1.ru and drom.ru                     634   539
e1.ru and irr.ru/cars                 235   148
irr.ru/cars and am.ru                 223   102
irr.ru/cars and auto.ru               409   288
irr.ru/cars and avito.ru              602   403
irr.ru/cars and avtomarket.ru         139    73
irr.ru/cars and cars.mail.ru/sale     197    94
irr.ru/cars and drom.ru               423   277
irr.ru/cars and e1.ru                 235   148
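One observation on the function above: set intersection does not depend on the order of its operands, so c and c_invert always hold the same users, which is why mirrored rows such as "am.ru and auto.ru" and "auto.ru and am.ru" show identical counts. What fills the gaps in the pivot is emitting both name orders. A minimal sketch of that idea in plain Python, computing each intersection once; the name `pair_counts` and the toy data are assumptions for illustration only:

```python
from itertools import combinations


def pair_counts(users_by_site):
    """Count shared users per pair of sites, emitting both name orders."""
    counts = {}
    for a, b in combinations(sorted(users_by_site), 2):
        # The intersection is symmetric, so compute it once per pair
        n = len(users_by_site[a] & users_by_site[b])
        counts[f"{a} and {b}"] = n
        counts[f"{b} and {a}"] = n
    return counts


# Hypothetical toy data: site -> set of user IDs
users_by_site = {
    "avito.ru": {1, 2, 3, 4},
    "e1.ru": {2, 3, 5},
    "drom.ru": {4, 5},
}
print(pair_counts(users_by_site))
```

The same pattern would run once per year when combined with the groupby in the answer.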

If you need a graph:

graph_by_common_users = common_users.pivot(index='pair of websites', columns='used_at', values='common users')
#sort by column 2014
graph_by_common_users = graph_by_common_users.sort_values(2014, ascending=False)



ax = graph_by_common_users.plot(kind='barh', width=0.5, figsize=(10,20))
[label.set_rotation(25) for label in ax.get_xticklabels()]


rects = ax.patches 
labels = [int(round(graph_by_common_users.loc[i, y])) for y in graph_by_common_users.columns.tolist() for i in graph_by_common_users.index] 
for rect, label in zip(rects, labels):
    height = rect.get_height() 
    ax.text(rect.get_width() + 3, rect.get_y() + rect.get_height(), label, fontsize=8) 