
Find The Duplicate Rows Of One Column Then Add The Corresponding Rows Of Other Columns

I want to find the duplicate rows of one column and sum the corresponding rows of the other columns. If the dataframe is as follows: A B C D E F G 13348 xyz
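The sample DataFrame above is cut off, so this sketch uses small hypothetical data. It follows the most literal reading of the question, which is to sum the other columns for rows that share a value of A. `duplicated(keep=False)` flags the repeated rows and a `groupby` aggregation adds them up. The column set and the choice of `'first'` for the text column B are assumptions.

```python
import pandas as pd

# Hypothetical toy data: the question's real DataFrame is truncated.
df = pd.DataFrame({
    'A': [13348, 13348, 45832, 58712],
    'B': ['x', 'x', 'y', 'z'],
    'D': [1, 2, 3, 4],
    'E': [1, 2, 5, 6],
})

# Flag every row whose value in A occurs more than once.
dup_mask = df['A'].duplicated(keep=False)
print(df[dup_mask])

# Sum the numeric columns per value of A (assumption: keep the first B);
# rows with a unique A pass through with their own values.
summed = df.groupby('A', as_index=False).agg({'B': 'first', 'D': 'sum', 'E': 'sum'})
print(summed)
```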

Solution 1:

I think you need:

import numpy as np

cols = ['D','E','F','G']
# for each group of A, transpose the D-G block and flag columns that duplicate another column
df1 = df.groupby('A')[cols].apply(lambda x: x.T.duplicated(keep=False))
# where every column in the group is a duplicate, sum D per group, else 0
# (groupby sorts by A, so this assumes df is already sorted by A)
arr = np.where(df1.all(axis=1), df.groupby('A')[cols[0]].sum(), 0)
# drop the checked columns, keep the first row per value of A, and add the new D
df = df.drop(cols, axis=1).drop_duplicates('A').assign(D=arr)
print(df)
        A        B        C  D
0   13348    xyzqr   324580  5
2   45832  gberthh   258729  0
4   58712    bgrtw   984562  2
5   76493     hzrt   638495  0
6  643509        .  T648501  2
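To see what the first solution does end to end, here is a self-contained run on hypothetical toy data, since the question's real DataFrame is truncated. Group 13348 is built so that D, E, F and G hold identical values, and group 45832 so that G differs from the rest. Only the first group should therefore get a summed D.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (the question's DataFrame is truncated).
# Group 13348: D, E, F, G are identical -> every column is a duplicate.
# Group 45832: G differs from D, E, F -> G is not a duplicate.
df = pd.DataFrame({
    'A': [13348, 13348, 45832, 45832],
    'B': ['xyzqr', 'xyzqr', 'gberthh', 'gberthh'],
    'C': [324580, 324580, 258729, 258729],
    'D': [2, 3, 1, 1],
    'E': [2, 3, 1, 1],
    'F': [2, 3, 1, 1],
    'G': [2, 3, 9, 1],
})

cols = ['D', 'E', 'F', 'G']
# One row per group, one boolean per checked column: True if that column's
# values within the group equal another column's values.
df1 = df.groupby('A')[cols].apply(lambda x: x.T.duplicated(keep=False))
# Sum D for groups where every column is a duplicate, otherwise 0.
arr = np.where(df1.all(axis=1), df.groupby('A')[cols[0]].sum(), 0)
out = df.drop(cols, axis=1).drop_duplicates('A').assign(D=arr)
print(out)
```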

Alternative solution that checks, for each group, whether all values are duplicates:

cols = ['D','E','F','G']
m = df.groupby('A')[cols].apply(lambda x: x.T.duplicated(keep=False).all())
print (m)
A
13348     True
45832    False
dtype: bool

arr = np.where(m, df.groupby('A')[cols[0]].sum(), 0)
df = df.drop(cols, axis=1).drop_duplicates('A').assign(D=arr)
print (df)
        A        B        C  D
0   13348    xyzqr   324580  5
2   45832  gberthh   258729  0
4   58712    bgrtw   984562  2
5   76493     hzrt   638495  0
6  643509        .  T648501  2
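Both solutions attach `arr` to the deduplicated rows by position. `groupby` returns groups sorted by A, while `drop_duplicates` keeps the original row order, so the result is only correct when df is already sorted by A. A minimal order-safe variant, assuming the same column layout and using hypothetical toy data, maps the per-group result back by the value of A instead:

```python
import pandas as pd

# Hypothetical toy data in which A is NOT sorted: 45832 appears before 13348.
df = pd.DataFrame({
    'A': [45832, 45832, 13348, 13348],
    'B': ['gberthh', 'gberthh', 'xyzqr', 'xyzqr'],
    'D': [1, 1, 2, 3],
    'E': [1, 1, 2, 3],
    'F': [1, 1, 2, 3],
    'G': [9, 1, 2, 3],
})

cols = ['D', 'E', 'F', 'G']
# True for groups where every checked column duplicates another column.
m = df.groupby('A')[cols].apply(lambda x: x.T.duplicated(keep=False).all())
# Per-group sum of D, replaced by 0 where the group fails the check.
totals = df.groupby('A')['D'].sum().where(m, 0)

out = df.drop(columns=cols).drop_duplicates('A')
# Look up each row's total by its value of A rather than by position.
out = out.assign(D=out['A'].map(totals))
print(out)
```

Here 45832 comes first in the frame, so a positional `assign(D=arr)` would give 13348's total to the 45832 row. `map` avoids that.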
