Group By And Concatenate Dataframe
I have df with frame, m_label, and details so all of them can be duplicated, in same frame may be different labels with different details, but you need to know m_label+details have
Solution 1:
First I find it useful to move the start/end info to a new column, which is done by merging together the rows that have start/end on one side and the ones that don’t on the other:
>>> detail_type = df['details'].isin({'start', 'end'})
>>> df = pd.merge(df[~detail_type], df[detail_type].rename(columns={'details': 'detail_type'}))
>>> df
frame m_label details detail_type
0 0 BBPS 3 start
1 0 Findings DV start
2 0 Findings DV end
3 0 Findings DV start
4 0 Findings DV end
5 31 Actions IR start
6 99 BBPS 2 end
7 99 Findings PL start
8 99 Findings PL end
9 99 Findings PL start
10 99 Findings PL end
11 101 Action IR end
Now we can replace the 2 columns by their concatenated text:
>>> df = df.drop(columns=['m_label', 'details']).join(df['m_label'].str.cat(df['details'], sep='.'))
>>> df.drop_duplicates()
frame detail_type m_label
0 0 start BBPS.3
1 0 start Findings.DV
2 0 end Findings.DV
5 31 start Actions.IR
6 99 end BBPS.2
7 99 start Findings.PL
8 99 end Findings.PL
11 101 end Action.IR
You could even pivot to have a start and an end column:
>>> df.drop_duplicates().pivot(columns='detail_type', index='m_label', values='frame')
detail_type end start
m_label
Action.IR 101.0 NaN
Actions.IR NaN 31.0
BBPS.2 99.0 NaN
BBPS.3 NaN 0.0
Findings.DV 0.0 0.0
Findings.PL 99.0 99.0
But for that to be efficient you’ll first need to define rules that uniquely name your labels, e.g. BBPS regardless of details 2 and 3, Action / Actions always spelled the same way, etc.
Solution 2:
I don't think groupby
would help, as the order inside the group also matter.
Try this (since you didn't post the df
in a copiable way, I can't test it myself):
df = df.assign(new_label=None).sort_values(['frame', 'm_label'])
df.loc[~df['details'].isin(['start', 'end']), 'new_label'] = df['m_label'] + '.' + df['details']
df.loc[(df['frame'] == df['frame'].shift(-1).fillna('')) & (df['m_label'] == df['m_label'].shift(-1).fillna('')) & df['details'].shift(-1).isin(['start', 'end']), 'details'] = df['details'].shift(-1).fillna('')
df = df.loc[pd.notna(df['new_label']) & df['details'].isin(['start', 'end']), ['frame', 'new_label', 'details']]
Post a Comment for "Group By And Concatenate Dataframe"