Align Data In One Column With Another Row, Based On The Last Time Some Condition Was True

I’m trying to parse millions of lines of log files that suffer from an unfortunate deficiency. Data relating to a single event can be split across log entries but there is no dir

Solution 1:

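IIUC, you can do it this way. Create two masks. The first marks the rows where the Iterations value currently sits (the rows where thing_I_care_about, A, B and C are all null). The second is True on the rows you want the Iterations value to move to (the rows where all four of those columns are populated). Then group on the reversed cumulative sum of the first mask, so that each value-holding row is the last row of its group, broadcast that row's value to every record in the group, and finally use the second mask with where to keep the value only on the target rows.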

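# Rows that currently hold the Iterations value: every descriptive column is null.
mask = (df['thing_I_care_about'].isnull() &
        df['A'].isnull() &
        df['B'].isnull() &
        df['C'].isnull())

# Rows the value should move to: every descriptive column is populated.
fmask = (df['thing_I_care_about'].notnull() &
         df['A'].notnull() &
         df['B'].notnull() &
         df['C'].notnull())

# Reversing before cumsum makes each value-holding row the last row of its group.
group_id = mask[::-1].cumsum()

# Broadcast each group's last Iterations value, then keep it only where fmask is True.
df.assign(Iterations=df.groupby(group_id)['Iterations']
                       .transform(lambda x: x.iloc[-1])
                       .where(fmask))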

Output:

  thing_I_care_about  thread_num    A    B    C  Iterations
0            thing_1           2    X    X    X       110.0
1                NaN           2    X    X  NaN         NaN
2            thing_2           3  NaN    X    X         NaN
3                NaN           2  NaN  NaN  NaN         NaN
4            thing_3           7    X    X    X       150.0
5            thing_4           5    X    X  NaN         NaN
6                NaN           7  NaN  NaN  NaN         NaN
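
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The input frame is an assumption: the question's data is not shown above, so it is reconstructed from the output table, with the Iterations values (110.0 and 150.0) assumed to sit on the all-NaN rows 3 and 6. The helper names cols, group_id and carried are only for illustration, and the all(axis=1) form is a shorter equivalent of chaining the four columns with &.

```python
import numpy as np
import pandas as pd

# Hypothetical input, reconstructed from the output table above.
# Assumption: the Iterations values originally sit on the all-NaN rows 3 and 6.
df = pd.DataFrame({
    "thing_I_care_about": ["thing_1", np.nan, "thing_2", np.nan,
                           "thing_3", "thing_4", np.nan],
    "thread_num": [2, 2, 3, 2, 7, 5, 7],
    "A": ["X", "X", np.nan, np.nan, "X", "X", np.nan],
    "B": ["X", "X", "X", np.nan, "X", "X", np.nan],
    "C": ["X", np.nan, "X", np.nan, "X", np.nan, np.nan],
    "Iterations": [np.nan, np.nan, np.nan, 110.0, np.nan, np.nan, 150.0],
})

cols = ["thing_I_care_about", "A", "B", "C"]

# Rows that currently hold the Iterations value (all four columns null).
mask = df[cols].isna().all(axis=1)
# Rows the value should move to (all four columns populated).
fmask = df[cols].notna().all(axis=1)

# Cumulative sum over the reversed mask, flipped back to the original row
# order, so every value-holding row is the last member of its group.
group_id = mask[::-1].cumsum()[::-1]

# Broadcast each group's last Iterations value to the whole group,
# then blank it out everywhere except the target rows.
carried = df.groupby(group_id)["Iterations"].transform(lambda x: x.iloc[-1])
result = df.assign(Iterations=carried.where(fmask))

print(result)
```

With this assumed input, Iterations ends up as 110.0 on row 0 and 150.0 on row 4, and NaN elsewhere, which matches the output above. Note that the approach pairs rows purely by position and does not look at thread_num.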