Python Pandas Conditional Replace String Based On Column Values

Given these data frames...: DF = pd.DataFrame({'COL1': ['A', 'B', 'C', 'D','D','D'], 'COL2': [11032, 1960, 11400, 11355, 8, 7], 'year': ['20

Solution 1:

It looks like you want to update DF with data from DF2.

Assuming that all values in DF2 are unique for a given pair of values in ColX and ColY:

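# Left-merge the replacement values from DF2, keyed on (ColX, ColY), into DF.
DF = DF.merge(DF2.set_index(['ColX', 'ColY'])[['ColZ']],
              how='left',
              left_on=['COL1', 'year'],
              right_index=True)
# Overwrite COL2 wherever ColZ is non-null, then drop the helper column.
DF.COL2.update(DF.ColZ)
del DF['ColZ']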

>>> DF
  COL1   COL2  year
0    A  11032  2016
1    B   1960  2017
2    C  11400  2018
3    D  11355  2019
4    D      8  2020
5    D    100  2021

I merge a temporary dataframe (DF2.set_index(['ColX', 'ColY'])[['ColZ']]) into DF. This adds the value of ColZ to every row whose COL1 and year match the temporary frame's index (ColX and ColY); rows without a match get NaN.

I then use update to overwrite the values in DF.COL2 with the non-null values in DF.ColZ.

I then delete DF['ColZ'] to clean up.
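For reference, the three steps can be run end to end as below. This is a minimal sketch under assumptions: the year values and the contents of DF2 are made up for illustration (the question's data is cut off above), and the overwrite uses fillna instead of calling update on DF.COL2, since an update on a column reached through attribute access may not write back to DF when pandas' copy-on-write mode is active.

```python
import pandas as pd

# DF follows the question; the year values are assumed because the source is truncated.
DF = pd.DataFrame({'COL1': ['A', 'B', 'C', 'D', 'D', 'D'],
                   'COL2': [11032, 1960, 11400, 11355, 8, 7],
                   'year': ['2016', '2017', '2018', '2019', '2020', '2021']})
# DF2 is hypothetical: a single replacement row for ('D', '2021').
DF2 = pd.DataFrame({'ColX': ['D'], 'ColY': ['2021'], 'ColZ': [100]})

# Step 1: left-merge ColZ onto DF, matching (COL1, year) against DF2's (ColX, ColY) index.
merged = DF.merge(DF2.set_index(['ColX', 'ColY'])[['ColZ']],
                  how='left', left_on=['COL1', 'year'], right_index=True)

# Step 2: take ColZ where it is present, otherwise keep the old COL2 value.
merged['COL2'] = merged['ColZ'].fillna(merged['COL2']).astype('int64')

# Step 3: drop the helper column.
result = merged.drop(columns='ColZ')
print(result)
```

In this sketch merged['ColZ'] is NaN for the five rows without a match, so only the last row's COL2 changes.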

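If DF already has a column named ColZ, the merge would produce suffixed names (ColZ_x and ColZ_y by default), so you would need to adjust the code, for example by renaming the incoming column first.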
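One possible adjustment for a name clash is to rename the incoming column to a temporary name before merging. The sketch below assumes hypothetical data in which DF already carries its own ColZ column; the temporary name ColZ_new is likewise just an illustration.

```python
import pandas as pd

# Hypothetical data: DF already has an unrelated column called ColZ.
DF = pd.DataFrame({'COL1': ['A', 'D'],
                   'COL2': [11032, 7],
                   'year': ['2016', '2021'],
                   'ColZ': ['keep', 'me']})
DF2 = pd.DataFrame({'ColX': ['D'], 'ColY': ['2021'], 'ColZ': [100]})

# Rename the incoming column so it cannot collide with DF's own ColZ.
incoming = (DF2.rename(columns={'ColZ': 'ColZ_new'})
               .set_index(['ColX', 'ColY'])[['ColZ_new']])

merged = DF.merge(incoming, how='left',
                  left_on=['COL1', 'year'], right_index=True)

# Overwrite COL2 where a replacement exists, then drop the temporary column.
merged['COL2'] = merged['ColZ_new'].fillna(merged['COL2']).astype('int64')
result = merged.drop(columns='ColZ_new')
print(result)
```

DF's original ColZ column passes through untouched, and only the temporary column is removed.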

An alternative solution is as follows:

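DF = DF.set_index(['COL1', 'year'])
# update() works in place and returns None; it matches on column names, so rename ColZ to COL2.
DF.update(DF2.set_index(['ColX', 'ColY']).rename(columns={'ColZ': 'COL2'}))
DF.reset_index(inplace=True)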

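The result holds the same values as above, although reset_index places COL1 and year before COL2, and depending on the pandas version COL2 may be upcast to float.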
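A self-contained sketch of this index-based variant follows. The year values and DF2 are assumed for illustration, as the question's data is truncated. Two details matter here: DataFrame.update modifies the frame in place and returns None, so its return value should not be assigned, and it aligns on column names, so DF2's columns are renamed to match DF before the call.

```python
import pandas as pd

# DF follows the question; the year values are assumed because the source is truncated.
DF = pd.DataFrame({'COL1': ['A', 'B', 'C', 'D', 'D', 'D'],
                   'COL2': [11032, 1960, 11400, 11355, 8, 7],
                   'year': ['2016', '2017', '2018', '2019', '2020', '2021']})
# DF2 is hypothetical: a single replacement row for ('D', '2021').
DF2 = pd.DataFrame({'ColX': ['D'], 'ColY': ['2021'], 'ColZ': [100]})

# Index both frames by the key pair, using the same index and column names.
left = DF.set_index(['COL1', 'year'])
right = (DF2.rename(columns={'ColX': 'COL1', 'ColY': 'year', 'ColZ': 'COL2'})
            .set_index(['COL1', 'year']))

# In-place overwrite of COL2 for index labels present in both frames.
left.update(right)

# Move the key columns back out of the index (they end up in front of COL2).
result = left.reset_index()
print(result)
```

This variant needs no helper column, but it does require the (COL1, year) pairs to identify rows unambiguously.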
