Skip to content Skip to sidebar Skip to footer

Case Insensitive Pandas Dataframe.merge

I am struggling with the easiest way to do a case insensitive merge in pandas. Is there a way to do it right on the merge? Do I need to use (?i) or a regex with ignorecase? In my c

Solution 1:

Lowercase the values in the two columns that will be used to merge, and then merge on the lowercased columns

df_address['country_lower'] = df_address['Country'].str.lower()
df_CountryMapping['name_lower'] = df_CountryMapping['NAME'].str.lower()
df_merged = df_address.merge(df_CountryMapping, left_on="country_lower", right_on="name_lower", how="left")

Solution 2:

df_merged = pd.merge(df_address, df_CountryMapping, left_on=df_address["Country"].str.lower(), right_on=df_CountryMapping["NAME"].str.lower(), how="left")

Solution 3:

I suggest lowering the column names after reading them

df_address.columns=[c.lower() for c in df_address.columns]
df_CountryMapping.columns=[c.lower() for c in df_CountryMapping.columns]

Then update the values

df_address['country']=df_address['country'].str.lower()
df_CountryMapping['name']=df_CountryMapping['name'].str.lower()

And only then, do the merging

df_merged = df_address.merge(df_CountryMapping, left_on="country", right_on="name", how="left")

Solution 4:

One solution would be to convert the column names of both data frames to be all lowercase. So something like this:

df_address = pd.read_csv(env_path + "\\address.csv")
df_CountryMapping = pd.read_csv(map_path + "\CountryMapping.csv")

df_address.rename(columns=lambda x: x.lower(), inplace=True)
df_CountryMapping.rename(columns=lambda x: x.lower(), inplace=True)

df_merged = df_address.merge(df_CountryMapping, left_on="country", right_on="name", how="left")

Solution 5:

Another option is with ".str.casefold()" for a more comprehensive incorporation of ASCII and different language characters. If your just using English alpha chars it should be the same as ".str.lower()"

df_address['country_casefolded'] = df_address['Country'].str.casefold()
df_CountryMapping['name_casefolded'] = df_CountryMapping['NAME'].str.casefold()
df_merged = df_address.merge(df_CountryMapping, left_on="country_casefolded", right_on="name_casefolded", how="left")

Post a Comment for "Case Insensitive Pandas Dataframe.merge"