Case Insensitive Pandas Dataframe.merge
I am struggling with the easiest way to do a case insensitive merge in pandas. Is there a way to do it right on the merge? Do I need to use (?i) or a regex with ignorecase? In my c
Solution 1:
Lowercase the values in the two columns that will be used to merge, and then merge on the lowercased columns
df_address['country_lower'] = df_address['Country'].str.lower()
df_CountryMapping['name_lower'] = df_CountryMapping['NAME'].str.lower()
df_merged = df_address.merge(df_CountryMapping, left_on="country_lower", right_on="name_lower", how="left")
Solution 2:
df_merged = pd.merge(df_address, df_CountryMapping, left_on=df_address["Country"].str.lower(), right_on=df_CountryMapping["NAME"].str.lower(), how="left")
Solution 3:
I suggest lowering the column names after reading them
df_address.columns=[c.lower() for c in df_address.columns]
df_CountryMapping.columns=[c.lower() for c in df_CountryMapping.columns]
Then update the values
df_address['country']=df_address['country'].str.lower()
df_CountryMapping['name']=df_CountryMapping['name'].str.lower()
And only then, do the merging
df_merged = df_address.merge(df_CountryMapping, left_on="country", right_on="name", how="left")
Solution 4:
One solution would be to convert the column names of both data frames to be all lowercase. So something like this:
df_address = pd.read_csv(env_path + "\\address.csv")
df_CountryMapping = pd.read_csv(map_path + "\CountryMapping.csv")
df_address.rename(columns=lambda x: x.lower(), inplace=True)
df_CountryMapping.rename(columns=lambda x: x.lower(), inplace=True)
df_merged = df_address.merge(df_CountryMapping, left_on="country", right_on="name", how="left")
Solution 5:
Another option is with ".str.casefold()" for a more comprehensive incorporation of ASCII and different language characters. If your just using English alpha chars it should be the same as ".str.lower()"
df_address['country_casefolded'] = df_address['Country'].str.casefold()
df_CountryMapping['name_casefolded'] = df_CountryMapping['NAME'].str.casefold()
df_merged = df_address.merge(df_CountryMapping, left_on="country_casefolded", right_on="name_casefolded", how="left")
Post a Comment for "Case Insensitive Pandas Dataframe.merge"