Create New Pandas Dataframe Column Containing Boolean Output From Searching For Substrings

I'd like to create a new column that is True when a substring is found in an existing column and False otherwise. So in this example, I'd like to search for the substring '

Solution 1:

This is how to do it:

df["b"] = df["a"].str.contains("abc")

Regarding your error:

It seems that column a contains np.nan values. For those rows, str.contains returns np.nan rather than True or False, and pandas refuses to index with an array that contains np.nan values.
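As a minimal sketch of one way around this, str.contains accepts an na argument that sets the result for missing values. The sample data below is made up for illustration, and the column name b is only an example:

```python
import pandas as pd

# Illustrative data with one missing value in column "a".
df = pd.DataFrame({"a": ["zabc", None, "abcy", "defg"]})

# na=False treats missing values as "no match", so the result is a
# pure boolean column. regex=False makes "abc" a literal substring
# rather than a regular expression.
df["b"] = df["a"].str.contains("abc", na=False, regex=False)

# Because "b" holds only True/False, it can be used as a mask.
matches = df[df["b"]]
```

Passing na=True instead would count missing values as matches, so the choice depends on how missing rows should be treated.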

Solution 2:

Not the best solution, but you can check for null values with pd.isnull() or convert null values to a string with str():

df = pd.DataFrame({'a':['zabc', None, 'abcy', 'defg']})


df['a'].map(lambda x: True if 'abc' in str(x) else False)

or

df['a'].map(lambda x: False if pd.isnull(x) or 'abc' not in x else True)

Result:

0     True
1    False
2     True
3    False
Name: a, dtype: bool
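The pd.isnull() variant can also be written with a small named function, which may be easier to read than a one-line lambda. This is only a sketch: has_substring is a hypothetical helper name, not something pandas provides.

```python
import pandas as pd

df = pd.DataFrame({"a": ["zabc", None, "abcy", "defg"]})

def has_substring(value, needle):
    """Hypothetical helper: missing values count as 'not found'."""
    if pd.isnull(value):
        return False
    return needle in value

df["b"] = df["a"].map(lambda x: has_substring(x, "abc"))

# Caveat for the str() variant: str(None) is the text "None", so a
# needle such as "on" would be reported as found in a missing value.
false_positive = "on" in str(None)
```

For that reason the explicit null check is probably the safer of the two when the search string could appear in the text form of a missing value.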

Solution 3:

Your first code is OK. Here is the output on my sample:

s = pd.Series(['cat','hat','dog','fog','pet'])
d = pd.DataFrame(s, columns=['test'])
d['b'] = d['test'].map(lambda x: True if 'og' in x else False)
d
