How to use \b word boundary in pandas str.contains?

https://stackoverflow.com/questions/22359962

13-06-2023

Question

Is there an equivalent of the regex word boundary \b when using str.contains?

The following code mistakenly lists "Said Business School" in the category because of 'Sa'. If I could create a word boundary, it would solve the problem. Adding a space after it (as in 'Sa ') messes this up. I am using pandas, and df is a DataFrame. I know I can use regex, but I am curious whether I can use plain strings to make it faster.

gprivate_n = ('Co|Inc|Llc|Group|Ltd|Corp|Plc|Sa |Insurance|Ag|As|Media|&|Corporation')
df.loc[df[df.Name.str.contains('{0}'.format(gprivate_n))].index, "Private"] = 1

Solution 2

A word boundary is not a character, so a plain substring search cannot find it. You need to either use a regex (str.contains treats its pattern as a regex by default) or split the strings into words and then check whether any of those words is in the set you currently have defined in gprivate_n.
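
The split-and-check approach might look like the sketch below. The helper name is_private, the sample names, and the stripping of trailing dots and commas are illustrative assumptions, not part of the original answer; the word set mirrors gprivate_n.

```python
# Hypothetical word set mirroring gprivate_n from the question.
PRIVATE_WORDS = {"Co", "Inc", "Llc", "Group", "Ltd", "Corp", "Plc", "Sa",
                 "Insurance", "Ag", "As", "Media", "&", "Corporation"}


def is_private(name):
    """Return True if any whitespace-separated word of name is in PRIVATE_WORDS."""
    # Assumption: strip trailing dots/commas so "Inc." compares equal to "Inc".
    words = {word.strip(".,") for word in name.split()}
    return not PRIVATE_WORDS.isdisjoint(words)


# Made-up sample names for illustration.
names = ["Said Business School", "Acme Sa", "Foo Inc.", "Johnson & Johnson"]
flags = [is_private(n) for n in names]
```

With a DataFrame, something like df.Name.apply(is_private) should give the boolean mask. Whether this is actually faster than a regex would need to be measured on real data.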

Other tips

This is the same old Python regex issue: '\b' must be passed either as a raw string (r'\b...') or, less desirably, double-escaped ('\\b'). In a normal string literal, '\b' is the backspace character, not a word boundary.
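
A small check with the standard re module illustrates the difference. The variable names and the sample text are made up for this sketch.

```python
import re

plain = '\bSa'      # '\b' is the single backspace character (\x08)
raw = r'\bSa'       # backslash + 'b': the regex word-boundary token
escaped = '\\bSa'   # double-escaped form, the same string as raw

text = "Banco Sa"   # made-up sample name

# The plain literal looks for a backspace followed by "Sa", so it finds nothing.
plain_match = re.search(plain, text)
# The raw literal matches "Sa" at the start of a word.
raw_match = re.search(raw, text)
```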

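So your regex should be something like the following. It also closes the group with a second \b, drops the space after Sa, uses a non-capturing group (a capturing group makes str.contains emit a warning), and keeps & outside the boundaries because & is not a word character:

gprivate_n = r'\b(?:Co|Inc|Llc|Group|Ltd|Corp|Plc|Sa|Insurance|Ag|As|Media|Corporation)\b|&'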
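
For illustration, here is a minimal sketch of a word-boundary pattern used with str.contains. The sample names are invented, and the pattern shown (boundaries on both sides, & handled separately) is one reasonable variant rather than the only option.

```python
import pandas as pd

# Made-up sample data; only the Name column comes from the question.
df = pd.DataFrame({"Name": ["Said Business School", "Acme Sa",
                            "Assurance Group", "Johnson & Johnson",
                            "Agora College"]})

# \b on both sides so "Sa" does not match "Said" and "Co" does not match "College".
# "&" sits outside the boundaries because it is not a word character.
gprivate_n = r'\b(?:Co|Inc|Llc|Group|Ltd|Corp|Plc|Sa|Insurance|Ag|As|Media|Corporation)\b|&'

df["Private"] = 0
# na=False treats missing names as non-matches.
mask = df["Name"].str.contains(gprivate_n, regex=True, na=False)
df.loc[mask, "Private"] = 1
```

The boolean mask is passed straight to df.loc, so there is no need to index df first and then take .index.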

License: CC-BY-SA with attribution
