Let's say I have two lists of words, where a word from the first list is followed by a word from the second. The two words are joined by a space or a dash. To keep it simple, both lists contain the same words:
First=['Derp','Foo','Bar','Python','Monte','Snake']
Second=['Derp','Foo','Bar','Python','Monte','Snake']
Only some of the combinations exist. In the table below, each column is the first word and each row is the second word, and "Yes" marks a combination that exists:
        Derp  Foo  Bar  Python  Monte  Snake
Derp    No    No   Yes  Yes     Yes    Yes
Foo     Yes   No   No   Yes     Yes    Yes
Bar     Yes   Yes  No   Yes     Yes    Yes
Python  No    Yes  Yes  No      Yes    Yes
Monte   No    Yes  Yes  No      No     No
Snake   Yes   No   Yes  Yes     Yes    No
I have a data set like this, in which I want to detect particular word combinations:
df=pd.DataFrame({'Name': [ 'Al Gore', 'Foo-Bar', 'Monte-Python', 'Python Snake', 'Python Anaconda', 'Python-Pandas', 'Derp Bar', 'Derp Python', 'JavaScript', 'Python Monte'],
'Class': ['Politician','L','H','L','L','H', 'H','L','L','Circus']})
If I use regex and mark every row whose Name matches one of the patterns, it looks something like this:
import pandas as pd
df = pd.DataFrame({'Name': ['Al Gore', 'Foo-Bar', 'Monte-Python', 'Python Snake', 'Python Anaconda', 'Python-Pandas', 'Derp Bar', 'Derp Python', 'JavaScript', 'Python Monte'],
                   'Class': ['Politician', 'L', 'H', 'L', 'L', 'H', 'H', 'L', 'L', 'Circus']})
df['status'] = ''
# Raw strings keep \s intact; non-capturing groups avoid pandas' capture-group warning.
patterns = [r'^Derp(?:-|\s)(?:Foo|Bar|Snake)$',
            r'^Foo(?:-|\s)(?:Bar|Python|Monte)$',
            r'^Python(?:-|\s)(?:Derp|Foo|Bar|Snake)$',
            r'^Monte(?:-|\s)(?:Derp|Foo|Bar|Python|Snake)$']
# Mark every row whose Name matches any of the patterns.
for pattern in patterns:
    df.loc[df.Name.str.contains(pattern), 'status'] = 'Found'
print(df)
Here is the print:
>>>
        Class             Name status
0  Politician          Al Gore
1           L          Foo-Bar  Found
2           H     Monte-Python  Found
3           L     Python Snake  Found
4           L  Python Anaconda
5           H    Python-Pandas
6           H         Derp Bar  Found
7           L      Derp Python
8           L       JavaScript
9      Circus     Python Monte
[10 rows x 3 columns]
For larger data sets, writing out all the regex patterns by hand does not seem feasible. Is there a way to loop over a matrix of combinations, build patterns for the combinations that exist ("Yes" in the table above), and skip the ones that do not ("No" in the table above)? I know that the itertools library has a function called combinations that can generate all the possible patterns by looping.
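To make the idea concrete, here is a minimal sketch of building the patterns from the matrix. It assumes the table is stored as a boolean DataFrame whose columns are the first word and whose rows are the second word (the reading that matches the hand-written patterns above). The names `matrix` and `build_patterns` are made up for illustration, and this is only one possible approach, not necessarily the best one.

```python
import re
import pandas as pd

words = ['Derp', 'Foo', 'Bar', 'Python', 'Monte', 'Snake']

# 1 = "Yes", 0 = "No", copied from the table above.
# Assumption: columns are the first word, rows are the second word.
matrix = pd.DataFrame(
    [[0, 0, 1, 1, 1, 1],   # Derp
     [1, 0, 0, 1, 1, 1],   # Foo
     [1, 1, 0, 1, 1, 1],   # Bar
     [0, 1, 1, 0, 1, 1],   # Python
     [0, 1, 1, 0, 0, 0],   # Monte
     [1, 0, 1, 1, 1, 0]],  # Snake
    index=words, columns=words).astype(bool)

def build_patterns(matrix):
    """Return one anchored regex per first word that has at least one allowed second word."""
    patterns = []
    for first in matrix.columns:
        column = matrix[first]
        # Keep only the second words marked "Yes" for this first word.
        seconds = [re.escape(word) for word in column[column].index]
        if seconds:
            patterns.append(r'^{}[-\s](?:{})$'.format(re.escape(first), '|'.join(seconds)))
    return patterns

patterns = build_patterns(matrix)

df = pd.DataFrame({'Name': ['Al Gore', 'Foo-Bar', 'Monte-Python', 'Python Snake',
                            'Python Anaconda', 'Python-Pandas', 'Derp Bar',
                            'Derp Python', 'JavaScript', 'Python Monte']})
df['status'] = ''
# Joining the patterns with | lets a single str.contains call replace the loop.
df.loc[df['Name'].str.contains('|'.join(patterns)), 'status'] = 'Found'
print(df)
```

With this reading of the table, the same four names are marked as with the hand-written patterns: Foo-Bar, Monte-Python, Python Snake and Derp Bar. The difference is that a pattern is generated for every first word, including Bar and Snake.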