Question

When I run the script below, I get no output at all. What I really want to do is create a string from an iterable and then use that string as the pattern argument to re.findall. print(tab) gives a-z0-9.

import re

my_tab = ['a-z',
          '0-9']

tab = ''.join(my_tab)
line = 'and- then 3 times minus 456: no m0re!'

re.findall('tab', 'line')

What am I missing here? Is this the most Pythonic way to achieve this?


Solution 2

You wrote 'tab' and not tab. The first is a string literal; the second is a variable. You want re.findall(tab, line) (note that tab is no longer quoted). The same applies to line.
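A small sketch of the difference, using a made-up second sentence (`sentence` is not from the question) so that the literal search has something to find:

```python
import re

# Quoted names are string literals: this looks for the text "tab"
# inside the four-character text "line", and finds nothing.
in_literal_line = re.findall('tab', 'line')
print(in_literal_line)

# The same literal pattern does match when the text really contains "tab".
sentence = 'a table and a tablet'  # hypothetical example text
in_sentence = re.findall('tab', sentence)
print(in_sentence)
```

The first call prints an empty list and the second prints two matches, which shows that the pattern being searched for was the word tab itself, not the contents of the variable.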

However, if you print tab beforehand, you'll notice you have:

a-z0-9

whereas I think you intended to have:

[a-z0-9]

So you can concatenate strings:

>>> print(re.findall('[' + tab + ']', line))  # Here we add a bracket to each side
                                              # of a-z0-9 to create a valid regex
                                              # character class [a-z0-9]
['a', 'n', 'd', 't', 'h', 'e', 'n', '3', 't', 'i', 'm', 'e', 's', 'm', 'i', 'n', 'u', 's', '4', '5', '6', 'n', 'o', 'm', '0', 'r', 'e']

Or you can use str.format():

>>> print(re.findall('[{}]'.format(tab), line))
['a', 'n', 'd', 't', 'h', 'e', 'n', '3', 't', 'i', 'm', 'e', 's', 'm', 'i', 'n', 'u', 's', '4', '5', '6', 'n', 'o', 'm', '0', 'r', 'e']
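Since the pattern is built from an iterable, the join and the bracketing can be wrapped in one small helper. This is only a sketch: `make_char_class` is a hypothetical name, and it assumes every fragment is already valid inside a character class (nothing is escaped).

```python
import re

def make_char_class(ranges):
    """Join range fragments such as 'a-z' into one character class."""
    # Assumption: fragments are trusted and need no escaping.
    return '[' + ''.join(ranges) + ']'

my_tab = ['a-z', '0-9']
line = 'and- then 3 times minus 456: no m0re!'

# Compile once so the pattern can be reused on many lines.
pattern = re.compile(make_char_class(my_tab))
matches = pattern.findall(line)
print(''.join(matches))  # andthen3timesminus456nom0re
```

Compiling the pattern is optional here; it mainly pays off when the same character class is applied to many strings.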

OTHER TIPS

This will not work: you are telling the regular expression engine to search for the string 'tab' in the string 'line'.

Even without that mistake, searching with the string 'a-z0-9' (which you named tab) in the string 'and- then 3 times minus 456: no m0re!' (which you named line) would find nothing. Without square brackets, 'a-z0-9' is not a character class. It is still a valid pattern, but it matches only the literal text a-z0-9, which does not occur in line.
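To illustrate, here is a short sketch comparing the unbracketed pattern with the bracketed one. The string `ids_text` is made up for the demonstration and is not part of the question.

```python
import re

line = 'and- then 3 times minus 456: no m0re!'
ids_text = 'valid ids: a-z0-9'  # hypothetical text containing the literal

# Without brackets the pattern is the literal six-character text "a-z0-9".
literal_in_line = re.findall('a-z0-9', line)
literal_in_ids = re.findall('a-z0-9', ids_text)
print(literal_in_line)  # []
print(literal_in_ids)   # ['a-z0-9']

# With brackets it is a character class that matches one character at a time.
class_matches = re.findall('[a-z0-9]', 'm0re!')
print(class_matches)    # ['m', '0', 'r', 'e']
```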

If you wanted to find every lower-case letter (a-z) or digit (0-9), you could use this:

>>> re.findall(r'([a-z\d])', 'and- then 3 times minus 456: no m0re!')
['a', 'n', 'd', 't', 'h', 'e', 'n', '3', 't', 'i', 'm', 'e', 's', 'm', 'i', 'n', 'u', 's', '4', '5', '6', 'n', 'o', 'm', '0', 'r', 'e']

But I do not see how this helps you. Maybe you could explain what you are trying to do. Either way, I suggest reading up on regular expressions to learn more.

re.findall(tab, line)

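You have passed two string literals, not variables. What I think you actually want is re.findall('[a-z0-9]', line). If the goal is only to drop the spaces, a list comprehension such as [x for x in line if x != ' '] is enough, but note that it keeps punctuation such as '-', ':' and '!', so it is not equivalent to the regex.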
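A quick comparison on the example line shows where a space-only filter and the character class differ (the variable names below are illustrative):

```python
import re

line = 'and- then 3 times minus 456: no m0re!'

# Filter that removes only spaces.
no_spaces = [x for x in line if x != ' ']

# Character class that keeps only lower-case letters and digits.
regex_matches = re.findall('[a-z0-9]', line)

# Characters the space filter keeps but the regex rejects.
extra = [x for x in no_spaces if x not in regex_matches]
print(extra)  # ['-', ':', '!']
```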

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow