As Bakuriu has pointed out, you need to add .read() like so:
quotes = re.findall(ur'[^\u201d]*[\u201d]', input.read())
open() merely returns a file object, whereas f.read() returns a string, which is what re.findall expects. In addition, I'm guessing you are looking to get everything between two quotation marks instead of just zero or more occurrences of [^\u201d] before a quotation mark. So I would try this:
quotes = re.findall(ur'[\u201d][^\u201d]*[\u201d]', input.read(), re.U)
The re.U flag turns on Unicode-aware matching. Or (if you don't have two sets of right double quotation marks and don't need unicode):
quotes = re.findall(r'"[^"]*"', input.read(), re.U)
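To see why .read() matters and how the two patterns differ, here is a small sketch. It assumes an in-memory io.StringIO as a stand-in for the file object from open(), and the sample sentence is made up for illustration. It uses u'' literals so it runs on both Python 2.7 and Python 3.

```python
import io
import re

# io.StringIO stands in for the file object that open() returns
# (an assumption for this demo; the sample text is invented).
f = io.StringIO(u'He said \u201dhi\u201d there')

# Pattern from the question: anything up to and including a right double quote.
prefix_pattern = u'[^\u201d]*[\u201d]'
# Pattern suggested above: text enclosed between two right double quotes.
between_pattern = u'[\u201d][^\u201d]*[\u201d]'

# Passing the file object itself fails, because re.findall needs a string.
try:
    re.findall(prefix_pattern, f)
    passed_file_object = True
except TypeError:
    passed_file_object = False

# Passing the string returned by .read() works.
text = f.read()
prefix_matches = re.findall(prefix_pattern, text)
between_matches = re.findall(between_pattern, text)
```

Here prefix_matches also picks up the text leading up to the first quotation mark, while between_matches contains only the quoted part.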
Finally, you may want to choose a different variable name than input, since input is a built-in function in Python and assigning to it shadows the built-in.
Your result might look something like this:
>>> input2 = """
cfrhubecf "ehukl wehunkl echnk
wehukb ewni; wejio;"
"werulih"
"""
>>> quotes = re.findall(r'"[^"]*"', input2, re.U)
>>> print quotes
['"ehukl wehunkl echnk\nwehukb ewni; wejio;"', '"werulih"']
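Putting the pieces together, a minimal sketch for Python 3 might look like the following. The helper name find_quotes, the variable infile, and the throwaway temp file are all hypothetical and only there to make the example self-contained; the pattern is the straight-quote one from above.

```python
import os
import re
import tempfile

def find_quotes(path):
    """Return every double-quoted span found in the file at path."""
    # 'infile' is used instead of 'input' to avoid shadowing the built-in.
    with open(path, encoding="utf-8") as infile:
        text = infile.read()  # a string, not a file object
    # A quote, any run of non-quote characters (newlines included), a closing quote.
    return re.findall(r'"[^"]*"', text)

# Write the sample text from above to a temporary file for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write('cfrhubecf "ehukl wehunkl echnk\nwehukb ewni; wejio;"\n"werulih"\n')
    sample_path = tmp.name

quotes = find_quotes(sample_path)
os.remove(sample_path)
print(quotes)
```

The with block closes the file automatically, and because [^"] also matches newlines, a quotation that spans several lines comes back as a single match.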