Find and print text in quotation marks from a text file with python

Question 1

As Bakuriu has pointed out, you need to add .read() like so:

quotes = re.findall(ur'[^\u201d]*[\u201d]', input.read())

open() merely returns a file object, whereas input.read() returns the file's contents as a string. In addition, I'm guessing you want everything between two quotation marks, rather than just zero or more occurrences of [^\u201d] followed by a quotation mark. So I would try this:

quotes = re.findall(ur'[\u201d][^\u201d]*[\u201d]', input.read(), re.U)

The re.U flag turns on Unicode-aware matching. Or, if your text uses plain ASCII double quotes rather than pairs of right double quotation marks, and you don't need Unicode:

quotes = re.findall(r'"[^"]*"', input.read(), re.U)

Finally, you may want to choose a different variable name than input, since input is a built-in function in Python and assigning to it shadows the built-in.

Your result might look something like this:

>>> input2 = """
cfrhubecf "ehukl wehunkl echnk
wehukb ewni; wejio;"
"werulih"
"""
>>> quotes = re.findall(r'"[^"]*"', input2, re.U)
>>> print quotes
['"ehukl wehunkl echnk\nwehukb ewni; wejio;"', '"werulih"']
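Putting the pieces above together for a file on disk, here is a minimal sketch. The names find_quoted and find_quoted_in_file are made up for illustration, the UTF-8 encoding is an assumption about your file, and the pattern is the plain ASCII double-quote one; swap in the \u201d version if that is what your text contains.

```python
import io
import re

# A straight double quote, any run of non-quote characters
# (newlines included), then the closing straight double quote.
QUOTED = re.compile(r'"[^"]*"')


def find_quoted(text):
    """Return every double-quoted span in text, quotes included."""
    return QUOTED.findall(text)


def find_quoted_in_file(path, encoding="utf-8"):
    """Read the whole file as text and return its quoted spans."""
    # The encoding default is an assumption; io.open decodes to text
    # on both Python 2 and Python 3.
    with io.open(path, encoding=encoding) as text_file:
        return find_quoted(text_file.read())


sample = 'cfrhubecf "ehukl wehunkl echnk\nwehukb ewni; wejio;"\n"werulih"\n'
print(find_quoted(sample))
```

Note that the file object is called text_file rather than input, so the built-in stays usable.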

Question 2

Instead of using regular expressions, you could try some Python builtins. I'll let you do the hard work:

message = '''
"some text in quotes", some text not in quotes. Some more text 'In different kinds of quotes'.
'''
list_of_single_quote_items = message.split("'")
list_of_double_quote_items = message.split('"')

The challenging part will be interpreting what your split list means and dealing with all the edge cases (only one quote in the string, escape sequences, etc.).
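As a starting point for interpreting the split list, here is one possible sketch. It relies on the observation that, after splitting on the quote character, the pieces at odd indexes are the ones that sat between an opening and a closing quote. The helper name quoted_by_split is hypothetical, and dropping an unclosed trailing quote is just one way to handle that edge case.

```python
def quoted_by_split(text, quote='"'):
    """Return the pieces of text that sit between pairs of quote."""
    pieces = text.split(quote)
    # An even number of pieces means an odd number of quote characters,
    # so the last quote was never closed; drop the dangling tail.
    if len(pieces) % 2 == 0:
        pieces = pieces[:-1]
    # Pieces at odd indexes lie between an opening and a closing quote.
    return pieces[1::2]


message = '''
"some text in quotes", some text not in quotes. Some more text 'In different kinds of quotes'.
'''
print(quoted_by_split(message, '"'))
print(quoted_by_split(message, "'"))
```

This does not handle escape sequences, and an apostrophe in a word such as "don't" will throw off the single-quote case, so treat it as a sketch rather than a complete solution.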