Question

I can't seem to get my regex right when trying to catch the phrase in between the quotations. E.g. in bold (NOTE: that the input is has strings before and after):

"I can quite understand your thinking so." I said. "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"

"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"--I picked up the morning paper from the ground--"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I know without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers could invent nothing more crude."

I've tried to get the the text before and after the quotes but i can't get to the desired output. There must be some way to group the regex up such that i can catch the string in between the quotes and also the surrounding two quotes.

Tried:

import re

def get_quotes(paragraph):
    quote_rx = r'''([""])(?:(?=(\\?))\2.)*?\1'''
    return [i.group(0) for i in \
           re.finditer(quote_rx, paragraph, re.S)]

def get_said(paragraph, quote):
    quote_start = paragraph.index(quote)
    quote_end = quote_start + len(quote)
    before = paragraph[:quote_start]
    after = paragraph[quote_end:]
    return before, after


paragraphs = ['''I smiled and shook my head. "I can quite understand your thinking so." I said. "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"--I picked up the morning paper from the ground--"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I know without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers could invent nothing more crude."''', 
'''Such was the remarkable narrative to which I listened on that April evening -- a narrative which would have been utterly incredible to me had it not been confirmed by the actual sight of the tall, spare figure and the keen, eager face, which I had never thought to see again. In some manner he had learned of my own sad bereavement, and his sympathy was shown in his manner rather than in his words. "Work is the best antidote to sorrow, my dear Watson," said he, "and I have a piece of work for us both to-night which, if we can bring it to a successful conclusion, will in itself justify a man's life on this planet." In vain I begged him to tell me more. "You will hear and see enough before morning," he answered. "We have three years of the past to discuss. Let that suffice until half-past nine, when we start upon the notable adventure of the empty house."''']

for p in paragraphs:
    saids = set()
    for i in get_quotes(p):
        b,a = get_said(p,i)
        print b
        print a
        print

Desired output:

in-btw: I said.
quotes: ["I can quite understand your thinking so.","Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"]
section: "I can quite understand your thinking so." **I said.** "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"


in-btw: --I picked up the morning paper from the ground--
quotes: ['''"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"''', '''"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I know without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers could invent nothing more crude."''']
section: "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"**--I picked up the morning paper from the ground--**"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I know without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers could invent nothing more crude."
Was it helpful?

Solution

It's very simple, the regular expression you need is r'^("[^"]+")([^"]+)("[^"]+")':

import re

s = """
"I can quite understand your thinking so." I said. "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"

"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"--I picked up the morning paper from the ground--"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I know without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers could invent nothing more crude."
"""

for segment in s.splitlines():
    if not segment:
        continue
    first, said, second = re.match(r'^("[^"]+")([^"]+)("[^"]+")', segment).groups()
    print first
    print said
    print second

>>> 
"I can quite understand your thinking so."
 I said. 
"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"
"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"
--I picked up the morning paper from the ground--
"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I know without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers could invent nothing more crude."
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top