Question

I am fairly new to Python and to programming as a whole, just learning my ABCs. Say I have a string like this:

s = "DEALER:'S up, Bubbless? BUBBLES: Hey. DEALER: Well, there you go. JUNKIE: Well, what you got?DEALER: I got some starters."

I want to split the string wherever an uppercase word followed by a colon (:) is encountered, storing each piece in its own string. For the string above, I would get

s1 = "DEALER:'S up, Bubbless?"
s2 = "BUBBLES: Hey."
s3 = "DEALER: Well, there you go."

This is my regex code for doing it

import re

mystring = """
DEALER: 'S up, Bubbless?
BUBBLES: Hey.
DEALER: Well, there you go.
JUNKIE: Well, what you got?
DEALER: I got some starters. """

# Earlier attempt:
#p = re.compile('([A-Z]*):')

# An uppercase label and a colon, then (lazily) everything up to
# the next label or the end of the string.
p = re.compile('[A-Z]+:.*?(?=[A-Z]+:|$)')
s = set(p.findall(mystring))

How would I loop through it to get each string? It only gets the first string (i.e. DEALER: 'S up, Bubbless?) and stops. Sorry if I sound a bit clueless; I'm new to programming and learning with practice as I go along.


Solution

Since it is a multiline string, you need to pass the re.DOTALL flag, like this:

p = re.compile('[A-Z]+:.*?(?=[A-Z]+:|$)', re.DOTALL)

Output

set(["DEALER: 'S up, Bubbless?\n",
     'JUNKIE: Well, what you got?\n',
     'DEALER: Well, there you go.\n',
     'DEALER: I got some starters. ',
     'BUBBLES: Hey.\n'])
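
As for looping: findall already returns a list, so it can be iterated directly. Here is a minimal sketch, assuming you care about the order of the lines (a set discards both order and duplicates). The name `lines` and the strip() call are my additions, not part of the original code.

```python
import re

mystring = """
DEALER: 'S up, Bubbless?
BUBBLES: Hey.
DEALER: Well, there you go.
JUNKIE: Well, what you got?
DEALER: I got some starters. """

p = re.compile(r'[A-Z]+:.*?(?=[A-Z]+:|$)', re.DOTALL)

# findall returns the matches as a list, in the order they occur.
# strip() removes the trailing newline or space from each match.
lines = [m.strip() for m in p.findall(mystring)]

for line in lines:
    print(line)
```

If you only need to visit each match once, iterating over p.finditer(mystring) and calling .group() on each match object should work as well.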

Quoting from re.DOTALL docs,

Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.

So, without that flag, .*? doesn't match \n, and a match can never reach the next speaker label on the following line. That's why none of the other strings got matched.
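
The effect is easy to see on a smaller input. This is a sketch using a made-up two-line string (`text` is illustrative, not from the question) with the same pattern, run with and without the flag:

```python
import re

text = "A: one\nB: two"
pattern = r'[A-Z]+:.*?(?=[A-Z]+:|$)'

# Without DOTALL, '.' stops at the newline, so "A: one" can never
# reach the lookahead for "B:" and fails to match.
without_flag = re.findall(pattern, text)

# With DOTALL, '.' also consumes the newline, so both entries match.
with_flag = re.findall(pattern, text, re.DOTALL)

print(without_flag)  # ['B: two']
print(with_flag)     # ['A: one\n', 'B: two']
```

Only the last entry matches without the flag, because it is the only one where the end of the string ($) comes before any newline.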

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow