Question

I have the following text:

LAST_NAME_1, Firs_name_1    Home Phone: 333-336-6514
192 generic St. 
Newton MA 02471
Status: Attender    Marital:    Married Adult:  M/F:    Env.No.:


LAST_NAME_2, Firs_name_2    Home Phone: 777-777-2205    Cell Phone: 888-888-8888
10 generic St. 
Newton MA 02471

    E-mail :    email@gmail.com
Status: Member  Marital:    Married Adult:  Y   M/F:    M   Env.No.:    5

I need to get the text that follows the phone numbers, but a record can contain Home Phone, Cell Phone, Emergency Phone, Fax, or Work Phone entries in any order. Is there a regular expression that gives me the text after the last phone number? For example, in the second block of text, I want the text after Cell Phone: 888-888-8888.


Solution

In [1]: import re

In [2]: s = """LAST_NAME_1, Firs_name_1    Home Phone: 333-336-6514
   ...: 192 generic St.
   ...: Newton MA 02471
   ...: Status: Attender    Marital:    Married Adult:  M/F:    Env.No.:
   ...:
   ...:
   ...: LAST_NAME_2, Firs_name_2    Home Phone: 777-777-2205    Cell Phone: 888-888-8888
   ...: 10 generic St.
   ...: Newton MA 02471
   ...:
   ...:     E-mail :    email@gmail.com
   ...: Status: Member  Marital:    Married Adult:  Y   M/F:    M   Env.No.:    5"""

In [3]:

In [4]: re.findall(r'[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)', s)
Out[4]: ['192 generic St. ', '10 generic St. ']

NODE         EXPLANATION
-----------------------------------------------------
  [0-9]{3}     any character of: '0' to '9' (3 times)
-----------------------------------------------------
  -            '-'
-----------------------------------------------------
  [0-9]{3}     any character of: '0' to '9' (3 times)
-----------------------------------------------------
  -            '-'
-----------------------------------------------------
  [0-9]{4}     any character of: '0' to '9' (4 times)
-----------------------------------------------------
  \n           '\n' (newline)
-----------------------------------------------------
  (            group and capture to \1:
-----------------------------------------------------
    .*           any character except \n (0 or more times
                 (matching the most amount possible))
------------------------------------------------------
  )            end of \1

OTHER TIPS

Is this what you want?

doc = '''LAST_NAME_1, Firs_name_1    Home Phone: 333-336-6514
192 generic St. 
Newton MA 02471
Status: Attender    Marital:    Married Adult:  M/F:    Env.No.:


LAST_NAME_2, Firs_name_2    Home Phone: 777-777-2205    Cell Phone: 888-888-8888
10 generic St. 
Newton MA 02471

    E-mail :    email@gmail.com
Status: Member  Marital:    Married Adult:  Y   M/F:    M   Env.No.:    5'''

import re

p = re.compile(r'[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)')

for x in p.finditer(doc):
    print(x.group(1))

The output is

192 generic St. 
10 generic St. 

Explanation

[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)
__________________________          <- phone number
                          __        <- newline
                             __     <- this part is group(1)
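
If the goal is everything after the last phone number in a record, rather than only the next line, one possible approach is to split the document into records and slice each record at the end of its last phone match. This is only a sketch: it assumes records are separated by at least two blank lines, as in the sample, and the helper names `split_records` and `text_after_last_phone` are hypothetical.

```python
import re

PHONE = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")


def split_records(doc):
    # Assumption: records are separated by at least two blank lines.
    return [r for r in re.split(r"\n(?:[ \t]*\n){2,}", doc) if r.strip()]


def text_after_last_phone(record):
    """Return the text after the last phone number in a record, or None."""
    last = None
    for last in PHONE.finditer(record):
        pass  # keep only the final match
    if last is None:
        return None
    return record[last.end():].strip()


# Sample text from the question.
doc = (
    "LAST_NAME_1, Firs_name_1    Home Phone: 333-336-6514\n"
    "192 generic St. \n"
    "Newton MA 02471\n"
    "Status: Attender    Marital:    Married Adult:  M/F:    Env.No.:\n"
    "\n"
    "\n"
    "LAST_NAME_2, Firs_name_2    Home Phone: 777-777-2205    Cell Phone: 888-888-8888\n"
    "10 generic St. \n"
    "Newton MA 02471\n"
    "\n"
    "    E-mail :    email@gmail.com\n"
    "Status: Member  Marital:    Married Adult:  Y   M/F:    M   Env.No.:    5\n"
)

for record in split_records(doc):
    print(text_after_last_phone(record))
    print("-----")
```

Because the slice starts at the end of the last match, the order and number of phone labels in the record does not matter.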
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow