Question

I have the following text:

LAST_NAME_1, Firs_name_1    Home Phone: 333-336-6514
192 generic St. 
Newton MA 02471
Status: Attender    Marital:    Married Adult:  M/F:    Env.No.:


LAST_NAME_2, Firs_name_2    Home Phone: 777-777-2205    Cell Phone: 888-888-8888
10 generic St. 
Newton MA 02471

    E-mail :    email@gmail.com
Status: Member  Marital:    Married Adult:  Y   M/F:    M   Env.No.:    5

I need to obtain the text after the phone numbers, but a record can have Home Phone, Cell Phone, Emergency Phone, Fax, or Work Phone entries in different orders. Is there a regular expression that can give me the text after the last phone number? For example, in the second block of text, I want the text after Cell Phone: 888-888-8888.


Solution

In [1]: import re

In [2]: s=""" LAST_NAME_1, Firs_name_1    Home Phone: 333-336-6514
   ...: 192 generic St.
   ...: Newton MA 02471
   ...: Status: Attender    Marital:    Married Adult:  M/F:    Env.No.:
   ...:
   ...:
   ...: LAST_NAME_2, Firs_name_2    Home Phone: 777-777-2205    Cell Phone: 888-888-8888
   ...: 10 generic St.
   ...: Newton MA 02471
   ...:
   ...:     E-mail :    email@gmail.com
   ...: Status: Member  Marital:    Married Adult:  Y   M/F:    M   Env.No.:    5"""

In [3]:

In [4]: re.findall(r'[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)', s)
Out[4]: ['192 generic St. ', '10 generic St. ']

NODE         EXPLANATION
-----------------------------------------------------
  [0-9]{3}     any character of: '0' to '9' (3 times)
-----------------------------------------------------
  -            '-'
-----------------------------------------------------
  [0-9]{3}     any character of: '0' to '9' (3 times)
-----------------------------------------------------
  -            '-'
-----------------------------------------------------
  [0-9]{4}     any character of: '0' to '9' (4 times)
-----------------------------------------------------
  \n           '\n' (newline)
-----------------------------------------------------
  (            group and capture to \1:
-----------------------------------------------------
    .*           any character except \n (0 or more times
                 (matching the most amount possible))
------------------------------------------------------
  )            end of \1
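
If you want everything between the last phone number and the Status line, not just the first line after it, the same idea can be extended. This is only a sketch: it assumes every record ends with a line starting with "Status:" (as in the sample), and the names doc, pattern, blocks and cleaned are made up for illustration. Because the phone pattern must be followed by a line break, only the last phone number on a line can match.

```python
import re

doc = """LAST_NAME_1, Firs_name_1    Home Phone: 333-336-6514
192 generic St.
Newton MA 02471
Status: Attender    Marital:    Married Adult:  M/F:    Env.No.:


LAST_NAME_2, Firs_name_2    Home Phone: 777-777-2205    Cell Phone: 888-888-8888
10 generic St.
Newton MA 02471

    E-mail :    email@gmail.com
Status: Member  Marital:    Married Adult:  Y   M/F:    M   Env.No.:    5"""

# A phone number at the end of a line (trailing blanks allowed), then a lazy
# capture of everything up to the next line that starts with "Status:".
# Assumption: each record is closed by such a "Status:" line.
pattern = re.compile(
    r"[0-9]{3}-[0-9]{3}-[0-9]{4}[ \t]*\n(.*?)^Status:",
    re.DOTALL | re.MULTILINE,
)

blocks = [m.group(1) for m in pattern.finditer(doc)]

# Drop blank lines and surrounding whitespace from each captured block.
cleaned = [
    [line.strip() for line in block.splitlines() if line.strip()]
    for block in blocks
]

for record in cleaned:
    print(record)
```

Here re.DOTALL lets .*? run across line breaks, and re.MULTILINE makes ^ match at the start of each line, so the capture stops right before the Status line.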

Other suggestions

Is this what you want?

doc = '''LAST_NAME_1, Firs_name_1    Home Phone: 333-336-6514
192 generic St. 
Newton MA 02471
Status: Attender    Marital:    Married Adult:  M/F:    Env.No.:


LAST_NAME_2, Firs_name_2    Home Phone: 777-777-2205    Cell Phone: 888-888-8888
10 generic St. 
Newton MA 02471

    E-mail :    email@gmail.com
Status: Member  Marital:    Married Adult:  Y   M/F:    M   Env.No.:    5'''

import re

p = re.compile(r'[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)')

for x in p.finditer(doc):
    print(x.group(1))

The output is

192 generic St. 
10 generic St. 

Explanation

[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)
__________________________          <- phone number
                          __        <- newline
                             __     <- this part is group(1)
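
If you also need to know which phone numbers were on the line, a line-by-line variant may be easier to maintain. This is a rough sketch, not a definitive solution: the label spellings in LABELS are assumptions (only "Home Phone" and "Cell Phone" appear in the sample text), and phones_and_next_line is a hypothetical helper name.

```python
import re

# Assumed label spellings; only "Home Phone" and "Cell Phone" occur in the sample.
LABELS = r"(?:Home Phone|Cell Phone|Emergency Phone|Work Phone|Fax)"
PHONE = r"[0-9]{3}-[0-9]{3}-[0-9]{4}"
ENTRY = re.compile(r"(" + LABELS + r"):\s*(" + PHONE + r")")


def phones_and_next_line(text):
    """Return (phones, next_line) for every line that contains a phone entry."""
    lines = text.splitlines()
    results = []
    for i, line in enumerate(lines):
        phones = ENTRY.findall(line)  # list of (label, number) tuples, any order
        if not phones:
            continue
        # The text after the last phone number is simply the following line.
        following = lines[i + 1].strip() if i + 1 < len(lines) else ""
        results.append((dict(phones), following))
    return results


sample = (
    "LAST_NAME_2, Firs_name_2    Home Phone: 777-777-2205    Cell Phone: 888-888-8888\n"
    "10 generic St.\n"
    "Newton MA 02471"
)
result = phones_and_next_line(sample)
print(result)
```

Since the labels are matched wherever they occur on the line, the order in which Home, Cell, Fax and so on appear does not matter.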
Licensed under: CC-BY-SA with attribution