Question

I am trying to extract city, state and/or zip code from a string using a regular expression. The regex I am using (taken from the question "get city, state or zip from a string in python") is ([^\d]+)?(\d{5})?, and when I tested it on http://regex101.com/ it correctly selected the two strings I want to match.

However, I'm not sure how to separate these two strings in Python. Here is what I have tried:

import re

string = "binghamton ny 13905"

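# Raw string so that "\d" reaches the regex engine unchanged
reg = re.compile(r'([^\d]+)?(\d{5})?')
match = reg.match(string)

# group() with no arguments is group(0), i.e. the whole match
print(match.group())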

This simply returns the entire string. Is there a way to pull each match individually?

I have also tried splitting the regular expression into two separate ones (one for city and state, one for zip code); however, the zip code regex returns either an empty string or None. All help is appreciated, thanks.


Solution

Probably the easiest way is to name the two capturing groups:

reg = re.compile(r'(?P<city>[^\d]+)?(?P<zip>\d{5})?')

and then access the groupdict:

>>> match = reg.match("binghamton ny 13905")
>>> match.groupdict()
{'city': 'binghamton ny ', 'zip': '13905'}

This gives you easy access to the two pieces of information by name, rather than index.
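Note that the city group keeps its trailing space, and because both groups are optional, either one can come back as None when that part of the input is missing. A minimal sketch of how this could be handled, assuming a hypothetical helper named split_location (not part of the original answer):

```python
import re

# Same pattern as above, written as a raw string.
reg = re.compile(r'(?P<city>[^\d]+)?(?P<zip>\d{5})?')


def split_location(text):
    """Hypothetical helper: return (city_state, zip_code); either may be None."""
    # Both groups are optional, so match() always succeeds here,
    # possibly with an empty match.
    parts = reg.match(text).groupdict()
    city = parts['city']
    if city is not None:
        # Drop the whitespace left between the state and the zip code.
        city = city.strip()
    return city, parts['zip']


print(split_location("binghamton ny 13905"))
print(split_location("binghamton ny"))
print(split_location("13905"))
```

A group that did not take part in the match shows up as None in groupdict(), which is why the helper checks for None before calling strip().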

OTHER TIPS

I would agree with jonrsharpe

import re

string = "binghamton ny 13905"
reg = re.compile(r'(?P<city>[^\d]+)?(?P<zip>\d{5})?')
result = reg.match(string)  # call match on the compiled pattern

Additionally you can access the variables by name like this:

result.group('city')
result.group('zip')

Python re reference page

r = re.search(r"([^\d]+)?(\d{5})?", "binghamton ny 13905")
r.groups()


(u'binghamton ny ', u'13905')
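As for the separate zip code regex returning an empty string or None: the question does not show that pattern, so the ones below are assumptions, but they reproduce both symptoms. re.match only tries the pattern at the start of the string, and an optional group is allowed to match nothing. A rough sketch:

```python
import re

string = "binghamton ny 13905"

# re.match only tries position 0, and the string does not start with a digit.
anchored = re.match(r'\d{5}', string)
print(anchored)

# An optional group may match nothing, so search succeeds immediately
# at position 0 with an empty match.
optional_match = re.search(r'(\d{5})?', string)
print(repr(optional_match.group()))

# A required pattern combined with re.search scans forward to the digits.
zip_match = re.search(r'\d{5}', string)
zip_code = zip_match.group() if zip_match else None
print(zip_code)
```

If the zip code is always at the end of the string, anchoring the pattern with $ would be one way to tighten this.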
Licensed under: CC-BY-SA with attribution