Question

Hey, folks. I'm looking for some regular expressions to help grab street addresses and phone numbers from free-form text (à la Gmail).

Given some text: "John, I went to the store today, and it was awesome! Did you hear that they moved to 500 Green St.? ... Give me a call at +14252425424 when you get a chance."

I'd like to be able to pull out:

500 Green St. (recognized as a street address)

+14252425424 (recognized as a phone number)

What makes this problem easier is that I don't care about parsing the text that gets pulled out. That is, I don't care that Green is the name of the road or that 425 is the area code. I just want to grab strings that "look like" addresses or telephone numbers.
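Since only strings that "look like" addresses or phone numbers are wanted, a first pass can be two loose regular expressions. This is a sketch under stated assumptions, not a tested solution: `PHONE_RE`, `ADDRESS_RE` and `find_candidates` are hypothetical names, the address pattern only knows a handful of English street suffixes, and the phone pattern will also match things like dates or order numbers.

```python
import re

# Phone-like: optional "+" and "(", a digit, then digits possibly separated by
# spaces, dots, dashes or parentheses, ending in a digit.
PHONE_RE = re.compile(r"\+?\(?\d[\d\s().-]{5,}\d")

# Address-like (English-only heuristic): a house number, one or more
# capitalized words, then a common street suffix with an optional period.
ADDRESS_RE = re.compile(
    r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s+)+"
    r"(?:St|Street|Ave|Avenue|Rd|Road|Blvd|Dr|Drive|Ln|Lane)\b\.?"
)


def find_candidates(text):
    """Return (address-like strings, phone-like strings) found in text."""
    # Keep only phone candidates with at least 7 digits (assumed threshold).
    phones = [m.group() for m in PHONE_RE.finditer(text)
              if len(re.sub(r"\D", "", m.group())) >= 7]
    addresses = [m.group() for m in ADDRESS_RE.finditer(text)]
    return addresses, phones


sample = ("John, I went to the store today, and it was awesome! Did you hear "
          "that they moved to 500 Green St.? ... Give me a call at "
          "+14252425424 when you get a chance.")
print(find_candidates(sample))  # (['500 Green St.'], ['+14252425424'])
```

The 7-digit minimum is an arbitrary cut-off meant to skip short numbers such as house numbers; it would need tuning against real data.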

Unfortunately, this needs to work internationally, as well as possible.

Anyone have any leads? Thanks!


Solution

Phone numbers are easy as long as you have a list of all country codes and number formats. Street addresses I have no idea about; the only advice I can give you is to validate each candidate set of words at addressdoctor.com.
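The "list of all country codes and number formats" approach might look roughly like this. It is a hypothetical sketch: `COUNTRY_CODES` is a deliberately tiny table (only +1, +33 and +44, with illustrative national-number lengths, not an authoritative list), and only numbers written in international format with a leading `+` are handled; national formats such as `(425) 242-5424` would need per-country rules. Maintained tables of this kind exist, for example in the `phonenumbers` package, a Python port of Google's libphonenumber.

```python
import re

# Hypothetical, deliberately tiny table: country calling code -> allowed
# lengths of the national number that follows it (illustrative values only).
COUNTRY_CODES = {
    "1": {10},      # North American Numbering Plan
    "33": {9},      # France
    "44": {9, 10},  # United Kingdom
}

# "+", a digit, then digits with optional spaces, dots or dashes.
CANDIDATE_RE = re.compile(r"\+\d[\d\s.-]{6,18}\d")


def looks_like_intl_phone(candidate: str) -> bool:
    digits = re.sub(r"\D", "", candidate)
    # Country calling codes are 1 to 3 digits long; try each prefix length.
    for n in (1, 2, 3):
        code, rest = digits[:n], digits[n:]
        if code in COUNTRY_CODES and len(rest) in COUNTRY_CODES[code]:
            return True
    return False


def find_phones(text: str) -> list:
    return [m.group() for m in CANDIDATE_RE.finditer(text)
            if looks_like_intl_phone(m.group())]


print(find_phones("Call +14252425424 or Paris: +33 1 23 45 67 89."))
```

The regex only proposes candidates; the table lookup is what rejects digit runs whose length does not fit any known country code.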

OTHER TIPS

You can give RecogniContact (address-parser.com) a try; it recognizes both postal addresses and phone numbers.

Take a look at Chapter 7 of Dive Into Python. It covers both phone numbers and street addresses, and I believe you can use it as a starting point. The international part seems tough; I suggest you build a first draft, try it on several locales, and iterate to improve it.
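One way to organize "a first draft per locale, then iterate" is a table of per-locale patterns that are all run over the text. This is a hypothetical sketch, not the code from Dive Into Python: `ADDRESS_PATTERNS` and `find_addresses` are illustrative names, and the two patterns (English "number, name, suffix" and German "name+suffix, number") are rough first drafts.

```python
import re

# Hypothetical first-draft patterns, one per locale, to be refined after
# testing each one on real text from that locale.
ADDRESS_PATTERNS = {
    # "500 Green St.": house number, capitalized name, street suffix.
    "en": re.compile(
        r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s+)+(?:St|Street|Ave|Avenue|Rd|Road)\b\.?"
    ),
    # "Hauptstraße 5" / "Goethestr. 12": name fused with suffix, then number.
    "de": re.compile(
        r"\b[A-ZÄÖÜ][a-zäöüß]+(?:straße|str\.|weg|platz)\s+\d{1,4}[a-z]?\b"
    ),
}


def find_addresses(text):
    """Return (locale, match) pairs for every pattern that fires."""
    hits = []
    for locale, pattern in ADDRESS_PATTERNS.items():
        hits.extend((locale, m.group()) for m in pattern.finditer(text))
    return hits


print(find_addresses("They moved to 500 Green St. last week."))
print(find_addresses("Wir wohnen jetzt in der Hauptstraße 5 in Berlin."))
```

Adding a locale means adding one entry to the dict; iterating then means running each pattern on real text from that locale and tightening it where it over- or under-matches.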

Licensed under: CC-BY-SA with attribution