Question

I am looking for some guidance on how I could check the pasteboard in iOS for a valid mailing address.

If someone pastes

1234 Apple Street
New York, NY 10011

I'd like it to parse each part of the string to fill in Address, City, State and Zip. It could be any address, and it would be ideal if it could be found inside a longer string.

For example

Meet me at 1234 Apple Street New York, NY 10011 See you there!

It should still parse the correct Address, City, State and Zip.

Any help would be much appreciated!

-Wes


Solution

I was a developer at SmartyStreets. We were kind of crazy about street addresses, and street addresses drove me crazy (especially parsing them). It's a two-way street. (Am I done with the street puns?)

First, let's talk about the case where the address is all by itself, because that's easier, albeit still difficult...

Please reference this other question and answer about the very same thing. I also strongly encourage you to follow the links to related questions in both the question and the answer. Parsing addresses is a can of worms, but it's not impossible. It's just really hard to do it reliably.

Notice in the answer to that question how many different formats valid addresses can appear in. What guarantees do you have that the user will type it in any of those? And that's only a few. There are others. Consider military, PO box, rural route, and other "special" addresses that don't adhere to the typical format. What about addresses that have a two-or-three-word city name? What about addresses that use a grid system like 100 N 500 E, or secondary numbers like suite, apartment, floor, etc? What about addresses with "1/2", hyphens (as a required punctuation), etc? Addresses missing zip codes or city/state?

All of these and more could be valid. And that's only for US addresses.

If all your addresses, or even most of them, came in the form you proposed above (which isn't the case), for example:

[Primary Number] [Street Name] [Any of these street suffixes]

[City Name Followed by a Comma], [State Abbreviation] [5-digit ZIP code]

Then this would be quite easy. Wouldn't that be nice?
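For that one idealized layout, a pattern really is enough. Here is a minimal Python sketch, assuming exactly the two-part form above and nothing else. The suffix list and the names `SUFFIXES`, `SIMPLE_ADDRESS` and `parse_simple_address` are made up for illustration, and the pattern rejects most of the formats listed earlier (PO boxes, grid addresses, missing ZIP codes and so on).

```python
import re

# Hypothetical, deliberately short suffix list; real addresses use many more.
SUFFIXES = r"(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr|Lane|Ln)"

SIMPLE_ADDRESS = re.compile(
    rf"""
    (?P<address>\d+\s+[A-Za-z ]+?\s+{SUFFIXES}\.?)  # primary number, street name, suffix
    \s*,?\s*                                        # line break or comma between the parts
    (?P<city>[A-Za-z ]+?),\s*                       # city name followed by a comma
    (?P<state>[A-Z]{{2}})\s+                        # two-letter state abbreviation
    (?P<zip>\d{{5}})                                # 5-digit ZIP code
    """,
    re.VERBOSE,
)

def parse_simple_address(text):
    """Return the four fields as a dict, or None if the text is not in the ideal form."""
    match = SIMPLE_ADDRESS.fullmatch(text.strip())
    return match.groupdict() if match else None
```

Calling `parse_simple_address("1234 Apple Street\nNew York, NY 10011")` returns the address, city, state and zip fields as a dict. Anything that deviates from the ideal form returns `None`, which is exactly the limitation described next.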

You could try to write a regular expression like this guy or that guy, but that only works if addresses are a regular language. They're not regular, and regular expressions are not the answer.

There are a few services which can do this for you because they have a master list (kind of), and the software has to meet rigorous certification standards.

Obviously, since I work at SmartyStreets, I'm prone to suggest starting your search for an answer there. You can try some freeform addresses on the homepage (just fill out the "Street" field). But be aware of a few things that will probably always be an issue. LiveAddress API will be able to parse street addresses for you, most of the time. Shop around, but this should give you an idea.

Now your second question: extract a street address from a string of text. This has been extensively covered elsewhere on S.O. and the interwebs, so I won't go into a lot of detail. Basically, to do this reliably, you'll probably need some Natural Language Processing and human interaction to confirm or correct the best guess.

Don't ever assume these things about un-standardized addresses:

  • Starts with a number
  • Ends with a number
  • Everything between the two numbers is an address
  • Has a ZIP code present
  • No more than 2 numbers will be in an address
  • It's unambiguous
  • It exists
  • A street suffix will always be present
  • It's spelled correctly
  • ...etc.

Again, refer to some other linked posts about this issue. You can make guesses, but always always always have a human confirm the guess if you do that. (Some Mac apps do this. If they detect an address, it will get highlighted, and you can add that address to your contacts. Unfortunately I've seen false positives a lot, and it also misses them a lot.)

Good luck!

Altri suggerimenti

I also work at SmartyStreets, and since I'm not a developer I'm not bound by any constraints such as "it can't be done" or "there's no way to do it reliably". In fact the ideas that I come up with may not even always be possible, but, I'm a problem-solver, a solution-finder, and this particular problem absolutely has a solution.

You'll need the following: a little regex, knowledge of a scripting language (Python, PHP, whatever you prefer) and access to an address validation tool (this is required so that you know when you get it right).

So, let's start with the example sentence:

Meet me at 1234 Apple Street New York, NY 10011 See you there!

We can be sure that every address has a beginning and an end. (You can take that to the bank!)

So, if you run a regular expression that looks for the beginning of the address within the string you can eliminate everything before the address begins. Here's a regex that will do just that:

(^(.*(?=p\.?o\.? box|h\.?c\.?r\.? |c\.?m\.?r\.?)|^[^0-9]+))

This will give you back the following:

1234 Apple Street New York, NY 10011 See you there!
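In Python, that step could look like the sketch below. The pattern is the one given above, unchanged; it matches the text in front of the address, so the match is removed rather than kept. The `re.IGNORECASE` flag and the function name `strip_leading_text` are assumptions, since the answer does not say which flags or language were used.

```python
import re

# The pattern from the answer. It matches the text *before* the address:
# either everything up to a PO Box / HCR / CMR marker, or the leading run
# of non-digit characters.
PREFIX = re.compile(
    r"(^(.*(?=p\.?o\.? box|h\.?c\.?r\.? |c\.?m\.?r\.?)|^[^0-9]+))",
    re.IGNORECASE,  # assumption: lets "PO Box" match as well as "po box"
)

def strip_leading_text(text):
    """Drop whatever precedes the first plausible start of an address."""
    return PREFIX.sub("", text, count=1)

sentence = "Meet me at 1234 Apple Street New York, NY 10011 See you there!"
print(strip_leading_text(sentence))
```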

Now, you're halfway there, but you'll need to loop through the remaining string. Another assumption that you can certainly make is that an address will never be longer than 328 characters. (I made up that number, but you get the picture. An address has to have an end as well, and you can shorten the string by determining the max acceptable USPS address length.)

You're going to loop through the address string until you get a valid address out of it. To do this, start at the beginning and move one word to the right with each additional permutation. This is where the address validation service comes in handy, because you have no idea where the address ends, and that's what you need to know. So, each permutation you generate from the string (remember, you're starting from the left side) will be sent for validation. Since no valid address can have fewer than two words, you'll start there. Here are the permutations from the example address as well as the validation results (I'm trying each address by entering it in the address line of the address search box on smartystreets.com):

1234 Apple ==> fail

1234 Apple Street ==> fail

1234 Apple Street New ==> fail

1234 Apple Street New York ==> fail

1234 Apple Street New York, NY ==> Bingo, valid address match. No need to keep iterating.
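The loop above can be sketched in a few lines of Python. This is only an illustration of the idea: `candidate_prefixes`, `find_address` and the `is_valid` callback are hypothetical names, `fake_validator` stands in for a real validation service, and 328 is the made-up length limit mentioned earlier.

```python
def candidate_prefixes(text, min_words=2, max_chars=328):
    """Yield growing prefixes of `text`, adding one word at a time."""
    # Truncate first; the cut may fall inside a word, which is fine for a sketch.
    words = text[:max_chars].split()
    for end in range(min_words, len(words) + 1):
        yield " ".join(words[:end])

def find_address(text, is_valid):
    """Return the shortest prefix accepted by `is_valid`, or None."""
    for candidate in candidate_prefixes(text):
        if is_valid(candidate):
            return candidate  # no need to keep iterating
    return None

# Stand-in validator for demonstration only; a real one would query a service.
def fake_validator(candidate):
    return candidate == "1234 Apple Street New York, NY"

remaining = "1234 Apple Street New York, NY 10011 See you there!"
print(find_address(remaining, fake_validator))
```

Passing the validator in as a function keeps the loop independent of whichever service you end up using.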

Now, obviously the example above is not a real address, but you can try the same thing with a real address and you'll get the same results. This isn't the most sophisticated method to extract a valid address from a string, but it certainly works. And, since SmartyStreets allows you to send up to 100 addresses per query, you could permute the address string up to 99 times and get the results back in under 300ms. This won't work with every address, as you'll certainly find out, but it can very easily handle a large majority of them, regardless of how obscured the address is within the text string.
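If the service accepts many addresses per request, the candidates can be sent in groups instead of one at a time. The sketch below assumes a hypothetical `validate_batch` callable that takes a list of strings and returns a list of booleans of the same length. It does not reproduce any real client library, and the batch size of 100 simply mirrors the per-query limit mentioned above.

```python
def batched_candidates(text, batch_size=100, min_words=2):
    """Yield lists of growing prefixes of `text`, at most `batch_size` per list."""
    words = text.split()
    candidates = [" ".join(words[:end]) for end in range(min_words, len(words) + 1)]
    for start in range(0, len(candidates), batch_size):
        yield candidates[start:start + batch_size]

def find_address_batched(text, validate_batch, batch_size=100):
    """Return the shortest prefix the validator accepts, or None.

    `validate_batch` is a hypothetical interface: list of strings in,
    list of booleans (same length, same order) out.
    """
    for batch in batched_candidates(text, batch_size):
        results = validate_batch(batch)
        for candidate, ok in zip(batch, results):
            if ok:
                return candidate
    return None

# Stand-in for a real service call, for demonstration only.
def fake_validate_batch(batch):
    return [c == "4219 jon young orlando fl" for c in batch]

print(find_address_batched("4219 jon young orlando fl 32839 See you there!", fake_validate_batch))
```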

So, we started with "Meet me at 1234 Apple Street New York, NY 10011 See you there!" and in less than half a second came up with "1234 Apple Street New York, NY 10011-1000".

Pretty cool huh? It even sounds really easy coming from a non-programmer.


Let's try it with a real address:

Meet me at 4219 jon young orlando fl 32839 See you there!

Apply regex and you get:

4219 jon young orlando fl 32839 See you there!

Permute, iterate, validate:

4219 jon ==> fail

4219 jon young ==> fail

4219 jon young orlando ==> fail

4219 jon young orlando fl ==> Bingo, valid address match.


Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow