Is there a benefit to storing street address data distinctly instead of just as a string?

https://stackoverflow.com/questions/1626432

06-07-2019
|

Question

Currently we store our address data like so:

string suiteNumber (ie. unit number)
string streetNumber (building number)
string streetName
string streetDirection (N/NW/S/etc.)
string streetType    (rd/st/ave/etc.)
// ... etc. (postal code/city/province/state/country

But I'm running into the (common from what I can tell) problem of parsing out the first 5 address parts when dealing with and importing addresses.

I'm thinking that all of this would be significantly easier if the street address were just a string (varchar in the db).

There are 2 arguments that I've been given for why we should keep it as is: 1. Searching is easier when you can search against JUST the street name or number etc. but I'm thinking that a sql script along the lines of SELECT x FROM Address WHERE streetAddress LIKE "%INPUT%"; Sure it's not as fast, but it would work (and the dataset for that search is only on customers is incredibly smaller than the set of all addresses we have stored).

Currently we have a system that flags apartments - if you find that 1 person at address A is an apartment, we flag them, and it will search for all other people at that streetnumber/streetname and flag them as well (this is a sometimes important business need)

I already store them all as strings because of the myriad exceptions in addresses.

So I ask, are there particular reasons for needing/wanting to store the street address parts separately?

Solution

I wrote a whole blog post about this a while back. There are very good reasons to store each piece of data in a separate field. Not least for validation of address data.

Of course, it depends what industry you're in and what the information is being used for. If invalid address data isn't costing your company anything, then by all means store invalid data. Be aware though that down the road you may wish to use this data for mailings, demographic reports etc. If the data is invalid, it's not trivial to fix it after the fact.

Here's my blog post:

http://www.endswithsaurus.com/2009/07/lesson-in-address-storage.html

Also, in reference to searching "Where StreetAddress Like '%whatever%'". This is all well and good if you're doing a quick search for your own benefit, but when you come to attempt to automate parts of your system that rely on address data or even attempt to drop duplicates, provide users with auto-suggest etc etc, performance is degraded to a point that it will become unusable the larger the address table.

If invalid addresses aren't a worry that is going to cost the company real cash, then it's not an issue - but then, if you're not using the addresses for anything that is beneficial financially (or likely to be in the future), then why are you storing that information in the first place?

@Snorfus Ah, you must be in the Prairies. I had overlooked including posting about land descriptions in my blog post but it's something I'm considering for a later post.

Legal subdivisions (LSDs) are used primarly in Oil & Gas and other primary resource industries in Alberta, Saskatchewan and Manitoba (although they are found in parts of B.C. too, they're not in such prevalent use). They all take the same format: Section, Township, Range, Meridian. For example:

SE 28-12-17-W5

This is the South Eastern corner of Section 28, Township 12, Range 17, West of the 5th Meridian.

You could simply use a single field and parse it with regular expressions or break it out into separate fields containing the breakdown of the LSD. Running regexes in SQL Server can be a pain when it comes to performance. My take on it is the same as that of address data in general, that because each piece of data is a separate unique piece of data that they should be stored in separate fields. However, given that the large majority of this type of address data is not used by the general public in lieu of a street address, I might recommend designing something that would allow this information to be separated from (but linked to) your main address data. Given however that the land description/LSD is also part of every Canadian address, I might be tempted to store it in my main address table depending on the target audience of the database.

Here's a post about the breakdown of the Alberta Land Resource System:

http://www1.agric.gov.ab.ca/%24department/deptdocs.nsf/all/agdex10302

One thing you will often find in Oil & Gas at least (which is where the bulk of my experience comes from) is that workers will often refer to only the first two parts of the LSD - i.e. 28 of 12, or 43 of 16. The remainder of the LSD is implied by the locality of the address - i.e. Grand Prairie, Fox Creek, Wolf Lake etc.

OTHER TIPS

I used to think that was a good idea, until my applications were deployed and a constant stream of requests came in for changes. At the time, I lived in Ontario, Canada and I thought that I knew what a standard address looked like. Until some customer had an address that combined the P.O. Box and the street address into one. Then the Alberta customers started coming in with their structured codes mentioned in another answer. Then British Columbia adresses where there was no street or street number, just a Site and Compartment and Rural Route. C4,S16 RR7 Mountainville. And then with American suppliers, the postal code rules went out the window. And then the occasional British customer appeared in the database and everything that you thought you knew about addresses goes out the window. A building name with no street number, two street names, two town names all in one address!

Bright House,
Waverly Crescent off Oxford Road,
Seething-under-Norton, Banbury,
Oxfordshire
OB7 3VT
United Kingdom

That is a made up example, but they do exist. The British manage to get by because every local company has an up to date national address database and all they need is the postcode and house name or number. The rest is filled in from the database.

In the case of that address, there is probably another Waverly Crescent in Seething-under-Norton, which is why the second street name. And Seething-under-Norton was a village which long became incorporated into the town of Banbury, so both names are in the address. In British addresses you often get municipalities which do not exist. They are considered postal towns in that they exist only within the postal system. There is usually a historical basis for the name. Lots of London addresses are like that with people writing London one time, and Leyton or South Ruislip or Hillingdon another time. The letters all do get delivered promptly.

So unless a feature of your software is that it prevents foreign address entries into the system, don't do this!

By the way, you mentioned identifying all the people on the same street by the street name. Have you checked out Denver Colorado where there are street names which end and pick up again, a mile further away. I once got lost in Littleton (Denver suburb) trying to find a certain address only to be told that I needed another such-and-such street which was elsewhere. Then there is the British practice of using two or more names for every road. For instance, there will be a Homerton Road which is then named Marsh Hill and then Homerton High Street and then Urswick Road and then Lower Clapton Road all in the space of a kilometer or two. More commonly, in the village of Wick there will be a Norton Road. If you follow it, after a mile or two you will note that your are now on Wick Road, entering the village of Norton.

In my opinion, there is some benefit to doing this, but in all cases where I have seen it tried, the cost and complexity of doing it outweigh the negligible benefits.

Not the least of your problems is going to be training/forcing users to respect all the separate fields you give them to enter all the different parts that make up and address in a consistent format - most people just don't think of a street address being made up of up to 5 different parts, and will likely just enter stuff like they usually do.

So, if it wasn't for the people actually trying to use the system, its probably a good idea.

In Europe, the street address is usually a name plus a "number" (where number can be something like "3a"). I've seen databases which store them separately for a single reason: You can look up the street names in an official database to verify them (for example to protect against typos). So for this use case, it makes sense to keep the verifiable and the non-verifiable parts in different columns.

I doubt that you can find a reason to break it down further except for a fuzzy fear that you might lose information.

Is a benefit if you are following an Objected Oriented approach for modeling your whole domain. Your question reminds me this blog title March is not a number as an answer. Something analogue could be say about streets and addresses ("a street is not a string"). SnOrfus points out a valid problem on his comment.

While their may be advantages to storing each component of an address independently, you'll have to weigh the cost against your business needs and requirements. If you're not doing anything related to mailing or shipping, it may be overkill and complicate aspects of your architecture significantly. Furthermore, anyone else that works on your code may not understand what's going on and make introduce significant problems without realizing it, thus corrupting the database.

As an example, within the United States, the following is the "delivery line" of a street: PO Box 12345.

In this case, "PO Box" is actually the street name while 12345 is the primary number. Normal "formatting" and conventional wisdom suggest that an address should have the primary number listed first, as in "123 Main Street".

If you're formatting the address back together in a standard fashion, you'll have to remember how the address looked originally.

This is where address verification and standardization come in. At least within the United States and a few other nations modern nations, including Great Britain, you have the advantage of being able to submit the address to an online address verification service which can clean, standardize, and verify your address. Oftentimes, these services will give back the address as it should appear on the mail piece as well as the component parts of the address. If you have a business need for the components, then you can store them independently. Otherwise, another call to the address verification web service should yield the components again at the desired time.

In the interest of full disclosure, I'm the founder of SmartyStreets. We offer US-based address verification services which include CASS-Certified validation of your addresses. You're more than welcome to contact me personally with any questions you have.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow