Question

Should mailing addresses with city, state, and zip code be normalized? I am currently concerned with US addresses only. I have shown normalized tables along with an ERD, and a non-normalized table at the bottom of this post. Please provide a rationale for your answer.

Note that To Normalize or Not To Normalize is related to this topic, but is different.

Thank you

[Image: ERD of the normalized tables]

CREATE  TABLE IF NOT EXISTS states (
  id CHAR(2) NOT NULL ,
  name VARCHAR(45) NULL DEFAULT NULL ,
  PRIMARY KEY (id) ,
  INDEX states_name (name ASC) )
ENGINE = InnoDB;

CREATE  TABLE IF NOT EXISTS cities (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  name VARCHAR(45) NOT NULL ,
  states_id CHAR(2) NOT NULL ,
  PRIMARY KEY (id) ,
  INDEX fk_zipcodes_states1_idx (states_id ASC) ,
  UNIQUE INDEX makeUnique (states_id ASC, name ASC) ,
  INDEX cities_name (name ASC) ,
  CONSTRAINT fk_zipcodes_states1
    FOREIGN KEY (states_id )
    REFERENCES states (id )
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB
PACK_KEYS = 0
ROW_FORMAT = DEFAULT;

CREATE  TABLE IF NOT EXISTS zipcode_types (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  name VARCHAR(45) NULL DEFAULT NULL ,
  PRIMARY KEY (id) )
ENGINE = InnoDB
PACK_KEYS = 0
ROW_FORMAT = DEFAULT;

CREATE  TABLE IF NOT EXISTS counties (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  name VARCHAR(45) NOT NULL ,
  PRIMARY KEY (id) ,
  INDEX counties_name (name ASC) )
ENGINE = InnoDB;

CREATE  TABLE IF NOT EXISTS timezones (
  id CHAR(4) NOT NULL ,
  name VARCHAR(45) NOT NULL ,
  PRIMARY KEY (id) )
ENGINE = InnoDB
PACK_KEYS = 0
ROW_FORMAT = DEFAULT;

CREATE  TABLE IF NOT EXISTS zipcodes (
  id CHAR(5) NOT NULL ,
  longitude DECIMAL(9,6) NOT NULL ,
  latitude DECIMAL(9,6) NOT NULL ,
  zipcode_types_id INT UNSIGNED NOT NULL ,
  counties_id INT UNSIGNED NOT NULL ,
  timezones_id CHAR(4) NOT NULL ,
  PRIMARY KEY (id) ,
  INDEX fk_zipcodes_zipcode_types1_idx (zipcode_types_id ASC) ,
  INDEX fk_zipcodes_counties1_idx (counties_id ASC) ,
  INDEX fk_zipcodes_timezones1_idx (timezones_id ASC) ,
  CONSTRAINT fk_zipcodes_zipcode_types1
    FOREIGN KEY (zipcode_types_id )
    REFERENCES zipcode_types (id )
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT fk_zipcodes_counties1
    FOREIGN KEY (counties_id )
    REFERENCES counties (id )
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT fk_zipcodes_timezones1
    FOREIGN KEY (timezones_id )
    REFERENCES timezones (id )
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

CREATE  TABLE IF NOT EXISTS cities_has_zipcodes (
  cities_id INT UNSIGNED NOT NULL ,
  zipcodes_id CHAR(5) NOT NULL ,
  PRIMARY KEY (cities_id, zipcodes_id) ,
  INDEX fk_cities_has_zipcodes_zipcodes1_idx (zipcodes_id ASC) ,
  INDEX fk_cities_has_zipcodes_cities1_idx (cities_id ASC) ,
  CONSTRAINT fk_cities_has_zipcodes_cities1
    FOREIGN KEY (cities_id )
    REFERENCES cities (id )
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT fk_cities_has_zipcodes_zipcodes1
    FOREIGN KEY (zipcodes_id )
    REFERENCES zipcodes (id )
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

CREATE  TABLE IF NOT EXISTS someRecord (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  data VARCHAR(45) NULL ,
  address VARCHAR(45) NULL ,
  cities_id INT UNSIGNED NOT NULL ,
  zipcodes_id CHAR(5) NOT NULL ,
  PRIMARY KEY (id) ,
  INDEX fk_someRecord_cities1_idx (cities_id ASC) ,
  INDEX fk_someRecord_zipcodes1_idx (zipcodes_id ASC) ,
  CONSTRAINT fk_someRecord_cities1
    FOREIGN KEY (cities_id )
    REFERENCES cities (id )
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT fk_someRecord_zipcodes1
    FOREIGN KEY (zipcodes_id )
    REFERENCES zipcodes (id )
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

Example of data in a single table

CREATE  TABLE IF NOT EXISTS otherRecord (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  data VARCHAR(45) NULL ,
  address VARCHAR(45) NULL ,
  city VARCHAR(45) NULL ,
  state VARCHAR(45) NULL ,
  zipcode VARCHAR(45) NULL ,
  county VARCHAR(45) NULL ,
  longitude DECIMAL(9,6) NULL ,
  latitude DECIMAL(9,6) NULL ,
  timezone VARCHAR(45) NULL ,
  PRIMARY KEY (id) )
ENGINE = InnoDB;

Solution 2

Addresses are not a cleanly relational entity. You should not normalize them in the traditional sense. What you may want to do is additionally store a normalized version of parts of the address (e.g. country, state, city) for your own analysis purposes, which is derived from the address provided by the user.
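One way to read this suggestion is to keep the address exactly as the user entered it and store derived fields next to it. The following is only a minimal Python sketch: the class `StoredAddress`, its fields, and the helper `derive_parts` are hypothetical names, and the parsing is deliberately naive (it only recognizes a final line of the form "City, ST 12345").

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern: a last line such as "City, ST 12345" or "City, ST 12345-6789".
_LAST_LINE = re.compile(
    r"^(?P<city>[^,]+),\s*(?P<state>[A-Za-z]{2})\s+(?P<zip>\d{5})(?:-\d{4})?$"
)


@dataclass
class StoredAddress:
    raw: str                       # exactly what the user typed; never rewritten
    city: Optional[str] = None     # derived fields, used for analysis only
    state: Optional[str] = None
    zipcode: Optional[str] = None


def derive_parts(raw: str) -> StoredAddress:
    """Keep the raw text and fill the derived fields only when parsing succeeds."""
    record = StoredAddress(raw=raw)
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    if not lines:
        return record
    match = _LAST_LINE.match(lines[-1])
    if match:
        record.city = match.group("city").strip()
        record.state = match.group("state").upper()
        record.zipcode = match.group("zip")
    return record
```

If parsing fails, the derived fields simply stay empty and the user's text is still stored untouched, so a bad guess never overwrites what the user actually wrote.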

There are a tremendous number of exceptions in just US addresses, which are pretty well normalized compared to the rest of the world. Zip codes, by the way, correspond primarily to delivery routes by the USPS, not to specific physical locations.

As a personal example, I live in an unincorporated area which is served by a post office in a different (nearby) city, which is in a different county. My official address should, according to USPS, be written as "VC Highlands, NV 89521" and is in Storey County, NV. However, the zip code 89521 is primarily "Reno, NV 89521", which is in Washoe County, NV. You can imagine that this causes much trouble with just about everyone. Even the Nevada DMV refuses to accept "VC Highlands" because their database thinks 89521 is "Reno".

So even with something "simple" in your schema above, you've got it wrong: it gives each zip code exactly one county. A zip code can span not only multiple cities but also multiple counties. There are thousands more exceptions, which will certainly frustrate some percentage of your users.
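To make the 89521 case concrete, here is a minimal sketch of a reference table in which one zip code may map to several (city, county) pairs. It uses Python's built-in `sqlite3` module rather than MySQL so that it runs without a server; the table `zip_localities` and the functions `is_known_locality` and `counties_for_zip` are hypothetical names, and the only rows loaded are the two mentioned in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per (zip, city, state, county) combination, so a zip code
# is free to appear with more than one city and more than one county.
conn.execute(
    """
    CREATE TABLE zip_localities (
        zipcode TEXT NOT NULL,
        city    TEXT NOT NULL,
        state   TEXT NOT NULL,
        county  TEXT NOT NULL,
        PRIMARY KEY (zipcode, city, state, county)
    )
    """
)
conn.executemany(
    "INSERT INTO zip_localities VALUES (?, ?, ?, ?)",
    [
        ("89521", "Reno", "NV", "Washoe"),
        ("89521", "VC Highlands", "NV", "Storey"),
    ],
)


def is_known_locality(zipcode: str, city: str, state: str) -> bool:
    """True if this zip/city/state combination exists in the reference data."""
    row = conn.execute(
        "SELECT 1 FROM zip_localities WHERE zipcode = ? AND city = ? AND state = ?",
        (zipcode, city, state),
    ).fetchone()
    return row is not None


def counties_for_zip(zipcode: str) -> list:
    """All counties a zip code touches, in alphabetical order."""
    rows = conn.execute(
        "SELECT DISTINCT county FROM zip_localities WHERE zipcode = ? ORDER BY county",
        (zipcode,),
    ).fetchall()
    return [r[0] for r in rows]
```

With this shape, both "Reno, NV 89521" and "VC Highlands, NV 89521" pass a lookup, whereas a design with a single county column on the zip code table can represent only one of them.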

OTHER TIPS

Yes, if:

  1. You will be analyzing your data with respect to their addresses, and by that I mean sorting, filtering, grouping, counting based on the various fields of an address.

    If you allow free text, then you might have country names like US, USA, U.S.A., United States. This will be a pain if you want to view/count/group all American customers. Your internal users might want to drill down from continent to country to state to county to city, in which case your data needs to be normalized.

  2. You will be doing matching against external sources. For example, you have data from a 3rd party vendor, and you need to match their Company A and your Company A. Oftentimes companies have similar names, and you need to match by (parts of) the address. For example, you need to match "Acme, Inc | California" with "Acme Incorporated | CA".

  3. You want to truly avoid duplication. If you allow free text, then you would have a duplicate with "123-456 Main Street, Vancouver" and "Apt 123, 456 Main Street, Vancouver".

  4. You want truly valid data. If you allow free text, then anyone can type in anything. This one is tough, as you'll need lots of reference data with available country names, state names, county names, even street names. You could start by getting some data from geonames.org.
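Points 1 and 2 above both come down to mapping free-text variants onto one canonical value before grouping or matching. A minimal Python sketch, assuming small hand-written alias tables: `COUNTRY_ALIASES`, `STATE_ALIASES`, and `COMPANY_SUFFIXES` are hypothetical and nowhere near complete, and real reference data would have to come from an external source.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical alias tables; keys are already lowercased and punctuation-free.
COUNTRY_ALIASES = {"us": "US", "usa": "US", "united states": "US"}
STATE_ALIASES = {"ca": "CA", "california": "CA"}
COMPANY_SUFFIXES = {"inc": "inc", "incorporated": "inc"}


def _key(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def canonical_country(text: str) -> Optional[str]:
    """Return the canonical country code, or None if the variant is unknown."""
    return COUNTRY_ALIASES.get(_key(text))


def canonical_state(text: str) -> Optional[str]:
    """Return the canonical state code, or None if the variant is unknown."""
    return STATE_ALIASES.get(_key(text))


def canonical_company(text: str) -> str:
    """Normalize the company name word by word, unifying legal suffixes."""
    return " ".join(COMPANY_SUFFIXES.get(w, w) for w in _key(text).split())


def same_company(a: str, b: str) -> bool:
    """Compare two 'Name | State' strings after canonicalizing both parts."""
    name_a, state_a = (part.strip() for part in a.split("|", 1))
    name_b, state_b = (part.strip() for part in b.split("|", 1))
    state = canonical_state(state_a)
    return (
        state is not None
        and state == canonical_state(state_b)
        and canonical_company(name_a) == canonical_company(name_b)
    )


# Grouping the country variants from point 1 under one canonical code.
counts = Counter(
    canonical_country(c) for c in ["US", "USA", "U.S.A.", "United States"]
)
```

Here `counts` collapses the four spellings into a single group, and `same_company("Acme, Inc | California", "Acme Incorporated | CA")` treats the two records from point 2 as the same company. Unknown variants come back as `None` instead of being guessed, so they can be routed to manual review.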

Please note that not every country has postal codes (Ireland, for example, had no national postcode system until Eircode was introduced in 2015), so your schema needs to account for that if going global. Read Hay's Enterprise Model Patterns for some good address models.

Licensed under: CC-BY-SA with attribution