Question

I would like to write a JavaScript function that validates a zip code, by checking if the zip code actually exists. Here is a list of all zip codes:

http://www.census.gov/tiger/tms/gazetteer/zips.txt (I only care about the 2nd column)


This is really a compression problem, and I would like to do it for fun. With that out of the way, here is a list of optimizations over a straight hashtable that I can think of; feel free to add anything I have not thought of:

  • Break zipcode into 2 parts, first 2 digits and last 3 digits.
  • Make a giant if-else statement first checking the first 2 digits, then checking ranges within the last 3 digits.
  • Or, convert the zips into hex, and see if I can do the same thing using smaller groups.
  • Find out whether, within the range of all zip codes, the valid or the invalid ones form the smaller group. Write the above code targeting the smaller group.
  • Break up the hash into separate files, and load them via Ajax as user types in the zipcode. So perhaps break into 2 parts, first for first 2 digits, second for last 3.

Lastly, I plan to generate the JavaScript files using another program, not by hand.

Edit: performance matters here. I do want to use this, if it doesn't suck. Performance of the JavaScript code execution + download time.

Edit 2: JavaScript only solutions please. I don't have access to the application server, plus, that would make this into a whole other problem =)


Solution

"I would like to write a JavaScript function that validates a zip code"

This might be more effort than it's worth: you would have to keep the list updated so that nobody's real, valid ZIP code is ever rejected. You could also try an external service, or do what everyone else does and just accept any 5-digit number!

"here is a list of optimizations over a straight hashtable that I can think of"

Sorry to spoil the potential fun, but you're probably not going to manage much better actual performance than JavaScript's Object gives you when used as a hashtable. Object member access is one of the most common operations in JS and will be super-optimised; building your own data structures is unlikely to beat it even if they are potentially better structures from a computer science point of view. In particular, anything using Array may not perform as well as you think, because an Array is itself an Object, and engines are free to implement it as a hashtable rather than as a contiguous block of memory.
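
For comparison, a plain Object lookup could look roughly like the sketch below. The table contents and the names validZips and zipExists are made up for illustration; a real table would be generated from the full list.

```javascript
// Hypothetical sample data; the real table would hold every known ZIP code.
var validZips = { "10001": true, "10002": true, "90210": true };

// Returns true only for a 5-digit string that is a key of the table.
function zipExists(zip) {
    if (!/^[0-9]{5}$/.test(zip)) {
        return false;
    }
    // hasOwnProperty ignores anything inherited from Object.prototype.
    return Object.prototype.hasOwnProperty.call(validZips, zip);
}
```

Keys are kept as strings so that leading zeros survive.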

Having said that, one possible space-saving technique, if you only need to know 'valid or not', is a 100000-bit bitfield packed into a string. For example, for a space of only 100 ZIP codes where codes 032-043 are ‘valid’:

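// 13 bytes = 104 bits; bits 32-43 are set (0xFF covers 32-39, 0x0F covers 40-43).
var zipfield= '\x00\x00\x00\x00\xFF\x0F\x00\x00\x00\x00\x00\x00\x00';
function isvalid(zip) {
    // Anchor the pattern so that only exactly three digits are accepted.
    if (!/^[0-9]{3}$/.test(zip))
        return false;
    var z= parseInt(zip, 10);
    // Character z/8 holds the flag for code z in bit z%8.
    return !!( zipfield.charCodeAt(Math.floor(z/8)) & (1<<(z%8)) );
}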
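
Since the question mentions generating the JavaScript with another program, here is a rough sketch of how such a bitfield string could be built from a list of valid codes. The names buildZipField, sample and field are invented for this example, and it assumes the codes are non-negative integers smaller than size.

```javascript
// Builds a bitfield string: bit (z % 8) of character floor(z / 8) is set
// when z is a valid code. `size` is the number of possible codes.
function buildZipField(validCodes, size) {
    var bytes = [];
    var i;
    for (i = 0; i < Math.ceil(size / 8); i++) {
        bytes.push(0);
    }
    for (i = 0; i < validCodes.length; i++) {
        var code = validCodes[i];
        bytes[Math.floor(code / 8)] |= 1 << (code % 8);
    }
    var out = '';
    for (i = 0; i < bytes.length; i++) {
        out += String.fromCharCode(bytes[i]);
    }
    return out;
}

// Codes 32 through 43, as in the 100-code example.
var sample = [];
for (var z = 32; z <= 43; z++) {
    sample.push(z);
}
var field = buildZipField(sample, 100);
```

For the full space of 100000 codes the resulting string would be 12500 characters long.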

Now we just have to work out the most efficient way to get the bitfield to the script. The naive '\x00'-filled version above is pretty inefficient. A conventional way to reduce that would be to base64-encode it:

var zipfield= atob('AAAAAP8PAAAAAAAAAA==');

That would get the 100000 flags down to about 16.7kB (12500 bytes × 4/3). Unfortunately atob is not available in every browser (older versions of Internet Explorer lack it), so an additional base64 decoder would be needed there. (It's not too hard, but it's a bit more startup time to decode.) It might also be possible to use an AJAX request to transfer a direct binary string (encoded as ISO-8859-1 text and read from responseText). That would get it down to 12.5kB.
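
A fallback base64 decoder could be sketched along these lines. The names B64_CHARS, decodeBase64 and decodeField are invented for this example, and it handles only the standard alphabet with optional trailing '=' padding.

```javascript
var B64_CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decodes standard base64 into a string whose char codes are the byte values.
function decodeBase64(input) {
    var clean = input.replace(/=+$/, '');
    var out = '';
    var buffer = 0; // accumulated bits
    var bits = 0;   // number of bits currently held in buffer
    for (var i = 0; i < clean.length; i++) {
        var value = B64_CHARS.indexOf(clean.charAt(i));
        if (value < 0) {
            continue; // skip whitespace or other unexpected characters
        }
        buffer = (buffer << 6) | value;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out += String.fromCharCode((buffer >> bits) & 0xFF);
            buffer &= (1 << bits) - 1; // keep only the leftover bits
        }
    }
    return out;
}

// Prefer the native atob where it exists, otherwise use the fallback.
function decodeField(encoded) {
    return (typeof atob === 'function') ? atob(encoded) : decodeBase64(encoded);
}

var decodedField = decodeField('AAAAAP8PAAAAAAAAAA==');
```

The decoded string has the same layout as the '\x00'-filled literal, so the lookup function does not need to change.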

But in reality probably anything, even the naive version, would do as long as you served the script using mod_deflate, which would compress away a lot of that redundancy, and also the repetition of '\x00' for all the long ranges of ‘invalid’ codes.

OTHER TIPS

You could do the unthinkable and treat the code as a number (remember that it's not actually a number). Convert your list into a series of ranges, for example:

zips = [10000, 10001, 10002, 10003, 23001, 23002, 23003, 36001]
// becomes
zips = [[10000,10003], [23001,23003], [36001,36001]]
// make sure to keep this sorted

Then, to test:

// Linear scan over the sorted [low, high] ranges.
function isValidZip(myzip) {
    for (var i = 0, l = zips.length; i < l; ++i) {
        if (myzip >= zips[i][0] && myzip <= zips[i][1]) {
            return true;
        }
    }
    return false;
}
isValidZip(23002); // true

This just uses a very naive linear search, which is O(n) in the number of ranges. If you keep the list sorted and use binary search, you can achieve O(log n).
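
A rough sketch of both steps, collapsing a sorted list into ranges and then binary-searching them, might look like this. The names toRanges, isInRanges and zipRanges are invented for the example, and the input is assumed to be numeric, sorted ascending and free of duplicates.

```javascript
// Collapses a sorted list of numbers into [low, high] runs of consecutive values.
function toRanges(sortedZips) {
    var ranges = [];
    for (var i = 0; i < sortedZips.length; i++) {
        var last = ranges[ranges.length - 1];
        if (last && sortedZips[i] === last[1] + 1) {
            last[1] = sortedZips[i];                     // extend the current run
        } else {
            ranges.push([sortedZips[i], sortedZips[i]]); // start a new run
        }
    }
    return ranges;
}

// Binary search over sorted, non-overlapping [low, high] ranges.
function isInRanges(ranges, zip) {
    var lo = 0;
    var hi = ranges.length - 1;
    while (lo <= hi) {
        var mid = (lo + hi) >> 1;
        if (zip < ranges[mid][0]) {
            hi = mid - 1;   // target lies before this range
        } else if (zip > ranges[mid][1]) {
            lo = mid + 1;   // target lies after this range
        } else {
            return true;    // low <= zip <= high
        }
    }
    return false;
}

var zipRanges = toRanges([10000, 10001, 10002, 10003, 23001, 23002, 23003, 36001]);
```

How much this saves depends on how long the consecutive runs are in the real data.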

I use Google Maps API to check whether a zipcode exists.

It's more accurate.

Assuming you've got the zips in a sorted array (seems fair if you're controlling the generation of the data structure), see if a simple binary search is fast enough.
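
A minimal sketch of that, keeping the codes as zero-padded strings so that leading zeros are preserved. The sample data and the names sortedZips and hasZip are made up for illustration.

```javascript
// Hypothetical sample; equal-length zero-padded strings sort correctly as text.
var sortedZips = ['00501', '10001', '10002', '60601', '90210'];

// Classic binary search on a sorted array of strings.
function hasZip(sorted, zip) {
    var lo = 0;
    var hi = sorted.length - 1;
    while (lo <= hi) {
        var mid = (lo + hi) >> 1;
        if (sorted[mid] === zip) {
            return true;
        }
        if (sorted[mid] < zip) {
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return false;
}
```

With a few tens of thousands of entries this needs only around 16 comparisons per lookup.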

So... you're doing client-side validation and want to optimize for file size? You probably cannot beat general compression. Fortunately, most browsers support gzip, so you get that much for free.

How about a simple JSON-encoded dict or list with the zip codes in sorted order, and a lookup on the dict? It'll compress well, since it's a predictable sequence; it'll import easily, since it's JSON and uses the browser's built-in parser; and lookup will probably be very fast too, since that's a JavaScript primitive.

Licensed under: CC-BY-SA with attribution