Question

One of the things I run into often is problems caused by programs that don't conform to ISO standards.

One example would be not using the ISO country tables but making up their own shorthands, which goes okay for the United States (US) or the Netherlands (NL), but goes spectacularly wrong for the United Kingdom (GB, not UK) or Spain (ES, not SP) and a lot of other countries.

Another example: internal date notations. Why would anyone ever store a date as 01/02/2014? It is completely unclear whether that means 1st February or January 2nd, whereas if you use the ISO standard you just store 2014-02-01* and it's unambiguously February 1st.

My question: When and why should a programmer make up their own constructs when there is an ISO standard available?

* Store 2014-02-01, and format the date accordingly when showing it to an end user.
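
A minimal Python sketch of that store-ISO, format-on-display approach (the display patterns are just examples):

```python
from datetime import date

# Store and exchange the date in ISO 8601: unambiguous, and it sorts
# correctly as plain text.
stored = date(2014, 2, 1).isoformat()   # '2014-02-01'

# Parse the stored value and localise it only at the presentation layer.
d = date.fromisoformat(stored)
print(d.strftime("%d %B %Y"))   # '01 February 2014' (e.g. for a UK audience)
print(d.strftime("%B %d, %Y"))  # 'February 01, 2014' (e.g. for a US audience)
```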


Solution

Never attribute to malice that which is adequately explained by stupidity. -- Robert J. Hanlon

That, and a lack of communication.

So it's not a conspiracy of anti-ISO sentiment making people think "I know, I'll use UK instead of GB", nor an inclination that "they know better", nor even a sense that the standard is no good. It's simply that they don't know the standard exists and that they should use it.

I mean, for some people, if it's not bundled into Visual Studio, it might as well not exist. Others may not want the full set, or find it too difficult to fetch the definitive list, so they make up their own subset to solve the immediate situation. For others still, the default is what gets used: date formatting isn't "formatted in ISO, or even the country locale", it's "formatted in whatever comes out", and if that suits them, the job is done (this is usually a criticism of American programmers).

OTHER TIPS

When I program in Ruby, I generally ignore the ISO Ruby standard. Why? Because it's incredibly restrictive! ISO Ruby is a minimal subset of the intersection of Ruby 1.8 and Ruby 1.9. The current version of Ruby, which is supported by all Ruby implementations (or at least will be very soon), is Ruby 2.1, and it has many features that make programming easier. Programming in ISO Ruby is a PITA.

When I program in C#, I also ignore ISO C#, which is a subset of C# 2.0 (and, more importantly, whose ISO Class Library is an extremely small subset of the .NET BCL). Instead, I program in C# 5.0, and I don't restrict myself to the libraries specified in the ISO CLI; I use the common subset of libraries available in .NET 4.5.2 and Mono 3.4.0.

And when doing web design, I much prefer HTML5 over ISO HTML (a small subset of HTML 4.01 Strict), again because HTML5 is far more feature-rich than a restricted subset of an ancient version of HTML.

So, there are good reasons for ignoring ISO standards.

Per your example, "GB" is the ISO country code for the United Kingdom. However, "UK" was at one time the MARC (US Library of Congress) standard code, although I believe that's now deprecated. And IANA uses .uk for the United Kingdom's top-level domain.

So, if something doesn't conform to an ISO standard, it doesn't mean that no standard is being used; it may simply mean that a different standard is being used. (As @Jörg noted in a comment, the nice thing about standards is that there are so many to choose from.) In which case the question really becomes which standard would be most appropriate for the given problem domain, environment, etc.?

Responses to that question would probably be largely opinion-based and quickly degenerate into a "religious" debate. But ISO standard conformance isn't necessarily always the best answer. For example, if a piece of software needs to interface with library databases, MARC standards might be a more appropriate choice than ISO. If most of your organization's software does things a certain way, you might want to stick with that approach, at least in the short-term -- it's your organization's "standard", after all.


Also, standards do evolve/change. What was conformant yesterday may not be today.


And, while I wouldn't want to rule out ignorance and/or sloth as the cause of the issues you point out ... the developer might simply not have had enough time to address them.

Compliance with an ISO standard is not always a cost-free activity. If a particular standard isn't already implemented in the toolkit she's using, a programmer is faced with a necessary choice: Is it cheaper to properly implement this now, or not implement the standard and deal with conversions later?

It's easy to say "hey, you should always implement the standard", but everything has a cost. And there are some good reasons why a programmer may not want to implement an ISO standard.

  • The customer may be following a proprietary or non-ISO standard. Better to hew to the standard the customer expects than to leave unintended headaches for your successor by hiding an additional implementation alongside the one the customer wants and your language requires.
  • There may be a great deal of existing data, and a conversion or format-break may not yet be feasible. If you have twenty years of customer contacts and contracts keyed with local date-times, you don't necessarily want to change all those hundreds of millions of fields to ISO standard dates until you can do it right.
  • Adherence to the standard may impose a greater cost than the benefit provides. If you're dealing with entries entirely within the United States, for example, the five-character ISO 3166-2 code (US-NY) carries three unneeded characters compared with the two-letter US postal abbreviation (NY); a sketch of the interconversion follows this list.
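
To illustrate that last bullet, a hedged sketch (the helper names are hypothetical): for US-only data the ISO prefix is pure overhead, yet the two forms stay trivially interconvertible if interoperability is needed later.

```python
def to_postal(iso_3166_2: str) -> str:
    """Strip the 'US-' prefix from an ISO 3166-2 code such as 'US-NY'."""
    if not iso_3166_2.startswith("US-"):
        raise ValueError("expected a US subdivision code like 'US-NY'")
    return iso_3166_2[3:]

def to_iso_3166_2(postal: str) -> str:
    """Rebuild the ISO 3166-2 code from a two-letter postal abbreviation."""
    return f"US-{postal}"

print(to_postal("US-NY"))     # 'NY'
print(to_iso_3166_2("NY"))    # 'US-NY'
```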

In the case of inexperienced programmers/database designers, it's because of not knowing. They tend to reinvent the wheel because they don't know that a group of people spanning industries has already discussed the issue and come up with a standard approved by all who participated, often after very long discussions, revisions, etc. Recently a co-worker of mine was incredulous when I told him there is an ISO standard (ISO 8601) governing whether a given week is considered the last week of a year or the first one of the next year. He didn't believe a standard existed for something so specific. I told him that the correctness of many applications depends on that standard.
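
That week rule (week 1 is the week containing the year's first Thursday) is built into many standard libraries; a small Python illustration:

```python
from datetime import date

# ISO 8601 week numbering: week 1 is the week containing the year's
# first Thursday, so a date can belong to a different "ISO year".
print(date(2014, 12, 29).isocalendar())  # ISO year 2015, week 1, weekday 1
print(date(2016, 1, 1).isocalendar())    # ISO year 2015, week 53, weekday 5
```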

In the case of experienced programmers/database designers, it's disregard caused by "knowing better", not-invented-here syndrome, and/or grandiosity. They don't trust ISO or any other standards body because they consider the ISO codes "not stable enough", meaning they will change someday. So they create their own invented-here or auto-incremented codes/identifiers, hindering the very interoperability they also disregard. See this similar question, albeit one inclined toward database design. They give reasons like:

I may not necessarily want my database design to depend on a bunch of third parties (IATA, ISO), regardless of how stable their standards are. Or, I may not want to depend on a particular standard at all.


Oddly enough, those who disregard standards use standard USB ports, buy standard-sized DVDs and Blu-rays, and drive cars with tires that conform to standards.

Well, people tend to ignore ISO standards: for example, you wrote

if you use the ISO standard you just store 20140201* and it's unambiguously February 1st.

but the extended, fully hyphenated ISO 8601 rendition is in fact 2014-02-01 (see also xkcd 1179).

One of the reasons is that the application domain and the users might not use these standards themselves. Even when some domains use some standards, some of them might have made different choices than the ISO standards, often for historical reasons.

If your users already use "UK" in their existing procedures(1) to refer to "United Kingdom of Great Britain and Northern Ireland", it doesn't necessarily make sense to use "GB" in their data structures (especially if what you mean by "country" isn't quite an "ISO" country, e.g. separating the UK nations or handling subtle differences with the Channel Islands and so on). Of course, you could have a mapping between internal storage and presentation, but sometimes that's a bit over the top. You're rarely programming for the sake of programming; you often have to adapt to your environment.(2)

You also have to remember that these standards have evolved in parallel with software. You often have to develop within the context of other pieces of software, some of which may be imperfectly designed, some of which may still be affected by legacy decisions.

Even if you look at internal data storage formats, some ambiguities are hard to resolve. For example, as far as I know, Excel uses a decimal number to represent timestamps: the integer part is the number of days since a reference date, and the fractional part is the elapsed fraction of a 24-hour day, which gives you the time. The problem is that this prevents you from taking time zones or daylight saving time (23- or 25-hour days) into account, and Excel converts any date/time to that internal format by default. Whether you want to use the ISO format or not becomes irrelevant if another piece of software you have to work with doesn't leave you a choice.
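
For the curious, a minimal Python sketch of that conversion, assuming Excel's default 1900 date system (the 1899-12-30 origin compensates for Excel's historical "1900 is a leap year" quirk for serials after February 1900):

```python
from datetime import datetime, timedelta

# Excel serial: integer part = days since the epoch, fractional part =
# elapsed fraction of a fixed 24-hour day (no time zones, no DST).
EXCEL_EPOCH = datetime(1899, 12, 30)

def from_excel_serial(serial: float) -> datetime:
    return EXCEL_EPOCH + timedelta(days=serial)

print(from_excel_serial(41671.5))  # 2014-02-01 12:00:00
```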

(1) I don't mean "programming procedures" here.

(2) Don't ask me why people don't use these standards in their daily lives either. I mean, YYYYmmdd is clear, dd/mm/YYYY is clear, but ordering a date with medium, small, big granularity like mm/dd/YYYY just doesn't make sense :-) .

Why would I not use ISO 3166-1 alpha-2 country codes?

Because I use STANAG 1059 country codes... and in that standard, UK is the code for the United Kingdom (instead of GB as per ISO 3166-1).

Alternatively, I could use the FIPS country codes; again, UK is the country code for the United Kingdom.

There are many standards (ISO and non-ISO) and sometimes a particular domain uses/demands a standard which is incompatible with the ISO standard.
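
When data from such a domain has to interoperate with ISO-based systems, an explicit translation table keeps the mismatch visible rather than implicit. A minimal sketch, covering only the codes mentioned here:

```python
# Map non-ISO codes (STANAG 1059 / FIPS in these examples) to ISO 3166-1 alpha-2.
TO_ISO_3166_1 = {
    "UK": "GB",  # United Kingdom: 'UK' in STANAG 1059 and FIPS, 'GB' in ISO
    "SP": "ES",  # Spain: 'SP' in FIPS, 'ES' in ISO
}

def to_iso(code: str) -> str:
    """Normalise a country code to ISO 3166-1 alpha-2 where a mapping is known."""
    return TO_ISO_3166_1.get(code, code)

print(to_iso("UK"))  # 'GB'
```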

Storing 20140201 is not unambiguous at all. Only when you include the knowledge that it follows the ISO standard does it become unambiguous. The same goes for 01/02/2014: once you include the knowledge that the format is mm/dd/yyyy, it is also perfectly unambiguous.
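
A quick Python illustration of that point; the same string denotes two different dates depending on the documented convention:

```python
from datetime import datetime

s = "01/02/2014"
print(datetime.strptime(s, "%m/%d/%Y").date())  # 2014-01-02 (US mm/dd/yyyy)
print(datetime.strptime(s, "%d/%m/%Y").date())  # 2014-02-01 (European dd/mm/yyyy)
```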

As long as the application does not have to interface with other applications, any well-documented standard works just as well.

There is a tradeoff between what is easy for humans (I tend to use 1-2-2014) and what is easy for computers (which would be even better off with a binary representation instead of ISO text). Novice programmers tend to stick with what they can easily understand; more experienced programmers start to see the advantages of computer-oriented storage.
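
One way to see that tradeoff in code; the same date as human-readable text versus a single machine-friendly integer:

```python
from datetime import date

d = date(2014, 2, 1)

as_text = d.isoformat()     # '2014-02-01': readable, needs parsing to compare
as_ordinal = d.toordinal()  # 735265: days since 0001-01-01, cheap to sort/compare

print(date.fromordinal(as_ordinal) == date.fromisoformat(as_text))  # True
```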

A point not raised so far is the cultural appropriateness of international standards.

Consider the international standard for measurements. Let's present those to users in the United States. I'm not sure all your US users will be happy about kilometres, kilograms, and litres.

Consider that international standards are written by governments. If the government of Spain chooses not to recognise the Basque language, then how does it get an ISO specification? This is particularly an issue with the dialects and creoles of marginalised groups.

Even country codes can be problematic: does Crimea now get its own country code? Formulas are eventually found (e.g., "Former Yugoslav Republic of Macedonia"), but your application may need some stand-in until the diplomacy or war comes to an end.

Consider that international standards are written with particular applications in mind, and these may not entirely fit your application. For example, if you are storing a person's language with a view to sending letters, you may wish to code the blind distinctly, even if they are proficient in speaking, say, US English. Statistics organisations are well aware of the need to specify the exact meaning of a variable (the variable's "metadata"), as they encounter every possible edge case during a population census. Some of that rigour is well worthwhile for database fields.

The final point is that in making these sorts of choices your program may be making a political statement. This reality can mess with the nicest of code (e.g., you may need multiple language names for the same language).

In my experience programmers fail to use ISO standards for a variety of stated reasons, e.g.:

  • "I didn't know that there was an ISO standard" (ceases to be a valid reason once you're told!)
  • "The standard is inaccessible (can't find /afford a copy)" (really??)
  • "The standard is too restrictive" (usually if the standard says "don't" then there's a good reason. Ignore it at your, and your customer's, peril!)
  • "The standard doesn't include the latest functionality / library" (no standard includes EVERY library that you'll ever want to use so adhere to the standard for the things that it DOES include/cover and be consistent with the standard for the things that it doesn't)
  • "The standard is too cumbersome to implement" (greatly over-used excuse but see below)

The only reason that I accept from my staff, as not being a lame excuse, is "the standard really isn't a good 'fit'", backed up by evidence. Sometimes the complexity of the applicable ISO standard is out of proportion to the problem/solution. Sometimes the context in which you will implement your solution is significantly different from that assumed by the standard. And sometimes the standard can be improved upon; that is how progress happens.

More often than not, though, the failure to use the ISO standard can be attributed to inexperience, laziness, or arrogance. I regret to say that English-speaking programmers are particularly guilty of laziness as regards internationalisation, and that our US colleagues tend to perceive ISO as "an irrelevant European thing" (apologies to the minority to whom this does not apply).

ISO has lots of standards. As did CCITT/ITU. Some of these standards are "aspirational", while others specify minimum required functionality. It's not often clear which is which.

I remember in the 1980s asking why some equipment vendors implemented one subset of a standard while other vendors implemented a different subset. That's when it came out that standards are often set before something actually works. And vendors often deliberately choose not to implement parts of a standard so that they can hamper interoperability, which then grants them an advantage.

That's why I like IETF RFCs. A specification traditionally doesn't advance along the IETF standards track until there are at least two independent, interoperable implementations.

If I'm building an Oracle database and I want to store dates, I'm going to use the Oracle DATE datatype. I won't know, or care, whether or not Oracle conforms to the ISO standard. This is really a case of conforming to a different standard (Oracle's) rather than departing from the ISO standard. See @David's response.

In some cases, by the time I realized there was an ISO standard for something I had designed, the cost of going back and redesigning would have been prohibitive, or at least was seen that way.

In the short term, more working code is produced by using available standards or by inventing new ones than by careful research into extant standards. The downside appears when large-scale integration requires interoperability, and that almost always happens in the context of a later project.

Licensed under: CC-BY-SA with attribution