Question

Consider that there is a bunch of tables which link to "countries" or "currencies" tables.

To make the data easier to read, I'd like to make a CHAR field holding the country code (e.g. US, GB, AU) or currency code (e.g. USD, AUD) the primary key in each of those two tables, and have all other tables use this CHAR as a foreign key.

The database is MySQL with the InnoDB engine.

Is it going to cause performance issues? Is it something I should avoid?


Solution

Performance isn't really the main issue, at least not for me. The issue is more about surrogate vs natural keys.

Country codes aren't static. They can and do change. Countries change names (e.g. Zaire became the Democratic Republic of the Congo). They come into being (e.g. after the breakup of Yugoslavia or the Soviet Union) and they cease to exist (e.g. East Germany after reunification). When this happens the ISO standard code can change too: Zaire's ZR became CD.

More in Name Changes Since 1990: Countries, Cities, and More

Surrogate keys tend to be better because when these events happen the keys don't change, only columns in the reference table do.

For that reason I'd be more inclined to create country and currency tables with an int primary key instead.
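A minimal sketch of that idea, using Python's built-in sqlite3 module as a stand-in for MySQL/InnoDB (an assumption made so the example runs anywhere). The customers table and its rows are hypothetical; the sample code change is Zaire's ZR becoming CD.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL/InnoDB (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
    CREATE TABLE countries (
        id   INTEGER PRIMARY KEY,      -- surrogate key, never changes
        code CHAR(2) NOT NULL UNIQUE,  -- ISO code, allowed to change
        name TEXT NOT NULL
    );
    CREATE TABLE customers (           -- hypothetical referencing table
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        country_id INTEGER NOT NULL REFERENCES countries(id)
    );
    INSERT INTO countries (id, code, name) VALUES (1, 'ZR', 'Zaire');
    INSERT INTO customers (id, name, country_id) VALUES (10, 'Acme', 1);
""")

# The country is renamed and its code changes: only the reference row is updated.
conn.execute(
    "UPDATE countries "
    "SET code = 'CD', name = 'Democratic Republic of the Congo' "
    "WHERE id = 1"
)

# The referencing row still points at id 1 and picks up the new code via the join.
row = conn.execute("""
    SELECT cu.name, cu.country_id, co.code
    FROM customers AS cu
    JOIN countries AS co ON co.id = cu.country_id
""").fetchone()
print(row)
```

The trade-off is visible in the last query: reading the code back always needs a join to the reference table.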

That being said, char/varchar key fields can use more space than integer keys and have certain performance disadvantages, though these probably won't be an issue unless you're performing a huge number of queries.

For completeness, you may want to refer to Database Development Mistakes Made by App Developers.

OTHER TIPS

James Skidmore's link is important to read.

If you're limiting yourself to country and currency codes (2 and 3 characters, respectively), you may very well be able to get away with declaring the columns char(2) and char(3).

I would guess that would be acceptable. If you're using an 8-bit character encoding, you're looking at columns of 2 and 3 bytes, the same size as a smallint or mediumint, respectively.
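The size comparison can be checked with a few lines of Python. This is only a sketch of the payload arithmetic: it counts the bytes of the code itself and compares them with MySQL's documented integer sizes. How InnoDB actually lays out a CHAR column also depends on the row format and the column's character set, which this does not model.

```python
# ASCII letters occupy one byte each in latin-1 and in UTF-8.
codes = ["US", "GB", "AU", "USD", "AUD"]

for code in codes:
    print(code, len(code.encode("latin-1")), len(code.encode("utf-8")))

# MySQL integer storage sizes in bytes, for comparison.
MYSQL_INT_BYTES = {"SMALLINT": 2, "MEDIUMINT": 3, "INT": 4}

country_bytes = len("US".encode("latin-1"))    # payload of a char(2) country code
currency_bytes = len("USD".encode("latin-1"))  # payload of a char(3) currency code

print(country_bytes == MYSQL_INT_BYTES["SMALLINT"])
print(currency_bytes == MYSQL_INT_BYTES["MEDIUMINT"])
```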

My answer is that there isn't a clear-cut answer. Just pick an approach within your project and be consistent. Both have their pluses and minuses.

@cletus makes a good point about using generated keys, but when you run into a situation where the data is relatively static, like country codes, introducing a generated key for them seems overly complex. Despite real world politics, having country codes appear and disappear isn't really going to be much of an issue for most business problems (but if your data actively concerns all 190-210 countries, follow that advice).

Using surrogate keys universally is a good and popular strategy. But remember, it arose in response to modeling databases using natural keys for everything. Ack! Open up a 15-year-old database book. Using natural keys everywhere definitely gets you into difficult situations when your initial understanding of the problem domain proves wrong. You do want to have consistency in your modelling practices, but using different techniques for clearly different situations is OK.

I suspect that performance for most modern databases on char(2) foreign keys will be the same as (or better than) on int fields. Databases have supported textual foreign keys for years.

Given that we have no other information about the project, if your preference is to use the country codes as foreign keys, and you have the option to do so, I'd say it's OK. It'll be easier to work with the data. It is a little against current practices, but in this case it's not going to back you into a corner.
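For comparison, here is a rough sketch of the natural-key layout, again with Python's sqlite3 standing in for MySQL/InnoDB (an assumption). The prices table is hypothetical, and the change from AUD to AUX is an invented code change used only to show ON UPDATE CASCADE at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
    CREATE TABLE currencies (
        code CHAR(3) PRIMARY KEY,      -- natural key: the currency code itself
        name TEXT NOT NULL
    );
    CREATE TABLE prices (              -- hypothetical referencing table
        id            INTEGER PRIMARY KEY,
        amount        NUMERIC NOT NULL,
        currency_code CHAR(3) NOT NULL
            REFERENCES currencies(code) ON UPDATE CASCADE
    );
    INSERT INTO currencies VALUES ('USD', 'US Dollar'), ('AUD', 'Australian Dollar');
    INSERT INTO prices (id, amount, currency_code)
    VALUES (1, 100, 'USD'), (2, 250, 'AUD');
""")

# Rows are readable without joining to the reference table.
before = conn.execute(
    "SELECT id, amount, currency_code FROM prices ORDER BY id"
).fetchall()
print(before)

# Invented code change: the cascade rewrites every referencing row.
conn.execute("UPDATE currencies SET code = 'AUX' WHERE code = 'AUD'")

after = conn.execute(
    "SELECT id, amount, currency_code FROM prices ORDER BY id"
).fetchall()
print(after)
```

The cascade keeps the data consistent when a code changes, at the cost of rewriting every referencing row, which is the situation the surrogate-key approach avoids.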

Licensed under: CC-BY-SA with attribution