Question

Is it preferable to store redundant information, (which can be otherwise generated from existing data,) or to instead convert the existing data each time you need access?

I've simplified my specific problem as best as I can below, hoping that the provided answers are useful as future-reference material.

Example:

Let's say we've developed a program that places data into Squares on a grid (like a super-descriptive game of Tic-Tac-Toe or something) and assigns various details, and a unique identification number to each:

Also known as the world

Throughout our program, we often perform logic based on a square's X and/or Y coordinates (checking for 3 in a row) and other times we only need the ID (perhaps to access a string at "SquareName[ID]") - We aren't exactly certain which of these two is accessed more often, but it's a rather close competition.

Up until now we've simply stored the ID inside the square class, and converted it with some simple formulas whenever just the X or Y are needed. Say we want to get coordinates for one square in particular:

int CurrentX = (this.Square.ID - 1) % 3) + 1;  // X coordinate, 1 through 3
int CurrentY = (this.Square.ID + 1) / 3;       // Y, 1 through 3

Since the squares don't move around or change ID after setup, part of me believes it would be simpler just to store all 3 values inside the Square class, but my other part cringes at the redundancy since access to X and Y is already easy enough to calculate from the existing ID.

(Note, This program itself is not very memory or resource intensive, nor does the size of the grid get much larger, so it mostly comes down to which option is a better practice or rule of thumb.)

What would you do?

Was it helpful?

Solution

As a rule of thumb, for a system where the data is read/write, store your basic data without redundancy.

When performance or other considerations become a practical issue, then you should denormalize as necessary. (i.e. wait for it to be a problem, don't pre-optimize overly much).

Your goal should be the most maintainable code possible. That usually means writing the least code possible. Having extra code to maintain redundant copies of data points will make your code more brittle.

OTHER TIPS

If those are values which can be determined at the moment of creation and then do not change anymore, I would go for variables populated in the constructor. It's not redundant info in so far as that it isn't stored anywhere else, but that's not my main point. When reading my code, I'd usually expect that whenever something is computed at the time of request, it might change per request. It is easy to find the point in the source where the field is populated and where it is changed, especially if it does never change, but you might end up slightly confused when looking at some calculation which will return always the same result, as it's variables can't change, and wonder whether you're just missing a case or this is really static. Also, using a descriptive variable name, you can get rid of the comments. Not that I generally aim at not commenting, but source code which doesn't even need comments is a pretty save signal for easy to understand code, which might (/should) be your aim.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top