Question

I have a MySQL table that will hold quite a large amount of data (>1 million rows), with maybe 100 inserts per minute, filled and read by two Java applications. I wonder about the best way to structure the table. There are about 10 columns representing ten measurements. If one of the measurements has not changed, I do not want to insert it again, so I don't fill my table unnecessarily. To clarify:

col1  col2  col3  col4
10    34    78    235
-1    5456  345   234
23    347   -1    -1

In this example, I just inserted -1 for "unchanged". The problem with this structure: if I want to get the last complete dataset, I have to do a lot of merging, and I'm worried about performance. So the question is: is this the right approach?
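
To make the merging concern concrete, here is roughly what reconstructing the last complete dataset would look like with this layout (just a sketch; the table name and the auto-increment id are placeholders of mine):

-- Hypothetical wide table, with -1 marking "unchanged"
CREATE TABLE measurements (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    col1 INT NOT NULL,
    col2 INT NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL
);

-- Latest real value of each column: one subquery per measurement,
-- i.e. the merging I'm worried about, times ten columns
SELECT
  (SELECT col1 FROM measurements WHERE col1 <> -1 ORDER BY id DESC LIMIT 1) AS col1,
  (SELECT col2 FROM measurements WHERE col2 <> -1 ORDER BY id DESC LIMIT 1) AS col2,
  (SELECT col3 FROM measurements WHERE col3 <> -1 ORDER BY id DESC LIMIT 1) AS col3,
  (SELECT col4 FROM measurements WHERE col4 <> -1 ORDER BY id DESC LIMIT 1) AS col4;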

I could of course just select the last value of each column on an insert and insert it again, but then each insert would entail many selects; again, not very performant.
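
The closest workaround I can think of is folding those lookups into the insert itself, roughly like this (same placeholder table as above; a NULL parameter stands for "unchanged"):

-- Copy the previous row, overriding only the measurements that changed
INSERT INTO measurements (col1, col2, col3, col4)
SELECT COALESCE(?, col1), COALESCE(?, col2),
       COALESCE(?, col3), COALESCE(?, col4)
FROM measurements
ORDER BY id DESC LIMIT 1;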

I also thought about a table only storing key-value pairs like this:

col1  col2
val1  10
val2  34
val3  78
val4  235
val2  5456
....

Not sure if that's a suitable approach, as I no longer have the one dataset <-> one complete measurement link, plus the table looks kind of "messy".
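
For reference, the schema I have in mind would be roughly this (names are placeholders):

-- One row per changed measurement
CREATE TABLE kv_measurements (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    metric  CHAR(4) NOT NULL,  -- measurement key like 'val1'; widen if keys get longer
    reading INT NOT NULL
);

-- Latest value of a single measurement
SELECT reading FROM kv_measurements
WHERE metric = 'val2'
ORDER BY id DESC LIMIT 1;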

Which approach would you prefer, if any, or how would you do it differently?

Solution

Inserting a -1 will not save you any space or memory. If the column is an INT, then whether it holds NULL, -1, or the actual value, you're using the same amount of space on disk and in memory.

I think you need to know how sparsely populated the change matrix will be, i.e. what fraction of the values are unchanged. If there are lots of unchanged values, the key-value pair is the way to go. If not, you could eat up more space, as the CHAR(4) needed for the key column uses the equivalent space of an integer, so you'll be using 2x the space for every value that does change.

In your example unchanged fields are relatively rare, so the added cost of 'double space/memory' to track them as key/value pairs would be a net loss.

By the way, I work all the time with tables consisting of more than just integers, with tens of millions of rows and overall table sizes in the 3-4 GB range. While making table alterations is costly, if you stick an auto-incrementing key or a timestamp on the table, it should be blazingly fast to find the current dataset with an ORDER BY ... DESC, LIMIT 1 clause.
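
For example, against the placeholder wide table from the question (recorded_at standing in for whatever indexed timestamp column you would add):

-- With an auto-increment primary key, the newest row is a cheap index seek
SELECT * FROM measurements ORDER BY id DESC LIMIT 1;

-- Equivalently, with an indexed timestamp column
SELECT * FROM measurements ORDER BY recorded_at DESC LIMIT 1;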

One novel approach would be to keep a separate table per column; then you wouldn't need a key column for lookups. That would result in the smallest memory/disk footprint.
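
A sketch of that per-column layout, with made-up names:

-- One small table per measurement; a row exists only when the value changes
CREATE TABLE col1_readings (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    reading INT NOT NULL
);

-- Current value of this one measurement
SELECT reading FROM col1_readings ORDER BY id DESC LIMIT 1;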

In any of these designs, for fast lookup of the 'last' value, you'll need some type of timestamp or auto-incrementing indexed field.

If you never use these columns in joins or calculations in SQL queries directly, and your #1 concern is memory/disk footprint, one final option would be to store all values in a single VARCHAR column as a delimited string like '123,22,333,1'. For large numbers this uses a lot of space and wouldn't be worth it, but if your numbers are less than 5 digits it could work out, as it's 1 byte per character (the digits and the commas) plus one byte of VARCHAR length overhead.
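
A sketch with placeholder names; the splitting and parsing would live in your Java code, since SQL can only hand the string back:

-- All ten measurements packed into one delimited string per row
CREATE TABLE packed_measurements (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    readings VARCHAR(64) NOT NULL  -- e.g. '123,22,333,1,...'
);

-- Fetch the latest packed row; the application splits it on commas
SELECT readings FROM packed_measurements ORDER BY id DESC LIMIT 1;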

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow