Question

I am creating a database with two main tables: items, locations.

The items table contains approximately 3 million records and is growing at a rate of 1 million records a month.

The locations table contains 50,000 locations (name, latitude, longitude) and will not change in size.

Every read of the items table will require a JOIN to the locations table to find out where the item is located, unless I duplicate the location content in every item record. I anticipate around 5 million queries against the items table every month.

Searching of the database will be performed by Sphinx, so I do not need to worry about complicated MySQL geo-distance queries.

My question is: would I be better off duplicating the location data for every item, or performing JOIN statements?
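
To make the two options concrete, the layout I have in mind (with locations kept in their own table) is roughly this; table and column names are simplified placeholders:

    -- locations: ~50,000 rows, effectively static
    CREATE TABLE locations (
        id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name      VARCHAR(255) NOT NULL,
        latitude  DECIMAL(9,6) NOT NULL,
        longitude DECIMAL(9,6) NOT NULL
    );

    -- items: ~3 million rows, growing by ~1 million per month
    CREATE TABLE items (
        id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        title       VARCHAR(255) NOT NULL,
        location_id INT UNSIGNED NOT NULL
    );

The alternative would be to drop location_id and store the name, latitude and longitude directly in every items row.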

Thanks in advance

Solution

I think it would be better to use a JOIN between the items and locations tables, with a foreign key in the items table.

Duplicating the location data for every item would introduce far too much redundancy.
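
As a rough sketch (assuming InnoDB, an items.location_id column referencing locations.id, and an illustrative title column), the keyed lookup would look like this:

    -- Adding the foreign key also gives items.location_id an index,
    -- which keeps the join cheap.
    ALTER TABLE items
        ADD CONSTRAINT fk_items_location
        FOREIGN KEY (location_id) REFERENCES locations (id);

    -- Every read joins to pick up the location details
    SELECT i.id, i.title, l.name, l.latitude, l.longitude
    FROM items AS i
    JOIN locations AS l ON l.id = i.location_id
    WHERE i.id = ?;

The join is a primary-key lookup into a 50,000-row table, so it adds very little cost per query.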

OTHER TIPS

We can discuss denormalisation from an academic point of view, but practice always differs from theory. How you design your structure should also depend on how it will be used; for you, I guess the priority is response time.

Joining to a 50k-row table is not very costly and will not take much time, as long as the locations table does not grow and the join column is indexed.

If you have plenty of free space, denormalisation will generally speed up your reads, but it needlessly duplicates the 50,000 location records across millions of item rows; carrying that extra data in every row can cost you the very speed you are looking for.
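
For comparison, a minimal sketch of the denormalised alternative (column names again illustrative):

    -- Denormalised: every item row carries its own copy of the location data
    CREATE TABLE items_denormalised (
        id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        title     VARCHAR(255) NOT NULL,
        loc_name  VARCHAR(255) NOT NULL,
        latitude  DECIMAL(9,6) NOT NULL,
        longitude DECIMAL(9,6) NOT NULL
    );

    -- Reads avoid the join entirely
    SELECT id, title, loc_name, latitude, longitude
    FROM items_denormalised
    WHERE id = ?;

Reads skip the join, but millions of item rows now repeat the location name and coordinates, and correcting a single location means updating every item that points to it.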

Licensed under: CC-BY-SA with attribution