Question

I'm trying to get 4 random results from a table that holds approximately 7 million records. I also want to get 4 random records from the same table, filtered by category.

As you would imagine, random sorting on a table this large makes the queries take a few seconds, which is not ideal.

One other method I thought of for the non-filtered result set is to have PHP pick some random numbers between 1 and about 7,000,000 and then use IN(...) in the query to grab only those rows. I know this method has a caveat: you may get fewer than 4 rows if a record with one of those IDs no longer exists.

However, that method will not work with the category filtering, because PHP doesn't know which record IDs belong to which category and so cannot choose IDs to select from.

Is there a better way to do this? The only way I can think of is to store the record IDs for each category in another table, select random results from that, and then fetch only those record IDs from the main table in a second query. I'm sure there is a better way, though.

Solution

You could of course use the RAND() function (ORDER BY RAND()) in a query with a WHERE (for the category) and a LIMIT. However, as you pointed out, that entails a scan of the table, which takes time, especially in your case given the volume of data.

Your other alternative, again as you pointed out, is to store id/category_id in another table. That might prove a bit faster, but it still needs a WHERE and a LIMIT on that table, which will contain the same number of records as the master table.

A different approach (if applicable) would be to have one table per category and store the IDs in it. If your categories are fixed or do not change often, you should be able to use that approach. In that case you effectively remove the WHERE clause from the query, and a RAND() with a LIMIT on each category table would be faster, since each category table contains only a subset of the records in your main table.

Another alternative would be to use a separate key/value or document store just for that operation. MongoDB or the Google App Engine datastore can help with that and are really fast.

You could also go with a master/slave replication setup in MySQL. The slave replicates content from the master in near real time, and when you need to perform the expensive query you run it on the slave instead of the master, thus passing the load to a different machine.

Finally, you could go with Sphinx, which is a lot easier to install and maintain. You can then treat each of those category queries as a document search and let Sphinx randomize the results. This way you offload this expensive operation to a different layer and let MySQL continue with other operations.

Just some options to consider.

OTHER TIPS

Working off your random number approach:

  • Get the max id in the table.
  • Create a temp table to store your matches.
  • Loop n times doing the following:
    • Generate a random number between 1 and maxId.
    • Get the first record (ordered by id) with a record id greater than or equal to the random number and insert it into your temp table.
  • Your temp table now contains your random results.
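
The steps above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python with an in-memory SQLite database as a stand-in for MySQL, and the `items` table, the `matches` temp table, and the function names are hypothetical. In MySQL the insert would be written with `INSERT IGNORE` rather than SQLite's `INSERT OR IGNORE`.

```python
import random
import sqlite3


def build_demo_db():
    """Create a small in-memory table with gaps in the id sequence (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category INTEGER)")
    # Skip every 7th id to imitate deleted records.
    rows = [(i, i % 5) for i in range(1, 1001) if i % 7 != 0]
    conn.executemany("INSERT INTO items (id, category) VALUES (?, ?)", rows)
    return conn


def pick_random_rows(conn, n, max_attempts=100):
    """Collect up to n distinct rows by probing random ids, using a temp table for the matches."""
    max_id = conn.execute("SELECT MAX(id) FROM items").fetchone()[0]
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS matches (id INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM matches")
    for _ in range(max_attempts):
        found = conn.execute("SELECT COUNT(*) FROM matches").fetchone()[0]
        if found >= n:
            break
        r = random.randint(1, max_id)
        # First row at or after the random id; duplicates are ignored by the primary key.
        conn.execute(
            "INSERT OR IGNORE INTO matches "
            "SELECT id FROM items WHERE id >= ? ORDER BY id LIMIT 1",
            (r,),
        )
    return conn.execute(
        "SELECT items.id, items.category FROM items "
        "JOIN matches ON matches.id = items.id"
    ).fetchall()


if __name__ == "__main__":
    print(pick_random_rows(build_demo_db(), 4))
```

Each probe is a primary-key range lookup, so no full scan or sort is needed. One caveat of this design: a row that directly follows a gap in the id sequence is picked more often than its neighbours, so the result is only approximately uniform.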

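Or you could dynamically generate SQL with a UNION to do the query in one step. RAND() on its own returns a value between 0 and 1, so the random ids r1 to r4 (integers between 1 and the max id) are generated by the application and written into the query:

   (SELECT * FROM myTable WHERE ID >= r1 AND Category = zzz ORDER BY ID LIMIT 1)
   UNION
   (SELECT * FROM myTable WHERE ID >= r2 AND Category = zzz ORDER BY ID LIMIT 1)
   UNION
   (SELECT * FROM myTable WHERE ID >= r3 AND Category = zzz ORDER BY ID LIMIT 1)
   UNION
   (SELECT * FROM myTable WHERE ID >= r4 AND Category = zzz ORDER BY ID LIMIT 1)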

Note: my SQL may not be valid, as I'm not a MySQL guy, but the theory should be sound.
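
A rough sketch of generating such a UNION query from application code is shown below. It is an illustration only: Python with an in-memory SQLite database stands in for PHP and MySQL, and the `items` table and `random_rows_by_union` function are hypothetical names. Each branch is wrapped in a derived table because SQLite does not accept LIMIT directly on a UNION branch; the wrapped form should also be acceptable to MySQL.

```python
import random
import sqlite3


def build_demo_db():
    """Create a small in-memory table with five categories (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category INTEGER)")
    conn.executemany(
        "INSERT INTO items (id, category) VALUES (?, ?)",
        [(i, i % 5) for i in range(1, 1001)],
    )
    return conn


def random_rows_by_union(conn, category, n=4):
    """Build one UNION query with n branches, each seeking from its own random id."""
    lo, hi = conn.execute(
        "SELECT MIN(id), MAX(id) FROM items WHERE category = ?", (category,)
    ).fetchone()
    if lo is None:  # the category has no rows
        return []
    branch = (
        "SELECT * FROM (SELECT id, category FROM items "
        "WHERE id >= ? AND category = ? ORDER BY id LIMIT 1) AS p{}"
    )
    sql = " UNION ".join(branch.format(i) for i in range(n))
    params = []
    for _ in range(n):
        # Random ids are drawn in the application and passed as bound parameters.
        params.extend([random.randint(lo, hi), category])
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    print(random_rows_by_union(build_demo_db(), category=3))
```

Because UNION removes duplicates, two branches that land on the same row collapse into one, so the query can return fewer than n rows. Drawing the random ids between the category's own minimum and maximum id guarantees that every branch finds a row.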

First you need to get the number of rows in the category, with something like this:

select count(1) from tbl where category = ?

Then pick a random offset:

$offset = rand(0, $rowsNum - 1);

and select the row at that offset, using the same category filter:

select * FROM tbl WHERE category = ? LIMIT $offset, 1

This way you avoid the problem of missing ids. The only drawback is that you need to run the second query several times. A UNION may help in this case.
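
The count-then-offset idea can be sketched like this. It is a minimal illustration, assuming Python with an in-memory SQLite database in place of PHP and MySQL; the `items` table and `random_rows_by_offset` function are hypothetical. An ORDER BY is added so that each offset refers to a well-defined row.

```python
import random
import sqlite3


def build_demo_db():
    """Create a small in-memory table with five categories (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category INTEGER)")
    conn.executemany(
        "INSERT INTO items (id, category) VALUES (?, ?)",
        [(i, i % 5) for i in range(1, 1001)],
    )
    return conn


def random_rows_by_offset(conn, category, n=4):
    """Count the rows in the category, then fetch rows at n distinct random offsets."""
    total = conn.execute(
        "SELECT COUNT(*) FROM items WHERE category = ?", (category,)
    ).fetchone()[0]
    # Offsets are zero-based, so valid values are 0 .. total - 1.
    offsets = random.sample(range(total), min(n, total))
    rows = []
    for offset in offsets:
        row = conn.execute(
            "SELECT id, category FROM items WHERE category = ? "
            "ORDER BY id LIMIT 1 OFFSET ?",
            (category, offset),
        ).fetchone()
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(random_rows_by_offset(build_demo_db(), category=2))
```

Sampling distinct offsets avoids returning the same row twice, and gaps in the id sequence do not matter. The trade-off is that the database still has to step over all skipped rows for a large offset, so this is cheaper than sorting the whole table but not free.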

For MySQL you can use RAND():

SELECT column FROM table
ORDER BY RAND()
LIMIT 4
Licensed under: CC-BY-SA with attribution