Question

I am trying to extract a random article that has a picture from a database.

SELECT FLOOR(MAX(id) * RAND()) FROM `table` WHERE `picture` IS NOT NULL

My table is 33 MB and holds 1,006,394 articles, but only 816 of them have pictures. My problem is that this query takes 0.4640 sec.

I need this to be much faster. Any idea is welcome.

P.S. 1. Of course I have an index on id. 2. There is no index on the picture field. Should I add one? 3. The product name is unique, and so is the product number, but that's out of the question.

RESULT OF TESTING SESSION.

@cHao's solution is faster when I use it to select a random entry with a picture (less than 0.1 sec). But it is slower when I try the opposite, selecting a random article without a picture: 2 to 3 sec.

@Kickstart's solution is a bit slower when finding an entry with a picture, but runs at almost the same speed when finding an entry without a picture: 0.149 sec on average.

@bob-kruithof's solution doesn't work for me. When I try to find an entry with a picture, it selects an entry without one.

And @ganesh-bora, yes, you are right; in my case the speed difference is about 5 to 15 times.

I want to thank you all for your help. I decided on @Kickstart's solution.

Solution

You need to get a range of values with matching records and then find a matching record within that range.

Something like this:-

SELECT r1.id
FROM `table` AS r1 
INNER JOIN (
    SELECT RAND( ) * ( MAX( id ) - MIN( id ) ) + MIN( id ) AS id
    FROM `table`
    WHERE `picture` IS NOT NULL
) AS r2
ON r1.id >= r2.id
WHERE r1.`picture` IS NOT NULL
ORDER BY r1.id ASC
LIMIT 1

However, for any hope of efficiency you need an index on the field it is checking (i.e. picture in your example).

Just an explanation of how this works.

The sub select picks a random id between the min and max ids of the records that have a picture. This random id may or may not belong to a record with a picture.

The id from this sub select is joined back against the main table, using >= and a WHERE clause that restricts the match to records with a picture. Hence it joins against every picture record whose id is greater than or equal to the random id. The random id can never exceed the highest id of a picture record, so the join always finds a record (if there are any picture records). The ORDER BY / LIMIT then brings back only the first of those matches, i.e. the picture record with the lowest id at or above the random id.
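The explanation above can be tried out end to end with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, splits the join into two plain queries, and draws the random point in Python instead of with RAND(). The table name articles, the sample ids and the helper random_picture_id are all made up for illustration.

```python
import random
import sqlite3

# In-memory SQLite table standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, picture TEXT)")
picture_ids = {40, 41, 300, 725, 990}  # hypothetical rows that have a picture
conn.executemany(
    "INSERT INTO articles (id, picture) VALUES (?, ?)",
    [(i, "pic.jpg" if i in picture_ids else None) for i in range(1, 1001)],
)


def random_picture_id(conn, rng):
    """Return the id of a picture row at or above a random point, or None."""
    # Step 1: the id range covered by picture rows (the sub select).
    lo, hi = conn.execute(
        "SELECT MIN(id), MAX(id) FROM articles WHERE picture IS NOT NULL"
    ).fetchone()
    if lo is None:  # no picture rows at all
        return None
    # Same formula as RAND() * (MAX(id) - MIN(id)) + MIN(id).
    point = rng.random() * (hi - lo) + lo
    # Step 2: first picture row with id >= point (the join, ORDER BY and LIMIT).
    row = conn.execute(
        "SELECT id FROM articles "
        "WHERE picture IS NOT NULL AND id >= ? "
        "ORDER BY id ASC LIMIT 1",
        (point,),
    ).fetchone()
    return row[0]


print(random_picture_id(conn, random.Random()))
```

Whatever point is drawn, the id that comes back always belongs to a row with a picture, because the point cannot exceed the highest picture id.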

Note that there are two obvious flaws to this, but most of the time they will be irrelevant. First, the record retrieved may not be entirely random. The picture with the lowest id is unlikely to be returned (it will only be returned if RAND() returns exactly 0), but if this is important it is easy enough to fix by rounding the resulting random id. Second, if the ids of the picture records are not spread roughly evenly across the full range of ids, some will be returned more often than others. For example, take the situation where the first 1000 ids were pictures, then no more until the last (33 millionth) record. The random id could be any of those 33 million, but unless it is less than or equal to 1000 it will be the 33 millionth record that is returned.
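To get a feel for how strong that skew can be, here is a small simulation. It does not touch a database; it only models the rule "take the smallest picture id that is greater than or equal to a random point" on a scaled-down version of the example (ids 1 to 10 have pictures, then nothing until id 1000). The function name and the numbers are made up for illustration.

```python
import bisect
import random
from collections import Counter


def pick_at_or_after(sorted_ids, rng):
    """Model of the query: smallest id >= a random point in [min, max)."""
    lo, hi = sorted_ids[0], sorted_ids[-1]
    point = rng.random() * (hi - lo) + lo  # like RAND() * (MAX - MIN) + MIN
    # bisect_left returns the index of the first id that is >= point.
    return sorted_ids[bisect.bisect_left(sorted_ids, point)]


rng = random.Random(42)  # fixed seed so runs are repeatable
picture_ids = list(range(1, 11)) + [1000]  # clustered ids plus one far away
draws = 10_000
counts = Counter(pick_at_or_after(picture_ids, rng) for _ in range(draws))

share_of_last = counts[1000] / draws
print(f"id 1000 was picked in {share_of_last:.1%} of the draws")
```

Almost every draw lands in the empty gap between 10 and 1000, so id 1000 dominates, and id 1 practically never shows up because the point would have to be exactly 1.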

OTHER TIPS

You might try attaching a random number to each row, then sorting by that. The row with the lowest number will be at the top.

SELECT `table`.`id`, RAND() as `order`
FROM `table`
WHERE `picture` IS NOT NULL
ORDER BY `order`
LIMIT 1;

This is of course slower than just magicking up an ID with RAND(), but (1) it'll always give you a valid ID (as long as there's a record with a non-null picture field in the table, anyway), and (2) the WTF ratio is pretty low; most people can tell what's going on here. :) Its performance rivals Kickstart's solution with a decently indexed table, when the number of items to select from is relatively small (around 1%). Definitely don't try to select from a whole huge table like this; limit it first with a WHERE clause on some indexed field(s).
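As a rough illustration of why this approach gives every matching row the same chance, here is the same idea in plain Python, with no database involved: attach a fresh random number to each matching id and keep the id with the smallest number. The function name and the sample ids are made up.

```python
import random
from collections import Counter


def pick_by_random_key(matching_ids, rng):
    """Mimic ORDER BY RAND() LIMIT 1 over the rows that passed the WHERE."""
    # Pair every id with its own random number, like RAND() AS `order`.
    keyed = [(rng.random(), row_id) for row_id in matching_ids]
    # The pair with the smallest random number is the row "at the top".
    return min(keyed)[1]


rng = random.Random(0)  # fixed seed so runs are repeatable
picture_ids = list(range(1, 11)) + [1000]  # unevenly spread ids
counts = Counter(pick_by_random_key(picture_ids, rng) for _ in range(11_000))
print(sorted(counts.items()))
```

Each id should come out with a similar count no matter how the ids are spread, at the cost of generating one random number per matching row on every call.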

Performance-wise, if you have a long-running app (i.e. not PHP; I'm talking about Java, .NET, etc., where the app stays alive between requests), you might try to keep a list of all the IDs of items with pictures, select a random ID from that list, and load the article. You could do that in PHP too, if you wanted. It might not work as well when you have to query all the IDs on every request, but it could be very useful if you can cache the list of IDs in APC or something similar.
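A minimal sketch of that caching idea, written in Python purely for illustration (the same shape would apply in Java or .NET). The class PictureIdCache, its ttl_seconds parameter and the load_ids callback are hypothetical; in a real app load_ids would run something like SELECT id FROM `table` WHERE `picture` IS NOT NULL.

```python
import random
import time


class PictureIdCache:
    """Keeps the ids of picture rows in memory and reloads them after a TTL."""

    def __init__(self, load_ids, ttl_seconds=300.0, clock=time.monotonic):
        self._load_ids = load_ids  # callback that queries the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._ids = []
        self._loaded_at = None  # None means "never loaded"

    def random_id(self, rng=random):
        now = self._clock()
        # Reload on first use or when the cached list is older than the TTL.
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._ids = list(self._load_ids())
            self._loaded_at = now
        if not self._ids:
            return None  # no rows with pictures
        return rng.choice(self._ids)


# Usage with a fake loader standing in for the real query.
calls = 0


def fake_load_ids():
    global calls
    calls += 1
    return [40, 41, 300, 725, 990]


cache = PictureIdCache(fake_load_ids, ttl_seconds=60.0)
sample = [cache.random_id() for _ in range(5)]
print(sample, "database hits:", calls)
```

The trade-off is staleness: a picture added or removed within the TTL window is not reflected until the next reload, so the article should still be loaded by id and checked before use.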

For performance, first add an index on the picture column, so that the 816 records with a picture can be found through the index instead of by scanning the whole table, and then run your query.

How has someone else solved the problem?

I would suggest looking at this article about the different possible ways of selecting random rows in MySQL.

Modified example from the article

SELECT name
FROM random JOIN
    ( SELECT CEIL( RAND() * (
        SELECT MAX( id ) FROM random WHERE picture IS NOT NULL
    ) ) AS id ) AS r2 USING ( id );

This might work in your case.

Efficiency

  • As user Kickstart mentioned: Do you have an index on the column picture? This might help getting you the results a bit faster.
  • Are your tables optimized?
Licensed under: CC-BY-SA with attribution