Question

CREATE TEMPORARY TABLE tt (...);
INSERT INTO tt ...;  -- repeated many times
SELECT * FROM tt ORDER BY id LIMIT ?,1000;

The temp table is created at the beginning of the script with an "id" auto_increment column (and others). Part 1 of the script fills the table using various SELECTs. Part 2 processes all selected rows in chunks of 1000 and never writes to the table again.

I assume that these two commands are the same

SELECT * FROM tt ORDER BY id LIMIT ?,1000;
SELECT * FROM tt LIMIT ?,1000;

but is this true? Is there any chance that MySQL will retrieve the rows in a different order on multiple SELECTs even if the table is not written to in between?

Sorting the temp table is the longest part while reading it and I'd be happy to get rid of the ORDER BY part but I don't want to miss a record because the order changed between SELECT LIMIT 0,1000 and SELECT LIMIT 1000,1000.


Solution

I guess this depends on the storage engine. For MyISAM I can say that a SELECT without any ORDER BY returns rows sequentially, in the order they occur in the data file (even if the file is held in memory).

Since you inserted the rows in id order, they are stored in id order within a MyISAM temp table. So you can safely assume the result will always be correct, as long as you don't delete from the MyISAM temp table.

But the new question is: is the temp table created in MyISAM format? I assume the default storage engine will be used. This used to be MyISAM, but the default changed to InnoDB in MySQL 5.5.

I don't know how InnoDB would handle an unsorted SELECT.
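If the engine matters, it does not have to be left to the default. The following is a minimal sketch of checking the default and pinning the engine explicitly; the `payload` column is a hypothetical stand-in for the "other" columns mentioned in the question.

```sql
-- Show the engine-related settings of this server/session.
-- (The variable is named storage_engine on older servers and
--  default_storage_engine on newer ones; the pattern matches both.)
SHOW VARIABLES LIKE '%storage_engine%';

-- Pin the engine instead of relying on the default.
-- "payload" is a placeholder column, not taken from the question.
CREATE TEMPORARY TABLE tt (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    payload VARCHAR(255),
    PRIMARY KEY (id)
) ENGINE = MyISAM;

-- Confirm which engine the temp table actually got.
SHOW CREATE TABLE tt;
```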

-- edit

http://dev.mysql.com/doc/refman/5.1/en/internal-temporary-tables.html

Big (on-disk) temporary tables use MyISAM. Serving rows sequentially in the order they were written is not an intended feature, but it is by design (as documented) reliable as long as the MyISAM table is only appended to.

This is most probably not true for engines that organize rows differently, such as InnoDB, BDB, and possibly MEMORY. But when a MEMORY table is converted to an on-disk table, the current order is "frozen" on disk and is therefore reliable again.

But I guess we have another kind of problem here:

  • The 'real' problem seems to be that your ORDER BY feels too slow and you would like to get around it. Do you by chance know your table size and your session's max_heap_table_size? This gives you a hint as to whether your table has been converted to MyISAM. You can also experiment with the index type (BTREE, HASH) to look for optimizations.

  • Are you aware that LIMIT N,M gets painfully slow as N grows, because it has to select N rows, discard them, and then select M rows to return?
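The checks in the first bullet can be sketched as follows. This assumes the temp table is named `tt` as in the question; `tt_mem` and its `payload` column are hypothetical names used only to illustrate the index-type choice on a MEMORY table.

```sql
-- Size limits that apply to in-memory tables in this session.
SHOW VARIABLES LIKE 'max_heap_table_size';
SHOW VARIABLES LIKE 'tmp_table_size';

-- Engine and index definitions of the existing temp table.
SHOW CREATE TABLE tt;

-- MEMORY tables use HASH indexes unless told otherwise. A BTREE
-- index can be read in key order, which a HASH index cannot.
CREATE TEMPORARY TABLE tt_mem (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    payload VARCHAR(255),
    PRIMARY KEY USING BTREE (id)
) ENGINE = MEMORY;
```

Whether BTREE actually speeds up the ORDER BY here is something to confirm with EXPLAIN on your own data.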

OTHER TIPS

This isn't directly related to ORDER BY, but be careful using the LIMIT N,M pattern. This is often used for paginating results, e.g.

... limit 0,10;
... limit 10,10;
... limit 20,10;

On each query, MySQL "selects" the first N rows and returns only the last M to the client. As you go deeper and deeper, you select more and more rows only to throw them away.

In short, each "page" takes longer to load. If your datasets are small and the queries use indexes well, this may seem inconsequential, but the effect becomes evident as the data grows. I feel it's generally better to put these constraints in the WHERE clause, e.g.

... where id between 0 and 10;
... where id between 11 and 20;
... where id between 21 and 30;

Finally, if you're worried about the ordering: when the WHERE clause itself defines the correct range for each chunk, every chunk contains the right rows no matter in which order they are returned, so this concern is eliminated to begin with.
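A common variant of the same idea is to remember the largest id of the previous chunk instead of computing fixed ranges, which also works when the ids have gaps. This is a sketch only, assuming `id` is indexed (MySQL requires a key on an auto_increment column) and that the script keeps track of the last id it processed.

```sql
-- First chunk: start below the smallest possible id.
SELECT * FROM tt WHERE id > 0 ORDER BY id LIMIT 1000;

-- Every following chunk: bind ? to the largest id returned
-- by the previous chunk. Stop when fewer than 1000 rows come back.
SELECT * FROM tt WHERE id > ? ORDER BY id LIMIT 1000;
```

With a suitable index on `id`, the server can start reading at the given key and stop after 1000 rows, so later chunks should not get slower the way LIMIT N,M does.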

Instead of your original query

SELECT * FROM tt ORDER BY id LIMIT ?,1000;

you may want to refactor the query to apply the ORDER BY against the keys only and join the keys back to the original table:

SELECT tt.*
FROM
(
    SELECT id FROM tt
    ORDER BY id LIMIT ?,1000
) ttkeys
LEFT JOIN tt USING (id);

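This forces only 1000 keys to be joined against the original table.

I use LEFT JOIN instead of INNER JOIN because, in my experience, LEFT JOIN keeps the order of the keys after the join is complete, whereas INNER JOIN tends to undo the ordering of the subquery. Strictly speaking, neither is guaranteed without an ORDER BY on the outer query.

If doing this against a single temp table has issues (MySQL may refuse to reference the same TEMPORARY table twice in one query with a "Can't reopen table" error), then try it with a second temp table as follows: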

CREATE TEMPORARY TABLE tt (...);
INSERT INTO tt ...;  -- repeated many times
CREATE TEMPORARY TABLE ttkeys
    SELECT id FROM tt
    ORDER BY id LIMIT 1000
;
ALTER TABLE ttkeys ADD PRIMARY KEY (id);
SELECT tt.* FROM ttkeys
LEFT JOIN tt USING (id);
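
Since SQL does not promise any row order without an explicit ORDER BY, a more defensive version of the final join sorts the outer query as well. This is a sketch using the `ttkeys` and `tt` tables from above. Sorting at most 1000 joined rows should be cheap compared with sorting the whole table.

```sql
-- Join the 1000 selected keys back to the full rows and
-- state the required order explicitly on the outer query.
SELECT tt.*
FROM ttkeys
INNER JOIN tt USING (id)
ORDER BY ttkeys.id;
```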