Randomizing table contents and storing them back in the table

https://dba.stackexchange.com/questions/612

16-10-2019

Question

I have a table with at least a million records in it. These rows were created by a custom app that reads several SharePoint site collections and stores the item URLs in the table. Since we read the site collections serially, the first few thousand rows belong to the first site collection, the next few thousand belong to the second site collection, and so on.

I have another app that reads this table in a sequential manner. However, this way I end up sending HTTP requests to the same site collection for a longer time.

I know I could get random results from the table in my second app. But, that is not an option. I cannot change the way the second app works.

Now, the question is: How can I take all rows in the table, shuffle them, and store them back in the table?

Update: My database server is SQL Server 2008 R2.

Solution

If the calling app explicitly sets a particular order in its query, there is nothing you can do. (If you are running MSSQL you can check this by keeping a Profiler session running while the app does its thing; other DBMSs will have similar logging options.) If it does not, you still cannot completely guarantee any particular order.

If no explicit ORDER BY clause is given then the data will come out in an order that is officially "undefined": it will be whatever order the server finds most convenient. For a single-table query this will most likely be the order of the primary key. In MSSQL, if you have a clustered index, the results will most likely come out in that order for a single-table query. For multi-table queries it is even less clear-cut, as it depends on which way around the query planner chooses to go to get your results (which, without explicit index hints, could vary over time as the balance of data in the tables, as estimated by the index stats the server keeps, changes).

If the table has no clustered index or primary key then the data is likely to come out in an arbitrary order similar to the order in which it was inserted. In this case you could try:

SELECT * INTO temp_table FROM table_to_be_reordered
DELETE table_to_be_reordered
INSERT table_to_be_reordered SELECT * FROM temp_table ORDER BY NEWID()

or this, which may be faster:

SELECT * INTO temp_table FROM table_to_be_reordered ORDER BY NEWID()
DROP TABLE table_to_be_reordered
EXEC sp_rename 'temp_table', 'table_to_be_reordered'

In the above, NEWID() is MSSQL's function to return a UUID, and it uses random rather than sequential IDs by default. In other DBMSs you should find a similar function that you can use. Be careful with your choice of function: for instance, under MSSQL the RAND() function is evaluated once per query, not once per row, so SELECT * FROM somewhere ORDER BY RAND() would not have the desired effect (you can see why by running something like SELECT RAND(), * FROM some_table).
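To see the difference for yourself, a minimal sketch like the following can be run against any table (some_table is a placeholder name, and the column aliases are only for illustration):

```sql
-- RAND() is evaluated once for the whole statement,
-- so every row shows the same value and sorting on it shuffles nothing.
SELECT RAND() AS same_for_every_row, * FROM some_table;

-- NEWID() is evaluated once per row,
-- so each row gets its own value to sort on.
SELECT NEWID() AS differs_per_row, * FROM some_table;

-- If an integer sort key is preferred over a UUID,
-- CHECKSUM(NEWID()) is also evaluated per row.
SELECT CHECKSUM(NEWID()) AS per_row_int, * FROM some_table;
```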

If you are using MSSQL (your question didn't state which DBMS you are targeting) and do not already have a clustered index on the table, there is another option. If you either have a sufficiently random column (a UUID column, for instance) or could add one without upsetting the calling app, you could create a clustered index on it, which would be quicker than the SELECT INTO / DELETE / INSERT approach above. But again: this will have no effect at all if the app is explicitly asking for the results in a particular order, and may not have any effect anyway otherwise.
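As a rough sketch of that clustered-index idea, assuming the table has no clustered index yet and that an extra column will not break the calling app (for example, because it does not use SELECT * with a fixed column count), something along these lines could work. The column, constraint, and index names here are made up for illustration:

```sql
-- Add a UUID column; the NEWID() default fills every existing row
-- with its own random value and covers rows inserted later as well.
ALTER TABLE table_to_be_reordered
    ADD shuffle_key UNIQUEIDENTIFIER NOT NULL
    CONSTRAINT DF_table_to_be_reordered_shuffle_key DEFAULT NEWID();

-- Cluster the table on the random column so the rows are
-- physically stored in shuffle_key order.
CREATE CLUSTERED INDEX IX_table_to_be_reordered_shuffle_key
    ON table_to_be_reordered (shuffle_key);
```

The same caveat applies here: the clustered order only makes a shuffled result likely for a query with no ORDER BY; it does not guarantee it.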

OTHER TIPS

You don't specify which database, but in Oracle you could do this with:

CREATE TABLE RAND_TABLE AS (SELECT * FROM ORIG_TABLE ORDER BY DBMS_RANDOM.RANDOM());

You will need enough space in your TEMP tablespace to cope with the sorting. Then if you wish you can rename the tables ORIG_TABLE and RAND_TABLE to swap them over. I don't think it is possible to shuffle a table "in-place".
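If you do want to swap the tables over, a minimal sketch of the rename step might look like this (ORIG_TABLE_OLD is a made-up name for the retired copy):

```sql
-- Move the original out of the way, then give the shuffled copy its name.
ALTER TABLE ORIG_TABLE RENAME TO ORIG_TABLE_OLD;
ALTER TABLE RAND_TABLE RENAME TO ORIG_TABLE;
```

Bear in mind that CREATE TABLE ... AS SELECT does not copy indexes, triggers, or grants from the original table, so those would need to be recreated on the new one before the app reads from it.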

Licensed under: CC-BY-SA with attribution
