Question

How can I make an ORDER BY clause with a small LIMIT (ie 20 rows at a time) return quickly, when I can't use an index to satisfy the ordering of rows?

Let's say I would like to retrieve a certain number of titles from a table 'node' (simplified below). I'm using MySQL by the way.

node_ID INT(11) NOT NULL auto_increment,
node_title VARCHAR(127) NOT NULL,
node_lastupdated INT(11) NOT NULL,
node_created INT(11) NOT NULL

But I need to limit the rows returned to only those a particular user has access to. Many users have access large numbers of nodes. I have this information pre-calculated in a big lookup table (an attempt to make things easier) where the primary key covers both columns and the presence of a row means that usergroup has access to that node:

viewpermission_nodeID INT(11) NOT NULL,
viewpermission_usergroupID INT(11) NOT NULL

My query therefore contains something like

FROM
  node
  INNER JOIN viewpermission ON
    viewpermission_nodeID=node_ID
    AND viewpermission_usergroupID IN (<...usergroups of current user...>)

... and I also use a GROUP BY or a DISTINCT so that a node is only returned once even if two of the user's 'usergroups' both have access to that node.

My problem is that there seems to be no way for an ORDER BY clause which sorts results by created or last updated date to use an index, because the rows being returned depend on values in the other viewpermission table.

Therefore MySQL would need to find all rows which match the criteria, then sort them all itself. If there are one million rows for a particular user, and we want to view, say, the latest 100 or rows 100-200 when ordered by last update, the DB would need to figure out which one million rows the user can see, sort this whole result set itself, before it can return those 100 rows, right?

Is there any creative way to get around this? I've been thinking along the lines of:

  • Somehow add dates into the viewpermission lookup table so that I can build an index containing the dates as well as the permissions. It's a possibility I guess.

Edit: Simplified question

Perhaps I can simplify the question by rewriting it like this:

Is there any way to rewrite this query or create an index for the following such that an index can be used to do the ordering (not just to select the rows)?

SELECT nodeid
FROM lookup
WHERE
  usergroup IN (2, 3)
GROUP BY
  nodeid

An index on (usergroup) allows the WHERE part to be satisfied by an index, but the GROUP BY forces a temporary table and filesort on those rows. An index on (nodeid) does nothing for me, because the WHERE clause needs an index with usergroup as its first column. An index on (usergroup, nodeid) forces a temporary table and filesort because the GROUP BY is not the first column of the index that can vary.

Any solutions?

Was it helpful?

Solution

Can I answer my own question?

I believe I have found that the only way to do what I describe is for my lookup table to have rows for every possible combination of usergroups a person may want to be a member of.

To pick a simplified example, instead of doing this:

SELECT id FROM ids WHERE groups IN(1,2) ORDER BY id

If you need to use the index both to select rows and to order them, you have to abstract that IN(1,2) so that it is constant rather than a range, ie:

SELECT id FROM ids WHERE grouplist='1,2' ORDER BY id

Of course instead of using the string '1,2' you could have a foreign key there, etc. The point being that you'd have to have a row not just for each group but for each combination of multiple groups.

So, there is my answer.

Anyway, for my application, I feel that maintaining a lookup for all possible combinations of usergroups for each node is not worth it. For my purposes, I predict that most nodes are visible to most users, so I feel that it is acceptable to simply to make the GROUP BY use the index, as the filtering doesn't need it so badly.

In other words, the approach I'll take for my original query may be something like:

SELECT
    <fields>
FROM
  node
  INNER JOIN viewpermission ON
    viewpermission_nodeID=node_ID
    AND viewpermission_usergroupID IN (<...usergroups of current user...>)
  FORCE INDEX(node_created_and_node_ID)
GROUP BY
  node_created, node_ID

GROUP BY can use an index if it starts at the left most column of the index and it is in the first non-const non-system table to be processed. The join then deals with the entire list (which is already ordered), and only those not visible to the current user (which will be a small proportion) are removed by the INNER JOIN.

OTHER TIPS

Copy the value you are going to order by into to viewpermission table and add it to your index.

You could use a trigger to maintain that value from the other table.

select * from
(
select *
FROM  node  
INNER JOIN viewpermission 
ON    viewpermission_nodeID=node_ID    
AND viewpermission_usergroupID IN (<...usergroups of current user...>)
) a
order by a.node_lastupdated desc

The inner query gives you the filtered subset, which I understand is substantially smaller than the whole set. Only the smaller has to be sorted.

MySQL has problems when you use GROUP BY and ORDER BY in the same query. That causes a filesort, and that's probably the biggest penalty for performance.

You can eliminate the need for a DISTINCT (or GROUP BY) by using a non-correlated subquery instead of a JOIN.

SELECT * FROM node
WHERE node_id IN (
  SELECT viewpermission_nodeID
  FROM viewpermission
  WHERE viewpermissiong_usergroupID IN ( <...usergroups...> )
)
ORDER BY node_lastupdated DESC
LIMIT 100;

There's no need to sort or do a DISTINCT on the subquery, since IN (1, 1, 2, 3) is the same as IN (1, 3, 2).

Note that MySQL can use only one index per table in a given query, so it'll try to make the best choice between an index on node_id and an index on node_lastupdated. It can't use both, and even if you made a compound index it wouldn't help in this case.

Remember to analyze different solutions with EXPLAIN.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top