Pergunta

I have a site where there is an activity feed, similar to how social sites like Facebook have one. It is a "newest first" list that describes actions taken by users. In production, there's about 200k entries in that table.

Since this is going to be asked anyway, I'll first share the full table structure:

CREATE TABLE `karmalog` (
  `id` int(11) NOT NULL auto_increment,
  `guid` char(36) default NULL,
  `user_id` int(11) default NULL,
  `user_name` varchar(45) default NULL,
  `user_avat_url` varchar(255) default NULL,
  `user_sec_id` int(11) default NULL,
  `user_sec_name` varchar(45) default NULL,
  `user_sec_avat_url` varchar(255) default NULL,
  `event` enum('EDIT_PROFILE','EDIT_AVATAR','EDIT_EMAIL','EDIT_PASSWORD','FAV_IMG_ADD','FAV_IMG_ADDED','FAV_IMG_REMOVE','FAV_IMG_REMOVED','FOLLOW','FOLLOWED','UNFOLLOW','UNFOLLOWED','COM_POSTED','COM_POST','COM_VOTE','COM_VOTED','IMG_VOTED','IMG_UPLOAD','LIST_CREATE','LIST_DELETE','LIST_ADMINDELETE','LIST_VOTE','LIST_VOTED','IMG_UPD','IMG_RESTORE','IMG_UPD_LIC','IMG_UPD_MOD','IMG_GEO','IMG_UPD_MODERATED','IMG_VOTE','IMG_VOTED','TAG_FAV_ADD','CLASS_DOWN','CLASS_UP','IMG_DELETE','IMG_ADMINDELETE','IMG_ADMINDELETEFAV','SET_PASSWORD','IMG_RESTORED','IMG_VIEW','FORUM_CREATE','FORUM_DELETE','FORUM_ADMINDELETE','FORUM_REPLY','FORUM_DELETEREPLY','FORUM_ADMINDELETEREPLY','FORUM_SUBSCRIBE','FORUM_UNSUBSCRIBE','TAG_INFO_EDITED','IMG_ADDSPECIE','IMG_REMOVESPECIE','SPECIE_ADDVIDEO','SPECIE_REMOVEVIDEO','EARN_MEDAL','JOIN') NOT NULL,
  `event_type` enum('follow','tag','image','class','list','forum','specie','medal','user') NOT NULL,
  `active` bit(1) NOT NULL,
  `delete` bit(1) NOT NULL default '\0',
  `object_id` int(11) default NULL,
  `object_cache` text,
  `object_sec_id` int(11) default NULL,
  `object_sec_cache` text,
  `karma_delta` int(11) NOT NULL,
  `gold_delta` int(11) NOT NULL,
  `newkarma` int(11) NOT NULL,
  `newgold` int(11) NOT NULL,
  `migrated` int(11) NOT NULL default '0',
  `date_created` timestamp NOT NULL default '0000-00-00 00:00:00',
  PRIMARY KEY  (`id`),
  KEY `user_id` (`user_id`),
  KEY `user_sec_id` (`user_sec_id`),
  KEY `image_id` (`object_id`),
  KEY `date_event` (`date_created`,`event`),
  KEY `event` (`event`),
  KEY `date_created` (`date_created`),
  CONSTRAINT `karmalog_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `user` (`id`) ON DELETE SET NULL,
  CONSTRAINT `karmalog_ibfk_2` FOREIGN KEY (`user_sec_id`) REFERENCES `user` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Before optimizing this table, my query had 5 joins and I ran into slow query times. I have denormalized all of that data, so that not a single join is there anymore. So the table and query is flat.

As you can see in the table design, there's an "event" field which is an enum, holding a few dozen possible values. Throughout the site, I show activity feeds based on specific event types. Typically that query looks like this:

SELECT * FROM karmalog as k
WHERE k.event IN ($events) AND k.delete=0 
ORDER BY k.date_created DESC, k.id DESC 
LIMIT 0,30

What this query does is to find the latest 30 entries in the total set that match any of the events passed in $events, which can be multiple.

Due to removing the joins and having indices on most fields, I was expecting this to perform very well, but it doesn't. On 200k entries, it still takes over 3 seconds and I don't understand why.

Regarding solutions, I know I could archive older entries or partition the table per event type, but that will have quite a code impact, and I first would like to understand why the above is so slow.

As a temporary work-around, I'm now doing this:

SELECT * FROM
(SELECT * FROM karmalog ORDER BY date_created DESC, id DESC LIMIT 0,1000) as karma
    WHERE karma.event IN ($events) AND karma.delete=0
LIMIT $page,$pagesize

What this does is to limit the baseset to search in to the latest 1000 entries only, hoping and guessing that there's 30 entries to be found for the filters that I pass in. It's not very robust though. It will not work for more rare events, and it brings pagination issues.

Therefore, I first like to get to the root cause of why my initial query is slow, against my expectation.

Edit: I was asked to share the execution plan. Here's the test query:

EXPLAIN SELECT * FROM karmalog 
WHERE event IN ('FAV_IMG_ADD','FOLLOW','COM_POST','IMG_VOTE','LIST_VOTE','JOIN','CLASS_UP','LIST_CREATE','FORUM_REPLY','FORUM_CREATE','FORUM_SUBSCRIBE','IMG_GEO','IMG_ADDSPECIE','SPECIE_ADDVIDEO','EARN_MEDAL') AND karmalog.delete=0
ORDER BY date_created DESC, id DESC
LIMIT 0,36  

Execution plan:

id            = 1
select_type   = SIMPLE
table         = karmalog
type          = range
possible_keys = event
key           = event
key_len       = 1
red           = NULL
rows          = 80519
Extra         = Using where; Using filesort

I'm not sure how to read into the above, but I do know that the sort clause really seems to kill this query. With this sorting, it takes 4.3 secs, without 0.03 secs.

Foi útil?

Solução

SELECT * sometimes slows down ordered queries by a huge amount, so let's start by refactoring your query as follows:

 SELECT k.* 
   FROM karmalog AS k
   JOIN (
      SELECT id 
        FROM karmalog
       WHERE event IN ($events)
         AND delete=0
       ORDER BY date_created DESC, id DESC
       LIMIT 0,30
        ) AS m ON k.id = m.id
  ORDER BY k.date_created DESC, k.id DESC

This will do your ORDER BY ... LIMIT operation without having to haul the whole table around in the sorting phase. Finally it will look up the appropriate thirty rows from the original table and sort just those again. This might save a whole lot of I/O and in-memory data shuffling.

Second, if id column values are assigned in ascending order as records are inserted, then the use of date_created in your ORDER BY operation is redundant. But MySQL doesn't know that, so leaving it out might help. This will be true if you always use the current date when inserting, and never update the dates.

Third, you might be able to use a compound covering index for the selection (inner) query. This is an index that contains all the fields you need. When you use a covering index, the whole query can be satisfied from the index, and there's no need to bounce back to the original table. This saves disk access time.

Try this compound covering index: (delete, event, id). If you decide you can't get rid of the use of date_created in your ordering, try this instead: (delete, event, date_created, id)

Outras dicas

Add a compound index over the two relevant questions. In your table, you can do that by specifying e.g.

KEY `date_created` (`date_created`, `event`)

This key can still be used to satisfy plain old date_created range searching. But in addition to that, the event data is included as well, so the DBS will be able to detect the relevant rows by only looking at the index.

If you want, you can try the other order as well: first event and then date. This might allow some optimization if there are many event types but your filter only contains few. On the other hand, I'm not sure the system will be able to make use of the LIMIT clause in this case, so I'm not certain that this other order will be any help at all.

Edit: I completely missed that your date_event index already has this info. According to your execution plan, though, that one isn't used. Looks like the optimizer is getting things wrong. You could try removing the event index, and perhaps the date index as well, and see what happens then.

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top