I had a table with 3 columns and 23 million rows. Each row contains a primary key, an int value, and a single word. Each word is 3 characters long; in other words, each word was stored as its hash representation. The table size was 5 GB, and the table is well indexed.

Now I am going to create the same table with the real words in it, with no more 3-character hashes, so each word will have its normal number of letters. This table also contains 23 million rows and 3 columns. However, since the words are longer than the 3-character hashes, the size of the table is 15 GB. This table is well indexed too.

The only difference between these 2 tables is the data type of the word column: in the first table the hash is char(3), while in the second table the "non_hashed_word" is varchar(20).
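For concreteness, the two schemas might look roughly like this. This is a sketch: only the `hashed_word`/`non_hashed_word` types and the table name `key_word` come from the question; the second table's name, the key names, and the exact column definitions are my assumptions.

```sql
-- Hypothetical schema for the first (hashed) table.
CREATE TABLE `key_word` (
    `id`              INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    `indexVal`        INT NOT NULL,
    `UniqueWordCount` INT NOT NULL,
    `hashed_word`     CHAR(3) NOT NULL,      -- fixed width: 3 bytes per value
    KEY `idx_hashed_word` (`hashed_word`)
);

-- Hypothetical schema for the second (full-word) table.
CREATE TABLE `key_word_full` (
    `id`              INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    `indexVal`        INT NOT NULL,
    `UniqueWordCount` INT NOT NULL,
    `non_hashed_word` VARCHAR(20) NOT NULL,  -- variable width: 1 length byte + up to 20 bytes
    KEY `idx_non_hashed_word` (`non_hashed_word`)
);
```

The wider, variable-length column makes both the rows and the secondary index on the word column larger, which is where the 5 GB vs. 15 GB difference comes from.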

Now please have a look at the code below, which we ran against the first table I mentioned. It runs in 0.01 seconds.

    SELECT `indexVal`, COUNT(`indexVal`) AS OverlapWords, `UniqueWordCount`,
           (COUNT(`indexVal`) / `UniqueWordCount`) AS SimScore
    FROM `key_word`
    WHERE `hashed_word` IN ('001','01v','0ji','0k9','0vc','0@v','0%d','13#','148',
                            '1e1','1sx','1v$','1@c','1?b','1?k','226','2kl','2ue',
                            '2*l','2?4','36h','3au','3us','4d~')
    GROUP BY `indexVal`
    LIMIT 500

We are expecting to run the same code against our new table as well.

So my question is: even though the number of rows and the number of columns are the same, can our query be slower because the table size is much larger now? Or perhaps because the data type is varchar() now?


Solution

Definitely yes. Use EXPLAIN to get the query plan. Other reasons:

  1. LIMIT still has to compute the whole grouped result set before it can return the first 500 rows -> wider rows mean more data to read

  2. Operations (COUNT, the division, etc.) need to be executed for each row

  3. If an index exists, it is larger when the rows are wider, and it can be fragmented on disk

etc.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow