query very slow longtext field innodb table

https://stackoverflow.com/questions/12863549

07-07-2021
Question

Well, first of all, sorry for my English. I am trying to query a table where users can add text, like a blog page. Users can format the content as HTML. In my table it is stored like this:

Estad&amp;iacute;sticas&lt;br /&gt;
&lt;table border=&quot;0&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Columna 1&lt;/td&gt;
&lt;td&gt;Columna 2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Columna 3&lt;br /&gt;&lt;/td&gt;
&lt;td&gt;Columna 4&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

I must search that content for whatever the users want. The field 'texto' (the one I'm using for this) is a longtext field and the table is InnoDB. I can't use full-text search, because it is only available for MyISAM tables. I wrote the query as:

"SELECT * FROM texto WHERE texto like '%$variable%'"

but the query is very, very slow and takes an eternity. The table has 849 records, which isn't big. If I run the same query in phpMyAdmin it also takes a very long time. Some records in this field are large, though: they contain video HTML, tables and images, but it's all just text like the above.

What can I do? How can I improve the performance of the query? I appreciate all your help. Thanks a lot. And again, sorry for my English.

Solution

Unfortunately you can't get more out of the structure you have: no index, clustered or non-clustered, can serve a like '%...' query, because the leading wildcard rules out an index seek and forces a scan of every row. The best solution would probably be to export your data to a full-text search engine (e.g. Solr) and use that engine to serve users' queries. If that's not possible, then another solution would be to create a tokens table that plays the role of a text index:

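create table tokens(
  token varchar(100) not null,
  docid int not null,
  constraint PK_tokens primary key (token, docid),
  -- MySQL parses but ignores an inline "references" on a column,
  -- so the foreign key is declared as a table constraint
  constraint FK_tokens_testdo foreign key (docid) references testdo(id)
);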

where docid references your data table (I named it testdo).

Then you need to fill the tokens table by splitting users' blog posts on some common HTML expressions, e.g.:

insert ignore into tokens values
('Estad', 1),
('Columna 1', 1),
('Columna 2', 1),
('Estad', 1);

Notice the ignore keyword, which silently skips any duplicates that may come up (such as the repeated ('Estad', 1) row). With the tokens table filled with data, you can change your query to something like:

select * from testdo d 
  inner join tokens t on t.docid = d.id where t.token like 'Col%'

which should execute much faster, as the prefix pattern 'Col%' can use the primary key index on (token, docid), followed by key lookups into testdo.
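
To make the splitting and lookup steps more concrete, here is a minimal Python sketch of the whole flow. It is only an illustration under some assumptions: an in-memory SQLite database stands in for MySQL (so `insert or ignore` plays the role of MySQL's `insert ignore`), posts are split into single lowercase words rather than phrases like 'Columna 1', and the names `TextExtractor`, `tokenize`, `make_db`, `add_post` and `search` are made up for this example.

```python
import re
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags and drops the markup itself."""

    def __init__(self):
        # convert_charrefs=True (the default) decodes entities such as &iacute;
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def tokenize(html_text):
    """Return the set of lowercase words in the visible text of a post."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    return set(re.findall(r"\w+", " ".join(parser.chunks).lower()))


def make_db():
    """In-memory SQLite stand-in for the testdo and tokens tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table testdo (id integer primary key, texto text)")
    conn.execute(
        "create table tokens ("
        " token text not null,"
        " docid integer not null references testdo(id),"
        " primary key (token, docid))"
    )
    return conn


def add_post(conn, docid, html_text):
    """Store a post and index its words; duplicate (token, docid) rows are skipped."""
    conn.execute("insert into testdo values (?, ?)", (docid, html_text))
    conn.executemany(
        "insert or ignore into tokens values (?, ?)",
        [(token, docid) for token in tokenize(html_text)],
    )


def search(conn, prefix):
    """Return ids of posts that contain a word starting with prefix."""
    rows = conn.execute(
        "select distinct d.id from testdo d"
        " inner join tokens t on t.docid = d.id"
        " where t.token like ? order by d.id",
        (prefix.lower() + "%",),
    )
    return [row[0] for row in rows]


conn = make_db()
add_post(conn, 1, "Estad&iacute;sticas<br /><table><tr><td>Columna 1</td></tr></table>")
add_post(conn, 2, "<p>Otro texto</p>")
print(search(conn, "col"))  # [1]
```

The search term is passed as a bound parameter instead of being pasted into the SQL string, and the tokens would have to be rebuilt whenever a post is edited.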

PS. You may improve the tokens table by adding a count column that keeps the number of occurrences of a given word in a document. You can then order the results by this column to make them even more relevant to the search term.
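
The count idea could look roughly like the sketch below. Again this is an assumption-laden illustration, not a drop-in solution: SQLite stands in for MySQL, the column is called `cnt` here (a name chosen for this example), tags are stripped with a crude regular expression, and `count_tokens`, `make_db`, `index_document` and `search_ranked` are hypothetical helper names.

```python
import html
import re
import sqlite3
from collections import Counter


def count_tokens(html_text):
    """Crude tokenizer: drop tags, decode entities, count lowercase words."""
    text = html.unescape(re.sub(r"<[^>]+>", " ", html_text))
    return Counter(re.findall(r"\w+", text.lower()))


def make_db():
    """In-memory SQLite stand-in for a tokens table with an occurrence count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table tokens ("
        " token text not null,"
        " docid integer not null,"
        " cnt integer not null,"
        " primary key (token, docid))"
    )
    return conn


def index_document(conn, docid, html_text):
    """Store one row per distinct word, with how often it occurs in the post."""
    rows = [(tok, docid, n) for tok, n in count_tokens(html_text).items()]
    conn.executemany("insert or replace into tokens values (?, ?, ?)", rows)


def search_ranked(conn, prefix):
    """Return (docid, total occurrences) for words starting with prefix, best first."""
    return conn.execute(
        "select docid, sum(cnt) as hits from tokens"
        " where token like ? group by docid order by hits desc, docid",
        (prefix.lower() + "%",),
    ).fetchall()


conn = make_db()
index_document(conn, 1, "<td>Columna 1</td><td>Columna 2</td>")
index_document(conn, 2, "<p>Columna columna COLUMNA</p><p>columnas</p>")
print(search_ranked(conn, "col"))  # [(2, 4), (1, 2)]
```

Summing `cnt` per document means a post that mentions the term more often sorts first, which is the simple relevance ordering the PS describes.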

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow