Difference between performance of the two sql queries?

https://stackoverflow.com/questions/3486472

28-09-2019
|

Question

I have a field in my table having text data type.

Is there a difference in performance for the following two sql queries:

 select * from tablename where fieldname="xyz%";
 select * from tablename where fieldname="%zyx";

If we were to implement the execution of these queries, this is what I think we would need to do:

We have to match the two regexes (xyz* and *zyx).

We will have to check the string chars one by starting from the beginning.

For the first query we will have to read the first three characters to see if there is a match but for the second one we will have to read till the we get the end of the string to determine if the match has occurred. But if we have the length of the string stored somewhere we can directly read the last three characters giving similar performance as the first case.

My question is whether commercial databases like mysql and oracle show any difference in the performance in the execution of the queries.

Solution

Picking up from your comment : " I just want to know if a starts with match is diff from an ends with match".

Firstly - remember that we are not looking for the best algorithm to match a string. We are looking for the best algorithm to find all matching strings in a set of N rows. We want to do better than 'Do algorithm X, N times'.

If fieldname is NOT indexed, then there will be very little difference in performance between the two queries - the SQL engine is just going to do a match on the first 3 or last 3 bytes of the string, which is simply a matter of offsetting to the right memory location.

If the fieldname IS indexed, there will be a huge difference in performance between the two searches, because rather than examining all N rows, we can discard most of the data.

i.e. for the "xyz%" version, we can use a binary search.

We start at the middle element, which happens to be 'peter'. We can immediately discard everything before 'peter' and get the middle element on the remainder - 'samantha', and so on, until we find the entries starting 'xyz'.

With the "%xyz" version, we cannot do this, as ANY string could potentially match at the end, we need to look at every string.

As the size of our table expands, the difference between these two approaches becomes large.

The solution of creating a field/index for the reverse of fieldname allows us to use the binary search technique again. (In some databases it is actual possible to do this without creating an extra field, but through using particular index types, virtual columns, etc).

This is simplified a lot - for detail on the actual implementation of database indexes, look into B-Tree and B*Tree indexes.

OTHER TIPS

There is definitely difference between performance on all DB's. First case will be definitely faster if column is indexed.

I had similar instance in my project where user was also allowed to search "ends with" (like your second query).

As this was frequently used operation and query was slow,

We added additional column to table which stored reverse of fieldname.
indexed this column
whenever ends with was searched , we searched in this new column :) (by reversing original search string)

so your second query becomes:

 select * from tablename where fieldname_rev="xyz%";

This approach made it as fast as starts with query.

If fieldname is indexed, most of commercial databases can transform the first query into an interval search

select * from tablename where fieldname>="xyz" and fieldname<"xy{"

which is very fast.

Yes, there is a difference between the following two queries:

select * from tablename where fieldname LIKE "xyz%";
select * from tablename where fieldname LIKE "%zyx";

The equals ("=") operator doesn't allow wildcards in SQL - you need to use LIKE
The queries are entirely different
- "xyz%" will return records that start with "xyz"
- "%xyz" will return records that end with "xyz"
Assuming an index exists on the fieldname column, "%xyz" can not use the index - but"xyz%" could, which means it would be faster.

The fastest means of finding substrings within text is to use Full Text Search (FTS) - both Oracle and MySQL have their own native functionality, and there are 3rd party tools like Sphinx and Solr.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow