Question

I have this table:

Nodes(IDNode, LeftPath, RightPath);


IDNode     LeftPath     RightPath
1            1             1Z
2            1.2           1.2Z
3            1.3           1.3Z
4            1.2.4         1.2.4Z
5            5             5Z
6            5.6           5.6Z

LeftPath is the materialized path of the node, and RightPath is the LeftPath with "Z" appended.

This is a modification of the materialized paths in a tree structure.

If I had only LeftPath and wanted all the children of node 1, I would have to use this query:

select * from Nodes where LeftPath like "1%";

But if I use RightPath, I can use this query:

select * from Nodes where LeftPath between [1] and [1Z];

If I have an index on LeftPath, is the performance with LIKE worse than with BETWEEN? The index orders the records, so I guess it is fast to get all the children. Do I get better performance with BETWEEN?


Solution

Differences Between Observed SQL Query Performance Using Indexed Columns

Before considering index types or other details at that level, have you looked at the execution plan for the queries you are comparing? "SQL Plans" tell you if your query approach is using the indexes you've added for optimization or if they are no better than the original, non-optimized design.

The following discussion walks through a few key concepts to interpret the observations explained in the original post:

  1. Are the queries using the LIKE and the BETWEEN clause benefiting from the column index?
  2. With all else considered equal, which really performs better? (i.e., faster)

Prediction: the record set in the example provided is very small. Even if there is an index and the execution plan uses it, there may be no measurable difference in speed between a full table scan (i.e., stepping through all the records one by one) and a plan that uses the index. As for question (2), the discussion of query plan caching in the conclusions suggests one reason a difference might be observed between the two SQL operators.

Comments About the Examples in the Original Post:

The second example query does not involve the column RightPath at all.

Using an index does not always mean a faster, more efficient query.
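
On the first comment above: a query that actually reads RightPath could take its bounds from the parent row instead of hard-coding them. The following is only a sketch of that idea, using Python's built-in sqlite3 module as a stand-in for the real database; the helper name subtree_ids and the self-join are assumptions for illustration, not something taken from the original post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Nodes (IDNode INTEGER PRIMARY KEY, LeftPath TEXT, RightPath TEXT)"
)
conn.executemany(
    "INSERT INTO Nodes (LeftPath, RightPath) VALUES (?, ?)",
    [("1", "1Z"), ("1.2", "1.2Z"), ("1.3", "1.3Z"),
     ("1.2.4", "1.2.4Z"), ("5", "5Z"), ("5.6", "5.6Z")],
)

def subtree_ids(conn, node_id):
    """Hypothetical helper: IDs of a node and its descendants.

    The bounds come from the parent's own LeftPath and RightPath columns,
    so RightPath is really used by the query.
    """
    rows = conn.execute(
        """
        SELECT child.IDNode
        FROM Nodes AS parent
        JOIN Nodes AS child
          ON child.LeftPath BETWEEN parent.LeftPath AND parent.RightPath
        WHERE parent.IDNode = ?
        ORDER BY child.IDNode
        """,
        (node_id,),
    ).fetchall()
    return [row[0] for row in rows]

print(subtree_ids(conn, 1))
print(subtree_ids(conn, 5))
```

With literal bounds such as '1' and '1Z', as in the original post, the stored RightPath value is never consulted.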

Initial Discussion: How to Set Up and Compare SQL Query Approaches

I used a MySQL database to illustrate a few concepts that should carry over to an MSSQL environment. Whether a query is "slow" or "fast" depends on many factors, and the query EXECUTION PLAN is the first place to look for them. In some cases an index is not used at all.

Setting up the test environment I used (In MySQL):

 CREATE TABLE Nodes
 (
   IDNode int auto_increment primary key,
   LeftPath varchar(20),
   RightPath varchar(30)
 );

 INSERT INTO Nodes (LeftPath, RightPath)
 VALUES
 ('1', '1Z'),
 ('1.2', '1.2Z'),
 ('1.3', '1.3Z'),
 ('1.2.4', '1.2.4Z'),
 ('5', '5Z'),
 ('5.6', '5.6Z');

 COMMIT;

 CREATE TABLE NodesWIndx
 (
   IDNode int auto_increment primary key,
   LeftPath varchar(20),
   RightPath varchar(30)
 );

 CREATE INDEX NodesIndx_Ix1 ON NodesWIndx(LeftPath);
 CREATE INDEX NodesIndx_Ix2 ON NodesWIndx(RightPath);

 INSERT INTO NodesWIndx (LeftPath, RightPath)
 VALUES
 ('1', '1Z'),
 ('1.2', '1.2Z'),
 ('1.3', '1.3Z'),
 ('1.2.4', '1.2.4Z'),
 ('5', '5Z'),
 ('5.6', '5.6Z');

 COMMIT;

Querying a Table Using a WHERE and LIKE Restriction on an Indexed Column

Your first query uses the index you placed on the column. A default index on a string column such as the one in your example matches values from left to right, so a pattern with a fixed prefix can use it, as in:

 -- Querying a Table WITH an Index
 SELECT * FROM NodesWIndx WHERE LeftPath LIKE '1%'

 | IDNODE | LEFTPATH | RIGHTPATH |
 |--------|----------|-----------|
 |      1 |        1 |        1Z |
 |      2 |      1.2 |      1.2Z |
 |      3 |      1.3 |      1.3Z |
 |      4 |    1.2.4 |    1.2.4Z |

Query Execution Plan and Index Utilization

[Image: execution plan for the WHERE and LIKE SQL query on an indexed column]

Note that the plan in this query shows that the index created with the table, NodesIndx_Ix1 was used to assist in finding the records with LeftPath column values that match the query criteria.
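
To see why a prefix LIKE and a BETWEEN can both be served by the same ordered index, it may help to model the index as a sorted list. This is a deliberately simplified sketch: a real B-tree index and the database's collation rules are more involved, and the function names like_prefix and between are made up for illustration. Python compares strings by code point here, which is an assumption about how the column is ordered.

```python
from bisect import bisect_left, bisect_right

# Simplified model: an index on LeftPath keeps the values in sorted order.
index = sorted(["1", "1.2", "1.3", "1.2.4", "5", "5.6"])

def like_prefix(index, prefix):
    """Rows matching LIKE 'prefix%' (prefix must be non-empty).

    The match is a contiguous range: from the prefix itself up to, but not
    including, the prefix with its last character bumped ('1' -> '2').
    """
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return index[bisect_left(index, prefix):bisect_left(index, upper)]

def between(index, low, high):
    """Rows matching BETWEEN low AND high (both bounds inclusive)."""
    return index[bisect_left(index, low):bisect_right(index, high)]

print(like_prefix(index, "1"))
print(between(index, "1", "1Z"))
```

In this model both lookups are two binary searches followed by a contiguous slice, which is consistent with the two operators ending up with similar plans when the index is used.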

Querying a Table Using a WHERE and LIKE Restriction on a NON-Indexed Column

Here is that same query against a similar table and data with NO index on the filtered column:

 -- Querying a Table WITHOUT an Index
 SELECT * FROM Nodes WHERE LeftPath LIKE '1%'

 | IDNODE | LEFTPATH | RIGHTPATH |
 |--------|----------|-----------|
 |      1 |        1 |        1Z |
 |      2 |      1.2 |      1.2Z |
 |      3 |      1.3 |      1.3Z |
 |      4 |    1.2.4 |    1.2.4Z |

Query Execution Plan and Index Utilization

[Image: execution plan for the WHERE and LIKE SQL query on a NON-indexed column]

In this case, the plan shows that no indexes were used to assist in providing the SQL query results.

Querying a Table Using BETWEEN on an Indexed Column

Here is the equivalent BETWEEN query against the table that has an index on the filtered column:

 -- Querying a Table Using BETWEEN with an Index
 SELECT * FROM NodesWIndx WHERE LeftPath BETWEEN '1' AND '1Z'


 | IDNODE | LEFTPATH | RIGHTPATH |
 |--------|----------|-----------|
 |      1 |        1 |        1Z |
 |      2 |      1.2 |      1.2Z |
 |      3 |      1.3 |      1.3Z |
 |      4 |    1.2.4 |    1.2.4Z |

Query Execution Plan and Index Utilization

[Image: execution plan for the BETWEEN SQL query with an index]

The query with the BETWEEN clause also appears to use the index created for the column used in the WHERE criteria.
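
The same comparison can be reproduced without a MySQL server. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN, so the plan text will not look like MySQL's EXPLAIN output, and the optimizer may make different choices. The helper names make_table and plan are assumptions for illustration. The COLLATE NOCASE on LeftPath is a SQLite-specific detail: with its default case-insensitive LIKE, SQLite only considers an index for LIKE when the column uses that collation.

```python
import sqlite3

ROWS = [("1", "1Z"), ("1.2", "1.2Z"), ("1.3", "1.3Z"),
        ("1.2.4", "1.2.4Z"), ("5", "5Z"), ("5.6", "5.6Z")]

def make_table(conn, name, index_name=None):
    # COLLATE NOCASE is only here so that SQLite may consider the index
    # for LIKE; it is not part of the original MySQL setup.
    conn.execute(
        f"CREATE TABLE {name} (IDNode INTEGER PRIMARY KEY, "
        "LeftPath TEXT COLLATE NOCASE, RightPath TEXT)"
    )
    if index_name:
        conn.execute(f"CREATE INDEX {index_name} ON {name}(LeftPath)")
    conn.executemany(
        f"INSERT INTO {name} (LeftPath, RightPath) VALUES (?, ?)", ROWS
    )

def plan(conn, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
make_table(conn, "Nodes")                                # no index on LeftPath
make_table(conn, "NodesWIndx", index_name="NodesIndx_Ix1")

for table in ("Nodes", "NodesWIndx"):
    for predicate in ("LeftPath LIKE '1%'", "LeftPath BETWEEN '1' AND '1Z'"):
        sql = f"SELECT * FROM {table} WHERE {predicate}"
        print(sql, "->", plan(conn, sql))
```

A plan line that mentions a scan of the table means every row is visited; a line that names NodesIndx_Ix1 means the index was chosen. The exact wording varies between SQLite versions.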

Conclusions and Recommendations

The observed leap in performance between a query with a LIKE or a BETWEEN operator could be the result of caching of the query execution plan from the previous request.

Whenever an attempt to execute a query is made, the query pipeline looks up its query plan cache to see whether the exact query is already compiled and available. More on SQL Server Query Plan Caching

At least in the simpler execution plan information from the MySQL example, both queries in question reported the same index (the possible_keys value), and the remaining plan values were also the same.

Did an Index Make a Difference?

Indexes do not always provide a predictable improvement in performance. In addition, the type of index created (e.g., in MSSQL: unique, clustered, non-clustered, etc.) should be chosen to match the kind of data that is queried and the distribution of its values; otherwise the optimizer may ignore the index.

I found a useful discussion on best practices when choosing candidates for indexes. The most useful tip of this article was that:

Most performance improvements from indexes are seen with larger quantities of data.

Exactly how large? In a Microsoft SQL Server article about best practices when setting up table indexes for performance gains, the authors ran tests on DML and SELECT-only operations with test record sets of one million rows or more in order to generate significant and measurable differences in performance.
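
To try this on something bigger than six rows, the tables can be filled with synthetic paths and the same BETWEEN query timed against each. The sketch below again uses Python's sqlite3 module as a stand-in, and everything about the data is an assumption: labels 1 to 9 at every level, a configurable depth, and the helper names build, timed_subtree_count and compare. The timings it prints depend on the machine and engine and say nothing definite about MySQL or SQL Server.

```python
import itertools
import sqlite3
import time

def build(conn, table, indexed, depth):
    """Fill a table with synthetic paths using labels 1-9 at each level."""
    conn.execute(
        f"CREATE TABLE {table} "
        "(IDNode INTEGER PRIMARY KEY, LeftPath TEXT, RightPath TEXT)"
    )
    if indexed:
        conn.execute(f"CREATE INDEX {table}_Ix1 ON {table}(LeftPath)")
    rows = (
        (path, path + "Z")
        for level in range(1, depth + 1)
        for path in (
            ".".join(labels)
            for labels in itertools.product("123456789", repeat=level)
        )
    )
    conn.executemany(
        f"INSERT INTO {table} (LeftPath, RightPath) VALUES (?, ?)", rows
    )

def timed_subtree_count(conn, table, low, high):
    """Count rows in [low, high] and return (count, elapsed seconds)."""
    start = time.perf_counter()
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE LeftPath BETWEEN ? AND ?",
        (low, high),
    ).fetchone()
    return count, time.perf_counter() - start

def compare(depth=5):
    """Run the same subtree query against a plain and an indexed table."""
    conn = sqlite3.connect(":memory:")
    build(conn, "Nodes", indexed=False, depth=depth)
    build(conn, "NodesWIndx", indexed=True, depth=depth)
    return {
        table: timed_subtree_count(conn, table, "1.2.3", "1.2.3Z")
        for table in ("Nodes", "NodesWIndx")
    }

for table, (count, seconds) in compare().items():
    print(f"{table}: {count} rows in {seconds * 1000:.2f} ms")
```

The default depth of 5 gives 66,429 rows per table; depth=6 gives roughly 600,000, which is closer to the scale the article describes. A single run is a rough indication only, so repeat the measurement several times before drawing conclusions.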

I might be able to update some of this discussion with SQL Server examples, but for now, the concept of examining an execution plan remains the same regardless of the RDBMS you are looking at. The plans of some RDBMS platforms are more detailed than others but they lead developers in the same general direction when it comes to analyzing SQL queries for optimization.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow