Question

I am creating a database capable of performing SQL queries. I'm using Flex / Bison to create my AST (abstract syntax tree). For example: select * from table where score> 10 * (age * salary)

enter image description here

When i evaluate this AST tree, i traverse the tree, starting from the left and then the right sub-tree.

Comparing with muparser (http://muparser.beltoforion.de/), it is much faster. Is there any other way to perform this processing? If I use Postfix algorithm (http://en.wikipedia.org/wiki/Reverse_Polish_notation) Will I have better performance?

How traditional databases perform this task?

Was it helpful?

Solution

Pretty much to process a SQL query, something has to parse it, process the result of parsing to either execute the query or compile it into query-steps to be executed efficiently later.

With databases, people mostly have performance troubles with executing the query against the data, esp. if the database is large enough so it is mostly disk resident. Parsing the query is rarely the expensive part of the process.

Most queries can be executed in several different ways and achieve the same results. This is especially true if you consider algebraically identical but different query text/(trees) (for your example consider WHERE SCORE/AGE > 10*SALARY).

Making "the" query execute fast is generally a matter of code generation, enumerating or otherwise generating many equivalent queries, estimating their performance based on the set of indexes and statistics, and choosing the best of the bunch for execution.

The compiler literature is full of sophisticated schemes for doing this in non-naive ways. Most of these schemes start with an AST, convert that into some kind of graph representing the program computations, apply chosen "optimizations" [in the case of DB queries, based on estimated data access costs], produce a final high level result and then compile down to machine code using native instructions for straightforward math, and carefully code library procedures for more complicated tasks (e.g, "join").

Reverse polish, while cute, isn't generally a very good scheme for representing a parse tree or analyzing for code generation purposes. (You may note it is equivalent to the AST but in fact it is not as convenient to access). It is acknowledged but pretty much ignored in the compiler literature. The only place it is popular is in building so-called threaded interpreters. While these are "fast", they aren't anywhere near as fast as compiled, optimized code by a factor of 10x or more.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top