Question

I'm testing out an Ingres Vectorwise database to try to find a faster way to run queries, as we've been experiencing some slowness with text lookups.

Implementation #1 below is very fast if I'm looking up about 5000 items based on a char() column. Unfortunately, it takes about the same length of time to look up 50 items as it does to look up 5000.

Implementation #2 on the other hand is very fast for 50 items, but does not scale at all for 5000 items.

Implementation #3 is obviously going to be worse than #1, but I was just providing examples of what I've tried.

My question is: Assuming my table only has two columns, q = bigint, r = char() field, can you think of any other way to write this query using basic SQL so that I can have more alternatives to choose from? I'm hoping there's an option that performs reasonably for 50 as well as 5000 (one that scales like you would expect in a database).

Note that I'll accept any answer with alternative queries that perform the same function; the more the better. I don't expect any will scale in the way I'm hoping, but I won't know till I try more.


Implementation #1:

select q
from test_table
where r in ('a', 'b', 'c', 'd', 'e')

Implementation #2:

select q
from test_table
where r = 'a' or r = 'b' or r = 'c' or r = 'd' or r = 'e'

Implementation #3:

select q
from test_table a
where exists (
    select 1
    from test_table b
    where b.r in ('a', 'b', 'c', 'd', 'e')
    and a.r = b.r)

Solution 2

I can only suggest a UNION/UNION ALL rewrite of Implementation #2, as UNION may be faster than OR. Implementation #1 is the one I like most; it should use indexes and be fast enough. Oracle, for example, will (starting with 10g) automatically rewrite an IN subquery to use EXISTS.

select q
from test_table
where r = 'a' 
UNION ALL
select q
from test_table
where r = 'b' 
....

The UNION operator returns distinct rows, while UNION ALL returns all rows including duplicates. UNION ALL is usually faster than UNION because it skips the duplicate-elimination step.
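Another variant worth benchmarking (a sketch only; I'm assuming the dialect accepts a derived table built from a chain of single-row SELECTs, which is close to portable SQL) is to materialise the lookup list as an inline table and join against it, giving the optimiser a chance to plan a hash join instead of evaluating a long predicate:

```sql
select t.q
from test_table t
join (
    select 'a' as r union all
    select 'b' union all
    select 'c' union all
    select 'd' union all
    select 'e'
) v on t.r = v.r
```

For thousands of values, loading the list into a real temporary table and joining against that tends to behave the same way, and is easier to generate programmatically.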

Other tips

Vectorwise does not "scale like you would expect" because it has no B-tree indexes. Instead, like most columnar analytic DBMSs, it keeps per-block metadata recording each block's high and low values and uses that to decide which blocks to read. This gives excellent performance when pulling many rows out of a very large table, but poor performance when pulling out just a few rows, because it still ploughs through many unneeded rows.

You can optimise by sorting your table on r, either manually or with the "create index" command (note that this will prevent bulk loading after the first load). Sorting makes the per-block metadata more precise, so fewer blocks need to be read.
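For example, keeping the table sorted on r might look like this (a sketch; ix_r is an invented name, and you should check the exact "create index" syntax in the Ingres/Vectorwise documentation):

```sql
create index ix_r on test_table (r)
```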

I note that there are only two columns, no joins, and no aggregate functions in the example. Is a columnar database really what you need, or would something like Lucene be more appropriate?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow