Question

I have rather large tables (> 1000 M rows) where I have to do quick lookups. They are typically indexed with composite indexes to enable index-only scans, like this: CREATE INDEX ON table (study, analysis, gene, sample) INCLUDE (value). A study can contain several analyses, each analysis contains many genes, and each gene has measurements for some samples.

One type of query gets the value for a given gene in all samples in an analysis. This returns ~500 rows. Another type of query returns the values for all genes and samples in an analysis. This returns millions of rows.

I wonder if it would make sense to also add a BRIN index on (study, analysis) to speed up the second kind of query. It would be redundant with the existing b-tree index. Is this bad practice that just adds overhead and complexity, or do you think the query planner will use it the way I want?

EDIT: I forgot to mention that the results will typically not be cached, so it would have to read from disk.


Solution

Your index on (study, analysis, gene, sample) INCLUDE (value) would still be usable for a query like select gene, sample, value from t where study='PNAS908' and analysis = 'the correct one'. It will probably be better than BRIN, or at least not worse by enough to matter.
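One way to check this on your own data is to look at the plan the existing index produces. The following is only a sketch: the table name t and the index name are placeholders, the literal values are the ones from the example above, and INCLUDE needs PostgreSQL 11 or later.

```sql
-- Hypothetical names: table "t" and the index name are placeholders.
-- Skip the CREATE INDEX if the covering b-tree index already exists.
CREATE INDEX t_study_analysis_gene_sample_idx
    ON t (study, analysis, gene, sample) INCLUDE (value);

-- Actually runs the query and reports which scan was chosen,
-- plus how many buffers were hit in cache versus read from disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT gene, sample, value
FROM t
WHERE study = 'PNAS908'
  AND analysis = 'the correct one';
```

If the plan shows an Index Only Scan with a low "Heap Fetches" count, the b-tree is already serving the large query without touching the table. A high count usually means the visibility map is out of date, in which case a VACUUM on the table is likely to help more than an extra index.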

Note that if "analysis" is just labelled in a low-entropy way within a study (a, b, c, 1, 2, 3, log, linear, sqrt, intervention, SOC, placebo, ...), such that the same analysis identifier is reused from study to study, then the multi-column BRIN index is unlikely to do what you want. The columns in a BRIN index are not hierarchical like they are within a btree index; each column is summarized independently.
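Before building a BRIN index, it may be worth checking both conditions it depends on: whether the rows are physically ordered by these columns, and whether analysis labels are in fact reused across studies. A rough sketch, again assuming a table named t (a placeholder) that has been analyzed recently:

```sql
-- Physical ordering: BRIN only helps when correlation is close to 1 or -1.
-- These are planner statistics, so they are only as fresh as the last ANALYZE.
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 't'
  AND attname IN ('study', 'analysis');

-- Label reuse: analysis identifiers that appear in more than one study.
-- TABLESAMPLE keeps this cheap on a very large table by reading about 1%
-- of its pages, so the counts are approximate and rare labels can be missed.
SELECT analysis, count(DISTINCT study) AS studies_using_label
FROM t TABLESAMPLE SYSTEM (1)
GROUP BY analysis
HAVING count(DISTINCT study) > 1
ORDER BY studies_using_label DESC
LIMIT 20;
```

If many labels show up under several studies, the analysis column's per-block-range min/max will match in most ranges, and the BRIN index would effectively be filtering on study alone.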

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange