Question

One possible way to optimize search queries is to store the records of different relations/tables in the same file, and therefore on the same pages. That way, a join between them can be performed much faster.

I've googled "co-clustering" and surprisingly few results showed up. I found nothing on MySQL for example. There is some indication that Oracle offered it 10 years ago. Is co-clustering still a valid option for optimization?

For example, you have two relations/tables:

  • Employee (id, name, age, did)
  • Department (did, location)

A typical query you would optimize for might look something like this:

SELECT E.name, 
       E.age 
  FROM Employee E, 
       Department D 
 WHERE E.age = 25 
   AND E.did = D.did;

If you had 1,000,000 employees, all aged between 25 and 27, the best join method would probably be a sort-merge join or a hash join, both of which require multiple scans.

Now, if you store tuples/rows of multiple relations/tables on the same page, you can use a physical structure that stores a department with a certain did together with the employees that have the same did. Notice that such a join requires far fewer I/Os.


Solution

Is co-clustering still a valid option for optimization?

Sure, it is a valid option for optimization, if your DBMS offers it. As David Browne has mentioned in a comment, only Oracle does (which, in a way, tells you how practical this feature is).
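For reference, Oracle's version of this feature is the table cluster. Below is a minimal sketch for the two tables from the question. The column types, the SIZE value and the names emp_dept_cluster and emp_dept_cluster_idx are assumptions made for illustration, not something taken from the question:

```sql
-- Oracle syntax. Names, types and SIZE are illustrative assumptions.
-- The cluster key (did) decides which rows share a data block.
CREATE CLUSTER emp_dept_cluster (did NUMBER)
  SIZE 1024;  -- assumed space reserved per cluster key value

-- An indexed cluster needs a cluster index before rows can be inserted.
CREATE INDEX emp_dept_cluster_idx ON CLUSTER emp_dept_cluster;

-- Both tables are stored in the cluster, keyed on did.
CREATE TABLE Department (
  did      NUMBER PRIMARY KEY,
  location VARCHAR2(100)
) CLUSTER emp_dept_cluster (did);

CREATE TABLE Employee (
  id   NUMBER PRIMARY KEY,
  name VARCHAR2(100),
  age  NUMBER,
  did  NUMBER REFERENCES Department (did)
) CLUSTER emp_dept_cluster (did);
```

With this layout a department row and its employee rows share blocks, which helps the join on did but makes a full scan of either table alone touch more blocks.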

As you have noted, it is useful in a very limited set of scenarios, while being detrimental to a wider range of queries. In the cases that might benefit from table co-clustering you can employ alternative optimization techniques, such as materialized (indexed) views or column-organized tables, which offer similar performance benefits while being more widely available.
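As a rough sketch of the materialized-view alternative, the join from the question could be precomputed once and queried directly. The view name emp_dept_mv is hypothetical, and refresh options (omitted here) differ between DBMSs, so treat this as an outline rather than a ready-made definition:

```sql
-- Basic form accepted by Oracle and PostgreSQL; emp_dept_mv is a hypothetical name.
CREATE MATERIALIZED VIEW emp_dept_mv AS
SELECT E.id,
       E.name,
       E.age,
       D.did,
       D.location
  FROM Employee E
  JOIN Department D
    ON E.did = D.did;

-- The original query then reads the stored result instead of joining at run time.
SELECT name,
       age
  FROM emp_dept_mv
 WHERE age = 25;
```

The stored result has to be refreshed when the base tables change, which is part of the maintenance cost to weigh.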

Consider also that today's widespread use of SSD storage and the abundance of cheap RAM on database servers, combined with better query optimizers, decrease the value of a marginal reduction in physical I/O, especially when it comes at the cost of possible negative side effects and additional database maintenance overhead.

TLDR: don't bother.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange