Theoretical - Given 2 Tables, is it possible to calculate the size of the resulting dataset after a join?

dba.stackexchange https://dba.stackexchange.com/questions/255916

  •  20-02-2021

Question

This may not be the most appropriate exchange for this, so my apologies in advance.

I was posed this question during an interview prep session for an SQL developer role and was not given the solution to it. The question is as follows:

Given 2 tables (Table A and Table B) with the same primary key and sizes (rows) of N and M respectively, is it possible to calculate the size of the resulting table/dataset from using:

  1. Inner Join
  2. Left Join
  3. Full Join

I'm really stumped on this question and as stated I was not given a solution. Part of me believes that there might not even be a set solution. If anyone has any pointers or solutions, I'd love to hear them.

Solution

If "with the same primary key" means that both keys are auto-incremented, start at the same value (1), and have no gaps, and if the join condition is equality of these primary keys, then the answers are as follows (n rows in table A, m rows in table B):

  1. Inner Join: LEAST(m,n)

An inner join returns a row only for each key value that is present in both tables.

  2. Left Join: n

A left join returns all rows from table A regardless of the content of table B, so table B cannot influence the row count.

  3. Full Join: GREATEST(m,n)

A full outer join returns a row for each key value that is present in at least one of the tables.

(The original answer linked a fiddle demonstrating these results.)
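The three results can be checked with a small sketch in Python using the standard sqlite3 module. The table names a and b, the column id, and the function join_counts are made up for illustration. The full join is emulated as a left join plus the unmatched rows of b, because some SQLite versions do not support FULL JOIN.

```python
import sqlite3

def join_counts(n, m):
    """Return (inner, left, full) row counts for gapless keys 1..n and 1..m."""
    con = sqlite3.connect(":memory:")
    # Hypothetical tables: a has keys 1..n, b has keys 1..m.
    con.execute("CREATE TABLE a (id INTEGER PRIMARY KEY)")
    con.execute("CREATE TABLE b (id INTEGER PRIMARY KEY)")
    con.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(1, n + 1)])
    con.executemany("INSERT INTO b VALUES (?)", [(i,) for i in range(1, m + 1)])

    inner = con.execute(
        "SELECT COUNT(*) FROM a JOIN b ON a.id = b.id"
    ).fetchone()[0]
    left = con.execute(
        "SELECT COUNT(*) FROM a LEFT JOIN b ON a.id = b.id"
    ).fetchone()[0]
    # Emulated full join: all left-join rows plus rows of b with no partner in a.
    full = con.execute(
        "SELECT (SELECT COUNT(*) FROM a LEFT JOIN b ON a.id = b.id)"
        " + (SELECT COUNT(*) FROM b"
        "    WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.id = b.id))"
    ).fetchone()[0]
    con.close()
    return inner, left, full
```

For example, join_counts(3, 5) should give LEAST, n and GREATEST for n = 3 and m = 5.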

Other tips

In general, the cardinality of a join between A and B:

|A JOIN B| <= |A| * |B|

Assume for example:

 A = {(k,a): (1,1),(2,1)}, B = {(k,b): (10,1),(11,1),(12,1)}

and the JOIN is done on A.a = B.b, then every row of A matches every row of B, so the cardinality of the JOIN is 2 x 3 = 6.
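A minimal sketch of this example, assuming Python's sqlite3 module; the table names table_a and table_b and the function example_cardinalities are hypothetical. It counts the rows of the join on the non-key columns and, for contrast, of a join on the key column k.

```python
import sqlite3

def example_cardinalities():
    """Return (non-key join count, key join count) for the example data."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table_a (k INTEGER PRIMARY KEY, a INTEGER)")
    con.execute("CREATE TABLE table_b (k INTEGER PRIMARY KEY, b INTEGER)")
    con.executemany("INSERT INTO table_a VALUES (?, ?)", [(1, 1), (2, 1)])
    con.executemany(
        "INSERT INTO table_b VALUES (?, ?)", [(10, 1), (11, 1), (12, 1)]
    )
    # Join on the non-key columns: every a equals every b, so all pairs match.
    non_key = con.execute(
        "SELECT COUNT(*) FROM table_a JOIN table_b ON table_a.a = table_b.b"
    ).fetchone()[0]
    # Join on the keys: the key sets {1, 2} and {10, 11, 12} do not overlap.
    key = con.execute(
        "SELECT COUNT(*) FROM table_a JOIN table_b ON table_a.k = table_b.k"
    ).fetchone()[0]
    con.close()
    return non_key, key
```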

If we assume that we are using the key to JOIN the tables, we know that each row in A will match at most one row in B (and each row in B at most one row in A), so the cardinality is:

|A JOIN B| <= LEAST(|A|, |B|)

If we use the keys as the JOIN predicate in a LEFT JOIN, the cardinality is:

|A LEFT JOIN B| = |A|

Every row of A will be in the result set, and there is at most 1 row in B that matches it.
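For key joins, the inner and left join sizes can be modelled with plain sets of key values. This is only a sketch of the reasoning, not a database query; the function key_join_sizes and the sample keys are made up for illustration.

```python
def key_join_sizes(keys_a, keys_b):
    """Model key-to-key joins on sets of unique key values.

    Returns (inner, left): the inner join has one row per shared key,
    the left join has exactly one row per key of A.
    """
    a, b = set(keys_a), set(keys_b)
    inner = len(a & b)   # never exceeds min(len(a), len(b))
    left = len(a)        # independent of b, since each key matches at most once
    return inner, left

# Keys with gaps: only 2 and 3 are shared.
inner, left = key_join_sizes([1, 2, 3, 7], [2, 3, 4])
assert inner <= min(4, 3) and left == 4
```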

A FULL JOIN is a UNION between a LEFT and a RIGHT JOIN. If no rows match and we are using keys to JOIN:

|A FULL JOIN B| = |A LEFT JOIN B| + |B LEFT JOIN A| = |A| + |B|

If, on the other hand, there is a maximum match (every row of the smaller table finds a partner):

|A FULL JOIN B| = GREATEST(|A|,|B|)
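Both extremes fit one formula for key joins: each matched pair appears once in the full join rather than twice, so the size is |A| + |B| minus the number of matches. A small sketch on sets of key values follows; the function full_join_size is hypothetical and models the join rather than running it in a database.

```python
def full_join_size(keys_a, keys_b):
    """Size of a key-to-key full join: |A| + |B| - (number of matching keys)."""
    a, b = set(keys_a), set(keys_b)
    return len(a) + len(b) - len(a & b)

# No rows match: the result has |A| + |B| rows.
assert full_join_size([1, 2], [10, 11, 12]) == 2 + 3
# Maximum match: the result has GREATEST(|A|, |B|) rows.
assert full_join_size([1, 2], [1, 2, 3]) == max(2, 3)
```

Any partial overlap lands between these two bounds.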
License: CC-BY-SA with attribution