Delete duplicate keys with relational algebra

https://dba.stackexchange.com/questions/258921

23-02-2021

Question

Hi, I am new to databases and relational algebra. Is there a way, using relational algebra, to remove from a table every tuple whose key also appears in another tuple with a different value?

For example, in the table below I want to keep only [1, 5] and [4, 9] and remove everything else.

Key    | Value
-------|-------
 1     | 5
 2     | 6
 2     | 7
 2     | 8
 4     | 9

Thanks.

Solution

Textbooks describe many variants and extensions of relational algebra. I assume that you have a group-by operator γ, which is an extension of classical relational algebra, written as:

_{a_1, ..., a_n} γ _{f_1, ..., f_m} (R)

where each a_i is a grouping attribute and each f_j is an aggregate function.

Using this operator, your query could be answered by the following expression (assuming that the name of your relation is R):

R ⨝ π_{Key}(σ_{COUNT(*) = 1}(_{Key} γ _{COUNT(*)} (R)))

First we group by Key and keep only the groups that contain exactly one tuple, then we project onto Key to obtain the keys that occur once. Finally, a natural join of that result with R itself keeps only the tuples of R whose Key occurs exactly once.
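
The three steps of that expression can be traced with a minimal Python sketch. It assumes R is modelled as a set of (Key, Value) tuples; the function names group_count, unique_keys and keep_unique are hypothetical and chosen only for this illustration.

```python
from collections import Counter

# Relation R from the question, modelled as a set of (Key, Value) tuples.
R = {(1, 5), (2, 6), (2, 7), (2, 8), (4, 9)}

def group_count(relation):
    """Group by Key with COUNT(*): map each Key to its number of tuples."""
    return Counter(key for key, _ in relation)

def unique_keys(relation):
    """Select the groups with COUNT(*) = 1, then project onto Key."""
    return {key for key, count in group_count(relation).items() if count == 1}

def keep_unique(relation):
    """Natural join with R on Key: keep tuples whose Key occurs exactly once."""
    keys = unique_keys(relation)
    return {(key, value) for key, value in relation if key in keys}

print(sorted(keep_unique(R)))  # [(1, 5), (4, 9)]
```

Because the joined relation has Key as its only attribute, the natural join acts here as a filter on R (a semijoin), which is why a membership test is enough in the sketch.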

Other tips

I would do something like the following (see the fiddle here):

Create and populate a table:

CREATE TABLE test
(
  key INTEGER NOT NULL,
  value INTEGER NOT NULL
);

INSERT INTO test VALUES
(1, 5), (2, 6), (2, 7), (2, 8), (4, 9);

and then:

SELECT * FROM test;

Result:

key     value
  1         5
  2         6
  2         7
  2         8
  4         9

and then run the following SQL:

SELECT key, value
FROM test
WHERE key IN
(
  SELECT key
  FROM test
  GROUP BY key
  HAVING COUNT(key) = 1
)
ORDER BY key;

Result:

key     value
  1         5
  4         9

This is the desired result. I used PostgreSQL, an open-source database that I would recommend for learning relational database principles: it has a very high level of standards compliance and is a great learning tool.

The inner query, run on its own,

SELECT key
FROM test
GROUP BY key
HAVING COUNT(key) = 1
ORDER BY key

produces the keys that occur exactly once (in this case, 1 and 4). The outer query then selects the rows of test whose key is one of those two values (see the fiddle).
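
To try the two queries without a PostgreSQL installation, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is swapped in for PostgreSQL as an assumption of this example; the variable names inner_sql, outer_sql, single_keys and rows are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (key INTEGER NOT NULL, value INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, 5), (2, 6), (2, 7), (2, 8), (4, 9)],
)

# Inner query on its own: the keys that occur exactly once.
inner_sql = "SELECT key FROM test GROUP BY key HAVING COUNT(key) = 1"
single_keys = [row[0] for row in conn.execute(inner_sql + " ORDER BY key")]

# Outer query: the rows whose key is returned by the inner query.
outer_sql = f"SELECT key, value FROM test WHERE key IN ({inner_sql}) ORDER BY key"
rows = conn.execute(outer_sql).fetchall()

print(single_keys)  # [1, 4]
print(rows)         # [(1, 5), (4, 9)]
```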

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange