Delete duplicate keys with relational algebra
23-02-2021
Question
Hi, I am new to databases and relational algebra. I was wondering whether there is a way, using relational algebra, to remove the tuples from a table that have the same key but different values.
For example, I want to keep only [1, 5] and [4, 9] and remove everything else:
Key | Value
-------|-------
1 | 5
2 | 6
2 | 7
2 | 8
4 | 9
Thanks.
Solution
Many variants and extensions of relational algebra are described in textbooks. I assume that you have a grouping (aggregation) operator γ, which is an extension of classical relational algebra and is written as:
$$ {}_{a_1, \ldots, a_n}\gamma_{f_1, \ldots, f_m}(R) $$
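where each $a_i$ is a grouping attribute and each $f_i$ is an aggregate function such as COUNT(*). The result contains one tuple per distinct combination of grouping values.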
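To make the grouping operator concrete, here is a minimal Python sketch. It assumes a relation is represented as a list of dicts (one per tuple) and each aggregate as a Python function over the rows of one group; the name `gamma` and this representation are hypothetical illustrations, not a standard API.

```python
from collections import defaultdict
from typing import Callable, Hashable, Sequence

# Hypothetical representation: one dict per tuple, attribute name -> value.
Row = dict[str, Hashable]


def gamma(relation: Sequence[Row],
          group_by: Sequence[str],
          aggregates: Sequence[Callable[[list[Row]], Hashable]]) -> set[tuple]:
    """Sketch of a_1..a_n γ f_1..f_m: one output tuple per group,
    made of the grouping values followed by each aggregate's result."""
    groups: dict[tuple, list[Row]] = defaultdict(list)
    for row in relation:
        # Partition the tuples by the values of the grouping attributes.
        groups[tuple(row[a] for a in group_by)].append(row)
    # Each aggregate function receives the list of rows of one group.
    return {key + tuple(f(rows) for f in aggregates)
            for key, rows in groups.items()}


R = [{"Key": 1, "Value": 5}, {"Key": 2, "Value": 6}, {"Key": 2, "Value": 7},
     {"Key": 2, "Value": 8}, {"Key": 4, "Value": 9}]

# Key γ COUNT(*), MAX(Value)
result = gamma(R, ["Key"], [len, lambda rows: max(r["Value"] for r in rows)])
print(sorted(result))  # [(1, 1, 5), (2, 3, 8), (4, 1, 9)]
```

The output is a set because relations in classical relational algebra are sets of tuples.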
Using this operator, your query could be answered by the following expression (assuming that the name of your relation is R):
$$ R \bowtie \pi_{\mathit{Key}}\bigl(\sigma_{\mathrm{COUNT}(*)=1}\bigl({}_{\mathit{Key}}\gamma_{\mathrm{COUNT}(*)}(R)\bigr)\bigr) $$
First we group R by Key and count the tuples in each group, keeping only the groups that contain exactly one tuple. We then project onto Key and take the natural join of that result with R itself, which retains only the tuples whose Key appears exactly once.
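The expression can be evaluated step by step in a small Python sketch, assuming R is held as a set of (Key, Value) tuples. Here `collections.Counter` stands in for Key γ COUNT(*), and because the right-hand side of the join has only the attribute Key, the natural join reduces to keeping the tuples of R whose key is in the projected set.

```python
from collections import Counter

# R as a set of (Key, Value) tuples, matching the question's table.
R = {(1, 5), (2, 6), (2, 7), (2, 8), (4, 9)}

# Key γ COUNT(*) (R): one (Key, count) tuple per distinct key.
grouped = set(Counter(key for key, _ in R).items())

# σ COUNT(*)=1: keep the groups that contain exactly one tuple.
selected = {(key, n) for key, n in grouped if n == 1}

# π Key: project onto the Key attribute only.
unique_keys = {key for key, _ in selected}

# R ⨝ π Key(...): natural join on the shared attribute Key.
result = {(key, value) for key, value in R if key in unique_keys}

print(sorted(result))  # [(1, 5), (4, 9)]
```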
Other advice
I would do something like the following (see the fiddle here):
Create and populate a table:
CREATE TABLE test
(
key INTEGER NOT NULL,
value INTEGER NOT NULL
);
INSERT INTO test VALUES
(1, 5), (2, 6), (2, 7), (2, 8), (4, 9);
and then:
SELECT * FROM test;
Result:
key value
1 5
2 6
2 7
2 8
4 9
and then run the following SQL:
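SELECT key, value
FROM test WHERE key IN
(
-- keys that occur exactly once
SELECT key
FROM test
GROUP BY key
HAVING COUNT(key) = 1
)
-- ordering belongs on the outer query; inside an IN subquery it has no effect
ORDER BY key;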
Result:
key value
1 5
4 9
This is the desired result. I used PostgreSQL, the open-source database that I would recommend for learning relational database principles: it has a very high level of standards compliance and is a great learning tool.
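To reproduce this without installing PostgreSQL, the same table and query can be run against an in-memory SQLite database through Python's standard `sqlite3` module. Using SQLite here is an assumption for convenience; the query itself is plain standard SQL.

```python
import sqlite3

# In-memory SQLite database, used instead of PostgreSQL so the sketch
# needs nothing beyond the Python standard library.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (key INTEGER NOT NULL, value INTEGER NOT NULL)")
conn.executemany("INSERT INTO test VALUES (?, ?)",
                 [(1, 5), (2, 6), (2, 7), (2, 8), (4, 9)])

rows = conn.execute("""
    SELECT key, value
    FROM test
    WHERE key IN (
        SELECT key
        FROM test
        GROUP BY key
        HAVING COUNT(key) = 1
    )
    ORDER BY key
""").fetchall()

print(rows)  # [(1, 5), (4, 9)]
conn.close()
```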
The inner query (the subquery)
SELECT key
FROM test
GROUP BY key
HAVING COUNT(key) = 1
ORDER BY key
produces the keys that have a COUNT of exactly 1 (in this case, the integers 1 and 4), and the outer query then selects the values corresponding to those two keys (see the fiddle).
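The IN subquery can also be written as a join against the grouped derived table, which mirrors the relational-algebra expression in the first answer more directly. Since the question asks to remove the other tuples, the HAVING condition can also be inverted in a DELETE. This is a sketch, again assuming SQLite through Python's `sqlite3` in place of PostgreSQL; the derived-table alias `u` is chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (key INTEGER NOT NULL, value INTEGER NOT NULL)")
conn.executemany("INSERT INTO test VALUES (?, ?)",
                 [(1, 5), (2, 6), (2, 7), (2, 8), (4, 9)])

# Join formulation of R ⨝ πKey(σCOUNT(*)=1(Key γ COUNT(*) (R))):
# the derived table u plays the role of πKey(σ...(γ...)).
joined = conn.execute("""
    SELECT t.key, t.value
    FROM test AS t
    JOIN (SELECT key
          FROM test
          GROUP BY key
          HAVING COUNT(*) = 1) AS u
      ON t.key = u.key
    ORDER BY t.key
""").fetchall()
print(joined)  # [(1, 5), (4, 9)]

# To actually delete the tuples whose key occurs more than once,
# invert the HAVING condition and use it in a DELETE.
conn.execute("""
    DELETE FROM test
    WHERE key IN (SELECT key FROM test GROUP BY key HAVING COUNT(*) > 1)
""")
conn.commit()

remaining = conn.execute("SELECT key, value FROM test ORDER BY key").fetchall()
print(remaining)  # [(1, 5), (4, 9)]
conn.close()
```

The SELECT forms leave the table unchanged. The DELETE form modifies it permanently, so it is worth running the SELECT first to check which rows will survive.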