Counting distinct entries in a column using relational algebra
-
06-10-2020 - |
Вопрос
So I have a table similar to this one. Each user has posted a review about one or more hotels(A,B,C,D) but on different dates so there are no duplicate tuples even though a person might have reviewed the same hotel more than once.
I need to count the number of DISTINCT hotels every user has reviewed using RELATIONAL ALGEBRA only. How can I do that?
example to show notations I use:
R = ƔUser,COUNT(Hotel_reviewed)->Num_Reviews (InitialRelation- table 1)
would give the number of reviews by each user
The result should be the following table:
example to show notations I use:
R = ƔUser,COUNT(Hotel_reviewed)->Num_Reviews (InitialRelation- table 1)
would give the number of reviews by each user
Решение
Besides the more compact syntax (from @McNets' answer):
select User,
count(distinct Hotel_Reviewed) HotelsReviewed
from InitialRelation
group by User;
we can also do a projection first to find distinct User, Hotel_Reviewed pairs and then aggregate:
select User,
count(Hotel_Reviewed) as Hotels_Reviewed
from
( select distinct
User,
Hotel_Reviewed
from InitialRelation
) as D
group by User ;
This leads us to the relational algebra notation:
R = Ɣ User, COUNT(Hotel_Reviewed) -> Hotels_Reviewed
(π User, Hotel_Reviewed (InitialRelation)) -> D
Другие советы
You can get it by counting distinct hotels, grouped by user.
select User,
count(distinct Hotel_Reviewed) HotelsReviewed
from your_table
group by User;
create table reviews([user] varchar(20), date_review date, hotel_reviewed varchar(10) ); insert into reviews values ('Joe', '20170101', 'A'), ('Joe', '20170201', 'A'), ('Tom', '20170101', 'C'), ('Bryan', '20170101', 'B'), ('Bryan', '20170201', 'A'), ('Mike', '20170101', 'B'), ('Mike', '20170201', 'C'), ('Mike', '20170301', 'D'); GO
select [User], count(distinct Hotel_Reviewed) HotelsReviewed from reviews group by [User]; GO
User | HotelsReviewed :---- | -------------: Joe | 1 Tom | 1 Bryan | 2 Mike | 3
dbfiddle here