Question

I am studying for exams and cannot find a solid criterion for deciding whether the Cartesian product (×) or the natural join (⋈) should be used.

I had come up with a rough guide:

"If you need to project an attribute that has the same name as an attribute in the table to be joined, you must use × and qualify the attribute with its table name, e.g. tableA.colname1 = tableB.colname1."

This, however, doesn't match some of the solutions in my notes, and my lecturer seems to use × (with the above convention) and ⋈ interchangeably.

Does anyone have a rule that can be followed to define use of one over the other?


Take for example this schema (only the part of the schema relevant to the question is quoted, for brevity):

takes(ID, course_id, sec_id, semester, year, grade)
student(ID, name, dept_name, tot_cred)

Q) Find the name of all students who took courses that were taught in either Spring 2011 or Autumn 2011.

My answer attempt:

π name(σ semester="Spring" ∧ year=2011(takes ⋈ student)) ∪ π name(σ semester="Autumn" ∧ year=2011(takes ⋈ student))

Actual answer:

π name(σ semester="Spring" ∧ year=2011 ∧ takes.ID=student.ID(takes × student)) ∪ π name(σ semester="Autumn" ∧ year=2011 ∧ takes.ID=student.ID(takes × student))

Can anyone provide a reason as to why?

In my mind, the natural join would already take care of takes.ID=student.ID. Is that not the case?


Solution

A natural join, as I understand it, is a projected, filtered Cartesian product:

  • You take the Cartesian product, then
  • select it, so that the values in columns of the same name have the same value, and
  • project it, so that all columns have distinct names.
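The three steps above can be sketched in Python. This is only an illustration, assuming relations are modelled as lists of dicts mapping attribute names to values; the `natural_join` helper and the sample rows are hypothetical and not taken from the question.

```python
from itertools import product

def natural_join(r, s):
    """Natural join of two relations given as lists of dicts (attribute -> value)."""
    if not r or not s:
        return []
    # Attribute names that appear in both relations.
    common = set(r[0]) & set(s[0])
    result = []
    # Step 1: Cartesian product, i.e. every pairing of a row of r with a row of s.
    for a, b in product(r, s):
        # Step 2: selection, keep pairs that agree on every shared attribute.
        if all(a[c] == b[c] for c in common):
            # Step 3: projection, merging the dicts keeps one copy of each shared column.
            result.append({**a, **b})
    return result

# Hypothetical sample data, trimmed to a few attributes of the schema.
takes = [
    {"ID": 1, "course_id": "CS-101", "semester": "Spring", "year": 2011},
    {"ID": 2, "course_id": "CS-315", "semester": "Autumn", "year": 2010},
]
student = [
    {"ID": 1, "name": "Ann"},
    {"ID": 2, "name": "Bob"},
    {"ID": 3, "name": "Cy"},
]

joined = natural_join(takes, student)
```

Of the six pairs in the product, only the two whose ID values match survive, and each result row carries a single ID column.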

Under this assumption, your answer is equivalent to the actual answer: both expressions produce the same result on every database instance.

To see this, you might want to expand the natural join to the above sequence of operators, and float them around using the laws of relational algebra. You'll see that the projection disappears due to the projection to name, and the selection criterion is fused with the selection above. You'll end up with exactly the same tree as the actual answer, even though you never changed the meaning of your own answer!
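The rewriting described above can be written out for the Spring half of the query (the Autumn half is identical). This is a sketch under the definition of natural join given in the list; the symbol A is my own shorthand for the attribute list of the join, with ID appearing once.

```latex
\begin{align*}
&\pi_{name}\bigl(\sigma_{semester=\text{Spring} \,\land\, year=2011}(takes \bowtie student)\bigr) \\
&\quad= \pi_{name}\bigl(\sigma_{semester=\text{Spring} \,\land\, year=2011}\bigl(\pi_{A}(\sigma_{takes.ID=student.ID}(takes \times student))\bigr)\bigr)
  && \text{expand } \bowtie \\
&\quad= \pi_{name}\bigl(\pi_{A}\bigl(\sigma_{semester=\text{Spring} \,\land\, year=2011}(\sigma_{takes.ID=student.ID}(takes \times student))\bigr)\bigr)
  && semester,\, year \in A \\
&\quad= \pi_{name}\bigl(\sigma_{semester=\text{Spring} \,\land\, year=2011 \,\land\, takes.ID=student.ID}(takes \times student)\bigr)
  && name \in A,\ \text{merge } \sigma
\end{align*}
```

The second step is valid because the selection only mentions attributes that the projection keeps; the third uses the facts that nested projections collapse to the outer one and that nested selections combine into a conjunction.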


I can think of one reason why your lecturer uses these concepts interchangeably: your lecturer wants you to understand that these concepts can be used interchangeably, because "the natural join is just a shortcut" (though that's debatable).

OTHER TIPS

The Cartesian product is just a special case of the natural join in which the joined relations have no attribute names in common. In Codd's original algebra, renaming was a separate operation altogether. To obtain a true Cartesian product of two relations that have some attributes in common, you would have to rename those attributes before doing the (natural) join.

In the interests of brevity the renamings are sometimes left out of written examples and the product symbol used instead. Unfortunately this obscures the important point, that there is only one kind of join.
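The role of renaming can be illustrated with a small Python sketch. It assumes relations are lists of dicts; `natural_join`, `rename`, the attribute name `student_ID`, and the sample rows are all hypothetical names chosen for the example.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    # With no shared attributes, all() over an empty set is True,
    # so every pair of rows is kept: the join degenerates to a product.
    return [{**a, **b} for a in r for b in s if all(a[c] == b[c] for c in common)]

def rename(r, old, new):
    """Return a copy of relation r with attribute `old` renamed to `new`."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in r]

takes = [{"ID": 1, "course_id": "CS-101"}, {"ID": 2, "course_id": "CS-315"}]
student = [{"ID": 1, "name": "Ann"}, {"ID": 2, "name": "Bob"}]

# Shared attribute ID: only matching rows are paired.
joined = natural_join(takes, student)

# After renaming, nothing is shared, so the same operator yields the full product.
product_rows = natural_join(takes, rename(student, "ID", "student_ID"))
```

Here `joined` has two rows, while `product_rows` has all four combinations and keeps both `ID` and `student_ID` as separate columns, which is what the takes.ID / student.ID notation in the actual answer expresses.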

I think there are two extreme situations:

  1. all attributes are common to both relations (they have the same schema): the natural join is equal to the intersection (I mean only the result).

  2. there are no common attributes: the natural join is equal to the Cartesian product.
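Both extremes can be checked on toy data. This sketch reads the first case as "the two relations have exactly the same attributes", which is an assumption about what is meant; the helper and the rows are hypothetical.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**a, **b} for a in r for b in s if all(a[c] == b[c] for c in common)]

a = [{"ID": 1, "name": "Ann"}, {"ID": 2, "name": "Bob"}]
b = [{"ID": 2, "name": "Bob"}, {"ID": 3, "name": "Cy"}]
c = [{"dept_name": "CS"}, {"dept_name": "Math"}]

# Case 1: identical schemas, so rows must agree on every attribute.
same_schema = natural_join(a, b)
intersection = [row for row in a if row in b]

# Case 2: no shared attributes, so every pair of rows is kept.
no_common = natural_join(a, c)
```

With these rows, `same_schema` and `intersection` both contain only Bob's row, and `no_common` contains all 2 × 2 = 4 combinations.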

Licensed under: CC-BY-SA with attribution