Question

I just finished reading about operators in relational algebra and tried to solve this problem. But I dont even understand what the first statement is doing. From what I know, $π$ (project) is a unary opeartor that takes a single relation and selects some attributes from all of the attributes (columns) of that relation. Then what does $\pi_{R-S,S}(r)$ mean? Also, why have they used $r$ as the argument to project operator? Shouldnt it be $R$ as that is the name of the relation given in the problem? Let R and S be relational schemes such that $R={a,b,c}$ and $S={c}$. Now consider the following queries on the database:

  1. $\pi_{R-S}(r) - \pi_{R-S} \left (\pi_{R-S} (r) \times s - \pi_{R-S,S}(r)\right )$
  2. $\left\{t \mid t \in \pi_{R-S} (r) \wedge \forall u \in s \left(\exists v \in r \left(u = v[S] \wedge t = v\left[R-S\right]\right )\right )\right\}$
  3. $\left\{t \mid t \in \pi_{R-S} (r) \wedge \forall v \in r \left(\exists u \in s \left(u = v[S] \wedge t = v\left[R-S\right]\right )\right ) \right\}$
  4. Select R.a,R.b From R,S Where R.c = S.c

Which of the above queries are equivalent?

1 and 2
1 and 3
2 and 4
3 and 4

Was it helpful?

Solution

Recall the definition of schema: The name of a relation and the set of attributes for a relation is called schema. An example of schema of a Movies relation is:

Movies(title, year, length, genre)

In the question, $r$, $s$ represent relation names; $R$, $S$ represent schema. Thus, we have $R = r(a,b,c)$ and $S = s(c)$. A schema consists of a set of attributes. Notation $R - S$ means the set difference between $R$'s set of attributes and $S$'s set of attributes. Thus, we have $R - S = \{a,b\}$. Now we have $\Pi_{R-S,S}(r) = \Pi_{a,b,c}(r)$.

Back to the question we have, let's work with the following example: suppose $r$ has two tuples: $(1,3,5)$ and $(2,4,8)$ with first value corresponds to $a$, second value corresponds to $b$, and the third value corresponds to $c$. $s$ has two tuples: $(5)$, $(5)$. Note we use bag semantics, which aligns with SQL semantics.

Let's consider option 1 first.

$$ \begin{align*} \Pi_{R-S}(r) - \Pi_{R-S}(\Pi_{R-S}(r)\times s - \Pi_{R-S,S}(r)) &= \Pi_{a,b}(r) - \Pi_{a,b}(\Pi_{a,b}(r)\times s - \Pi_{a,b,c}(r)) \\ &= \Pi_{a,b}(r) - \Pi_{a,b}(\Pi_{a,b}(r)\times s - r) \end{align*} $$

Evaluate $\Pi_{a,b}(r)\times s$ will lead to tuples: $(1,3,5)$, $(1,3,5)$, $(2,4,5)$, $(2,4,5)$. Then, $\Pi_{a,b}(r)\times s - r$ means take out all the tuples that show up in $r$, which leads to $\{(2,4,5),(2,4,5)\}$. Then $\Pi_{a,b}(\Pi_{a,b}(r)\times s - r)$ leads to $\{(2,4),(2,4)\}$. $\Pi_{a,b}(r)$ is $\{(1,3),(2,4)\}$. The set difference between $\{(1,3),(2,4)\}$ and $\{(2,4),(2,4)\}$ is $\{(1,3)\}$, which is the result after evaluating option 1.

Now, let's consider option 2.

$$ \{t | t \in \Pi_{R-S}(r) \land \forall u \in s (\exists v \in r (u = v[S] \land t = v[R-S]))\} $$

we have the following notation

  • $v$ is a tuple in $r$
  • $t \in \Pi_{a,b}(r)$ means that $t$ can be $(1,3)$ or $(2,4)$
  • $u$ is a tuple in $s$

Let's focus on $\forall u \in s (\exists v \in r (u = v[S] \land t = v[R-S]))$ part. Because it is for every $u$, we have

  • when $u = (5)$, there is a $v$: $(1,3,5)$ that satisfies the requirement: $v[S] = v[c] = (5) = u$ and $v[R-S] = v[\{a,b\}] = (1,3) = t$ where $t = (1,3) \in \Pi_{R-S}(r)$. Thus $(1,3)$ belongs to the result set of option 2.
  • now we look at the second $u = (5)$ in $u$ and by the same analysis above, we see $(1,3)$ belongs to the result set of option 2.

Thus, the result set of option 2 is $\{(1,3),(1,3)\}$.

Let's consider option 3.

$$ \{t | t \in \Pi_{R-S}(r) \land \forall v \in s (\exists u \in r (u = v[S] \land t = v[R-S]))\} $$

Option 3 has the same set of notation as option 2. Let's focus on $\forall v \in s (\exists u \in r (u = v[S] \land t = v[R-S]))$:

  • for $v = (1,3,5)$, there is a $u = (5)$ such that the requirement is satisfied: $v[S] = v[c] = (5) = u$ and $v[R-S] = (1,3) = t$. Thus, $(1,3)$ belongs to the final result set of option 3
  • for $v = (2,4,8)$, there is no such $u$ satisfies the constraint.

Thus, the final result set of option 3 is $\{(1,3)\}$.

Let's consider option 4. The SQL is evaluated to $\{(1,3),(1,3)\}$.

From the above example, we can see that

  • option 1 and option 2 are not equivalent
  • option 2 and option 3 are not equivalent

Now, let's consider another example where $r$ has tuples $\{(1,3,5),(2,4,6)\}$ with the first value corresponds to $a$, the second value corresponds to $b$, and the third value corresponds to $c$. $s$ has tuples $\{(5),(6),(7)$}.

By repeating the same evaluation like we did with the first example, we can see option 1 is evaluated to $\emptyset$ whereas option 3 is evaluated to $\{(1,3),(2,4)\}$. Thus, option 1 and option 3 are not equivalent.

Thus, we show that option 2 and option 4 are equivalent.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top