Are individual queries faster than joins?

https://dba.stackexchange.com/questions/42998

31-10-2019

Question

Conceptual question: Are individual queries faster than joins? Or: should I try to squeeze every piece of information I want on the client side into one SELECT statement, or just use as many statements as seems convenient?

TL;DR: If my joined query takes longer than running individual queries, is this my fault or is this to be expected?

First off, I am not very database savvy, so it may be just me, but I have noticed that when I have to get information from multiple tables, it is "often" faster to get this information via multiple queries on individual tables (maybe containing a simple inner join) and patch the data together on the client side than to try to write a (complex) joined query where I can get all the data in one query.

I have tried to put one extremely simple example together:

SQL Fiddle

Schema Setup:

CREATE TABLE MASTER 
( ID INT NOT NULL
, NAME VARCHAR2(42 CHAR) NOT NULL
, CONSTRAINT PK_MASTER PRIMARY KEY (ID)
);

CREATE TABLE DATA
( ID INT NOT NULL
, MASTER_ID INT NOT NULL
, VALUE NUMBER
, CONSTRAINT PK_DATA PRIMARY KEY (ID)
, CONSTRAINT FK_DATA_MASTER FOREIGN KEY (MASTER_ID) REFERENCES MASTER (ID)
);

INSERT INTO MASTER values (1, 'One');
INSERT INTO MASTER values (2, 'Two');
INSERT INTO MASTER values (3, 'Three');

CREATE SEQUENCE SEQ_DATA_ID;

INSERT INTO DATA values (SEQ_DATA_ID.NEXTVAL, 1, 1.3);
INSERT INTO DATA values (SEQ_DATA_ID.NEXTVAL, 1, 1.5);
INSERT INTO DATA values (SEQ_DATA_ID.NEXTVAL, 1, 1.7);
INSERT INTO DATA values (SEQ_DATA_ID.NEXTVAL, 2, 2.3);
INSERT INTO DATA values (SEQ_DATA_ID.NEXTVAL, 3, 3.14);
INSERT INTO DATA values (SEQ_DATA_ID.NEXTVAL, 3, 3.7);

Query A:

select NAME from MASTER
where ID = 1

Results:

| NAME |
--------
|  One |

Query B:

select ID, VALUE from DATA
where MASTER_ID = 1

Results:

| ID | VALUE |
--------------
|  1 |   1.3 |
|  2 |   1.5 |
|  3 |   1.7 |

Query C:

select M.NAME, D.ID, D.VALUE 
from MASTER M INNER JOIN DATA D ON M.ID=D.MASTER_ID
where M.ID = 1

Results:

| NAME | ID | VALUE |
---------------------
|  One |  1 |   1.3 |
|  One |  2 |   1.5 |
|  One |  3 |   1.7 |
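To make the "patch the data together on the client side" approach concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 module and an in-memory SQLite copy of the example instead of Oracle: VARCHAR2/NUMBER are mapped to TEXT/REAL, and explicit IDs replace SEQ_DATA_ID. It runs Query A and Query B separately, stitches their results into Query C's row shape, and compares that with the joined query. The helper names `stitched` and `joined` are hypothetical, and the ORDER BY clauses are added only so the comparison is deterministic.

```python
import sqlite3

# Assumption: in-memory SQLite stand-in for the Oracle schema above
# (VARCHAR2/NUMBER -> TEXT/REAL, explicit IDs instead of SEQ_DATA_ID).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MASTER (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL);
    CREATE TABLE DATA (
        ID INTEGER PRIMARY KEY,
        MASTER_ID INTEGER NOT NULL REFERENCES MASTER (ID),
        VALUE REAL
    );
    INSERT INTO MASTER VALUES (1, 'One'), (2, 'Two'), (3, 'Three');
    INSERT INTO DATA VALUES
        (1, 1, 1.3), (2, 1, 1.5), (3, 1, 1.7),
        (4, 2, 2.3), (5, 3, 3.14), (6, 3, 3.7);
""")

def stitched(master_id):
    """Hypothetical helper: Query A + Query B, combined on the client."""
    # Query A: the single master NAME.
    (name,) = conn.execute(
        "SELECT NAME FROM MASTER WHERE ID = ?", (master_id,)).fetchone()
    # Query B: the detail rows, without the repeated NAME.
    details = conn.execute(
        "SELECT ID, VALUE FROM DATA WHERE MASTER_ID = ? ORDER BY ID",
        (master_id,)).fetchall()
    # Client-side "join": prepend the NAME to every detail row.
    return [(name, d_id, value) for d_id, value in details]

def joined(master_id):
    """Hypothetical helper: Query C, with NAME repeated on every row."""
    return conn.execute(
        "SELECT M.NAME, D.ID, D.VALUE FROM MASTER M "
        "INNER JOIN DATA D ON M.ID = D.MASTER_ID "
        "WHERE M.ID = ? ORDER BY D.ID", (master_id,)).fetchall()

print(stitched(1))  # [('One', 1, 1.3), ('One', 2, 1.5), ('One', 3, 1.7)]
assert stitched(1) == joined(1)
```

Both paths yield identical rows; the difference is only where the combining happens (client loop versus the database's join).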

Of course, I didn't measure any performance with these, but one may observe:

Query A+B returns the same amount of usable information as Query C.
A+B has to return 1+2x3==7 "Data Cells" to the client
C has to return 3x3==9 "Data Cells" to the client, because with the join I naturally include some redundancy in the result set.
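The cell counts above can be checked mechanically. This is a minimal sketch, again assuming Python's sqlite3 module and an in-memory SQLite stand-in for the Oracle schema (types mapped, explicit IDs instead of the sequence). The hypothetical helper `cells` multiplies the number of returned rows by the number of columns reported in `cursor.description`. Like the question, it counts cells only, not bytes on the wire or elapsed time.

```python
import sqlite3

# Assumption: SQLite copy of the example data; only cell counts matter here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MASTER (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL);
    CREATE TABLE DATA (ID INTEGER PRIMARY KEY, MASTER_ID INTEGER NOT NULL, VALUE REAL);
    INSERT INTO MASTER VALUES (1, 'One'), (2, 'Two'), (3, 'Three');
    INSERT INTO DATA VALUES (1, 1, 1.3), (2, 1, 1.5), (3, 1, 1.7),
                            (4, 2, 2.3), (5, 3, 3.14), (6, 3, 3.7);
""")

def cells(sql, params=()):
    """Hypothetical helper: result cells (rows x columns) sent to the client."""
    cur = conn.execute(sql, params)
    rows = cur.fetchall()
    return len(rows) * len(cur.description)

QUERY_A = "SELECT NAME FROM MASTER WHERE ID = ?"
QUERY_B = "SELECT ID, VALUE FROM DATA WHERE MASTER_ID = ?"
QUERY_C = ("SELECT M.NAME, D.ID, D.VALUE FROM MASTER M "
           "INNER JOIN DATA D ON M.ID = D.MASTER_ID WHERE M.ID = ?")

separate = cells(QUERY_A, (1,)) + cells(QUERY_B, (1,))  # 1x1 + 3x2
joined = cells(QUERY_C, (1,))                           # 3x3
print(separate, joined)  # 7 9
```

For ID = 2, which has a single DATA row, both approaches come to 3 cells: the join only adds redundancy when a MASTER row has more than one matching DATA row.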

Generalizing from this (as far-fetched as it is):

A joined query always has to return more data than the individual queries that retrieve the same amount of information. Since the database has to cobble the data together, one can assume that for large datasets it has to do more work on a single joined query than on the individual ones, if only because it has to return more data to the client.
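The cell-count argument can be written out for a single MASTER row. As an illustration (these symbols are introduced here, not in the question), let $m$ be the number of MASTER columns selected, $k$ the number of DATA columns selected, and $n$ the number of DATA rows matching that master:

$$
\text{cells}_{A+B} = m + n k, \qquad \text{cells}_{C} = n (m + k)
$$

$$
\text{cells}_{C} - \text{cells}_{A+B} = n m + n k - m - n k = m (n - 1)
$$

With $m = 1$ (NAME), $k = 2$ (ID, VALUE) and $n = 3$ this gives 7 and 9 cells, an excess of 2, i.e. the two extra copies of "One". The excess grows with $n$ but is zero when $n = 1$, so in cell-count terms the joined query returns strictly more only when a master row has several detail rows.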

Would it follow from this that, when I observe that splitting a client-side query into multiple queries yields better performance, this is just the way to go? Or would it rather mean that I messed up the joined query?


Licensed under: CC-BY-SA with attribution
