You need to select the maximum value within the data that is less than the actual maximum value. We can therefore predict that there will be several sub-queries involving MAX.
Also, since this is a modestly complicated query, we can apply TDQD (Test-Driven Query Design) to solve the problem in stages.
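Before tackling the real tables, the core idea (the largest value strictly below the maximum) can be sketched on a toy table. This is only an illustration: the Prices table is hypothetical, and running the SQL through Python's sqlite3 module against an in-memory SQLite database is an assumption made so the example is self-contained, not the setup used for the queries in this answer.

```python
import sqlite3

# Hypothetical one-column table, used only to illustrate the idea.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Prices (Price DECIMAL(5,2) NOT NULL)")
conn.executemany(
    "INSERT INTO Prices VALUES (?)",
    [(1.00,), (2.00,), (3.00,), (3.00,)],  # note the tie at the top
)

# The inner MAX finds the overall maximum; the outer MAX then finds the
# largest value that is strictly less than it.
second_max = conn.execute(
    """
    SELECT MAX(Price)
      FROM Prices
     WHERE Price < (SELECT MAX(Price) FROM Prices)
    """
).fetchone()[0]

print(second_max)
```

Because the comparison is strict, the tie for the top price does not matter: both 3.00 rows are skipped and the query returns the 2.00 price.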
TDQD — Test-Driven Query Design
Step 1: General look at the joined tables
A first step is to join all the data in the five tables, just to get a feel for the data and how the joins will work:
SELECT S.Set_ID,
       S.SetName,
       S.Node_ID,
       N.Node_Name,
       D.Dset_ID,
       A.Data_ID,
       B.Inst_ID,
       B.Price
  FROM Sets           AS S
  JOIN Node           AS N ON S.Node_ID = N.Node_ID
  JOIN Dset           AS D ON S.Set_ID = D.Set_ID
  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
 ORDER BY S.Set_ID, B.Price;
Output:
set_id setname node_id node_name dset_id data_id inst_id price
1 set1 1 US 101 m1 301 2.00
1 set1 1 US 101 m2 302 2.15
1 set1 1 US 102 m1 304 2.25
1 set1 1 US 103 m1 305 2.50
1 set1 1 US 104 m1 306 2.85
1 set1 1 US 104 m2 307 2.98 *
1 set1 1 US 101 m3 303 3.00
2 set2 2 Chicago 105 m1 308 1.00
2 set2 2 Chicago 105 m1 309 2.00 *
2 set2 2 Chicago 106 m2 310 3.00
We can see that we will want to select the data from the two rows marked with a * at the end.
Step 2: Find the maximum price for each Set ID
SELECT D.Set_ID, MAX(B.Price) AS Price
  FROM Dset           AS D
  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
 GROUP BY D.Set_ID
 ORDER BY D.Set_ID;
Output:
set_id price
1 3.00
2 3.00
Step 3: Find the second maximum price for each Set ID
We need to use the previous query as a sub-query and join its result with a very similar query, keeping only the prices that are strictly less than the maximum for the same Set_ID. That leads to:
SELECT D.Set_ID, MAX(B.Price) AS Price
  FROM Dset           AS D
  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
  JOIN (SELECT D.Set_ID, MAX(B.Price) AS Price
          FROM Dset           AS D
          JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
          JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
         GROUP BY D.Set_ID
       ) AS M ON D.Set_ID = M.Set_ID AND B.Price < M.Price
 GROUP BY D.Set_ID
 ORDER BY D.Set_ID;
Output:
set_id price
1 2.98
2 2.00
We've now got the Set_ID and the second maximum price; we just have to collect the other information. In fact, we're going to need to treat the previous query as (another) sub-query.
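As a cross-check of Step 3, here is a minimal sketch that loads just the three tables the query touches into an in-memory SQLite database and runs the same logic. Several things here are assumptions for illustration only: the use of SQLite and Python's sqlite3 module, the trimmed Dset table (the Dltd_Fl column is left out), and the common table expression SetPrice (a hypothetical name), which states the three-table join once instead of repeating it in the sub-query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed copy of the sample data: only the columns Step 3 needs.
conn.executescript("""
CREATE TABLE Dset (Dset_ID INTEGER PRIMARY KEY, Set_ID INTEGER NOT NULL);
CREATE TABLE Blz_Tbl (Inst_ID INTEGER PRIMARY KEY, Price DECIMAL(5,2) NOT NULL);
CREATE TABLE Dset_Data_Asgn (
    Dset_ID INTEGER NOT NULL,
    Inst_ID INTEGER NOT NULL,
    Data_ID CHAR(2) NOT NULL,
    PRIMARY KEY (Dset_ID, Inst_ID)
);
INSERT INTO Dset VALUES (101,1),(102,1),(103,1),(104,1),(105,2),(106,2);
INSERT INTO Blz_Tbl VALUES
    (301,2.00),(302,2.15),(303,3.00),(304,2.25),(305,2.50),
    (306,2.85),(307,2.98),(308,1.00),(309,2.00),(310,3.00);
INSERT INTO Dset_Data_Asgn VALUES
    (101,301,'m1'),(101,302,'m2'),(101,303,'m3'),(102,304,'m1'),(103,305,'m1'),
    (104,306,'m1'),(104,307,'m2'),(105,308,'m1'),(105,309,'m1'),(106,310,'m2');
""")

# SetPrice pairs every price with its Set_ID; the sub-query M holds the
# maximum per set, and the outer MAX picks the best price below it.
STEP3 = """
WITH SetPrice AS (
    SELECT D.Set_ID, B.Price
      FROM Dset           AS D
      JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
      JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
)
SELECT P.Set_ID, MAX(P.Price) AS Price
  FROM SetPrice AS P
  JOIN (SELECT Set_ID, MAX(Price) AS Price
          FROM SetPrice
         GROUP BY Set_ID
       ) AS M ON P.Set_ID = M.Set_ID AND P.Price < M.Price
 GROUP BY P.Set_ID
 ORDER BY P.Set_ID
"""

second_max = conn.execute(STEP3).fetchall()
for set_id, price in second_max:
    print(set_id, price)
```

If the sketch behaves as intended, it reports one row per set with the same prices as the Step 3 output above (2.98 for set 1 and 2.00 for set 2).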
Step 4: Collect the ID values and other data for the result
SELECT S.Set_ID,
       S.SetName,
       S.Node_ID,
       N.Node_Name,
       D.Dset_ID,
       A.Data_ID,
       B.Inst_ID,
       B.Price
  FROM Sets           AS S
  JOIN Node           AS N ON S.Node_ID = N.Node_ID
  JOIN Dset           AS D ON S.Set_ID = D.Set_ID
  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
  JOIN (SELECT D.Set_ID, MAX(B.Price) AS Price
          FROM Dset           AS D
          JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
          JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
          JOIN (SELECT D.Set_ID, MAX(B.Price) AS Price
                  FROM Dset           AS D
                  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
                  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
                 GROUP BY D.Set_ID
               ) AS M ON D.Set_ID = M.Set_ID AND B.Price < M.Price
         GROUP BY D.Set_ID
       ) AS X ON X.Set_ID = S.Set_ID AND X.Price = B.Price
 ORDER BY S.Set_ID;
Output:
set_id setname node_id node_name dset_id data_id inst_id price
1 set1 1 US 104 m2 307 2.98
2 set2 2 Chicago 105 m1 309 2.00
This data matches the desired output, but includes the various ID columns which aren't actually wanted. So the final step is to drop those columns from the select-list, rename Data_ID as MenuItem, and order by SetName instead of Set_ID, leading to the final query.
The Final Query
SELECT S.SetName,
       N.Node_Name,
       A.Data_ID AS MenuItem,
       B.Price
  FROM Sets           AS S
  JOIN Node           AS N ON S.Node_ID = N.Node_ID
  JOIN Dset           AS D ON S.Set_ID = D.Set_ID
  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
  JOIN (SELECT D.Set_ID, MAX(B.Price) AS Price
          FROM Dset           AS D
          JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
          JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
          JOIN (SELECT D.Set_ID, MAX(B.Price) AS Price
                  FROM Dset           AS D
                  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
                  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
                 GROUP BY D.Set_ID
               ) AS M ON D.Set_ID = M.Set_ID AND B.Price < M.Price
         GROUP BY D.Set_ID
       ) AS X ON X.Set_ID = S.Set_ID AND X.Price = B.Price
 ORDER BY S.SetName;
Final Output
setname node_name menuitem price
set1 US m2 2.98
set2 Chicago m1 2.00
Summary
The key point in the development was to build and test the queries step-by-step. The hardest step was step 3. It takes some practice to be able to devise a query like that, but after a few years (maybe twenty or so) it becomes second nature. But the step-wise refinement — or test-driven development — of complex queries is a necessity. You test each step as shown, to make sure the answer is what you expect. If it isn't correct, you modify the current query, or you go back and modify prior queries if you realize that they were not precisely what you needed after all.
I really did create the final query in those separate stages. I wouldn't consider doing it otherwise. You can search for TDQD (optionally in tag [sql]) and you'll see other examples of developing complex queries step-by-step.
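The test-each-step habit described in the summary can also be automated. The following is a rough sketch, not the way the queries in this answer were actually developed: it assumes SQLite via Python's sqlite3 module, uses a hypothetical helper called run_step, and moves the Data_ID column ahead of the PRIMARY KEY constraint because SQLite expects column definitions before table constraints. It loads the sample schema, then checks the Step 2 and final results against the outputs shown above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Node (Node_ID INTEGER NOT NULL PRIMARY KEY, Node_Name CHAR(7) NOT NULL);
CREATE TABLE Sets (Set_ID INTEGER NOT NULL PRIMARY KEY, SetName CHAR(4) NOT NULL UNIQUE,
                   Mkt_ID INTEGER NOT NULL,
                   Node_ID INTEGER NOT NULL REFERENCES Node(Node_ID));
CREATE TABLE Dset (Dset_ID INTEGER NOT NULL PRIMARY KEY,
                   Set_ID INTEGER NOT NULL REFERENCES Sets(Set_ID),
                   Dltd_Fl INTEGER NOT NULL);
CREATE TABLE Blz_Tbl (Inst_ID INTEGER NOT NULL PRIMARY KEY, Price DECIMAL(5,2) NOT NULL);
CREATE TABLE Dset_Data_Asgn (Dset_ID INTEGER NOT NULL REFERENCES Dset(Dset_ID),
                             Inst_ID INTEGER NOT NULL REFERENCES Blz_Tbl(Inst_ID),
                             Data_ID CHAR(2) NOT NULL,
                             PRIMARY KEY (Dset_ID, Inst_ID));
INSERT INTO Node VALUES (1,'US'),(2,'Chicago'),(3,'Florida');
INSERT INTO Sets VALUES (1,'set1',1,1),(2,'set2',1,2);
INSERT INTO Dset VALUES (101,1,0),(102,1,0),(103,1,0),(104,1,0),(105,2,0),(106,2,0);
INSERT INTO Blz_Tbl VALUES
    (301,2.00),(302,2.15),(303,3.00),(304,2.25),(305,2.50),
    (306,2.85),(307,2.98),(308,1.00),(309,2.00),(310,3.00);
INSERT INTO Dset_Data_Asgn VALUES
    (101,301,'m1'),(101,302,'m2'),(101,303,'m3'),(102,304,'m1'),(103,305,'m1'),
    (104,306,'m1'),(104,307,'m2'),(105,308,'m1'),(105,309,'m1'),(106,310,'m2');
"""

# The three-table join that every stage shares.
PRICES = """Dset AS D
          JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
          JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID"""

STEP2 = f"""SELECT D.Set_ID, MAX(B.Price) AS Price
          FROM {PRICES}
         GROUP BY D.Set_ID"""

STEP3 = f"""SELECT D.Set_ID, MAX(B.Price) AS Price
          FROM {PRICES}
          JOIN ({STEP2}) AS M ON D.Set_ID = M.Set_ID AND B.Price < M.Price
         GROUP BY D.Set_ID"""

FINAL = f"""SELECT S.SetName, N.Node_Name, A.Data_ID AS MenuItem, B.Price
  FROM Sets AS S
  JOIN Node AS N ON S.Node_ID = N.Node_ID
  JOIN Dset AS D ON S.Set_ID = D.Set_ID
  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
  JOIN Blz_Tbl AS B ON A.Inst_ID = B.Inst_ID
  JOIN ({STEP3}) AS X ON X.Set_ID = S.Set_ID AND X.Price = B.Price
 ORDER BY S.SetName"""


def run_step(conn, sql):
    """Run one stage and return its rows, with floats rounded to 2 places
    so that comparisons against literal prices are exact."""
    rows = conn.execute(sql).fetchall()
    return [tuple(round(v, 2) if isinstance(v, float) else v for v in row)
            for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

step2_rows = run_step(conn, STEP2 + " ORDER BY D.Set_ID")
final_rows = run_step(conn, FINAL)

# Each stage is checked against the output recorded when it was developed.
assert step2_rows == [(1, 3.00), (2, 3.00)]
assert final_rows == [("set1", "US", "m2", 2.98), ("set2", "Chicago", "m1", 2.00)]
print("all steps match")
```

Building each stage from the text of the previous one mirrors the way the final query nests Step 2 inside Step 3 inside the outer select, so a change to an early stage is automatically retested in the later ones.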
SQL to create and load the schema
CREATE TABLE Node
(
    Node_ID     INTEGER NOT NULL PRIMARY KEY,
    Node_Name   CHAR(7) NOT NULL
);
CREATE TABLE Sets
(
    Set_ID      INTEGER NOT NULL PRIMARY KEY,
    SetName     CHAR(4) NOT NULL UNIQUE,
    Mkt_ID      INTEGER NOT NULL,
    Node_ID     INTEGER NOT NULL REFERENCES Node(Node_ID)
);
CREATE TABLE Dset
(
    Dset_ID     INTEGER NOT NULL PRIMARY KEY,
    Set_ID      INTEGER NOT NULL REFERENCES Sets(Set_ID),
    Dltd_Fl     INTEGER NOT NULL
);
CREATE TABLE Blz_Tbl
(
    Inst_ID     INTEGER NOT NULL PRIMARY KEY,
    Price       DECIMAL(5,2) NOT NULL
);
CREATE TABLE Dset_Data_Asgn
(
    Dset_ID     INTEGER NOT NULL REFERENCES Dset(Dset_ID),
    Inst_ID     INTEGER NOT NULL REFERENCES Blz_Tbl(Inst_ID),
    PRIMARY KEY(Dset_ID, Inst_ID),
    Data_ID     CHAR(2) NOT NULL
);
INSERT INTO Node VALUES(1, 'US');
INSERT INTO Node VALUES(2, 'Chicago');
INSERT INTO Node VALUES(3, 'Florida');
INSERT INTO Sets VALUES(1, 'set1', 1, 1);
INSERT INTO Sets VALUES(2, 'set2', 1, 2);
INSERT INTO Dset VALUES(101, 1, 0);
INSERT INTO Dset VALUES(102, 1, 0);
INSERT INTO Dset VALUES(103, 1, 0);
INSERT INTO Dset VALUES(104, 1, 0);
INSERT INTO Dset VALUES(105, 2, 0);
INSERT INTO Dset VALUES(106, 2, 0);
INSERT INTO Blz_Tbl VALUES(301, 2.00);
INSERT INTO Blz_Tbl VALUES(302, 2.15);
INSERT INTO Blz_Tbl VALUES(303, 3.00);
INSERT INTO Blz_Tbl VALUES(304, 2.25);
INSERT INTO Blz_Tbl VALUES(305, 2.50);
INSERT INTO Blz_Tbl VALUES(306, 2.85);
INSERT INTO Blz_Tbl VALUES(307, 2.98);
INSERT INTO Blz_Tbl VALUES(308, 1.00);
INSERT INTO Blz_Tbl VALUES(309, 2.00);
INSERT INTO Blz_Tbl VALUES(310, 3.00);
INSERT INTO Dset_Data_Asgn VALUES(101, 301, 'm1');
INSERT INTO Dset_Data_Asgn VALUES(101, 302, 'm2');
INSERT INTO Dset_Data_Asgn VALUES(101, 303, 'm3');
INSERT INTO Dset_Data_Asgn VALUES(102, 304, 'm1');
INSERT INTO Dset_Data_Asgn VALUES(103, 305, 'm1');
INSERT INTO Dset_Data_Asgn VALUES(104, 306, 'm1');
INSERT INTO Dset_Data_Asgn VALUES(104, 307, 'm2');
INSERT INTO Dset_Data_Asgn VALUES(105, 308, 'm1');
INSERT INTO Dset_Data_Asgn VALUES(105, 309, 'm1');
INSERT INTO Dset_Data_Asgn VALUES(106, 310, 'm2');
It would have been nice if the schema and data had been provided by the person asking the question instead of having to be written by the person answering it!