Question

I've attached the relationship of the tables for this question. I need to find for every Set ID the second highest priced item. It seems to be difficult; can someone help me?

  1. set_id is the primary key of Sets table.
  2. dset_id is the primary key of Dset.
  3. eff_dt_id is the primary key of Eff_dt table.
  4. set_id is a foreign key in the Dset table, referencing set_id of Sets table.
  5. dset_id is a foreign key in the Eff_dt and Dset_data_asgn tables, referencing the Dset table.
  6. inst_id is the primary key of the Biz_tbl.
  7. inst_id is the foreign key in Dset_data_asgn table, referencing Biz_tbl.
  8. (dset_id, inst_id) is the composite primary key for Dset_data_asgn.

 

Table structure

Solution 2

You need to select the maximum value within the data that is less than the actual maximum value. We can therefore expect several sub-queries involving MAX.

Also, since this is a modestly complicated query, we can apply TDQD (Test-Driven Query Design) to solve the problem in stages.
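
Before joining the real tables, the core idea (the largest value strictly below the maximum) can be sketched on a single table. This is only an illustration: the table Item_Price and its rows are hypothetical and are not part of the question's schema.

```sql
-- Hypothetical single-table illustration; Item_Price is NOT part of the question's schema.
CREATE TABLE Item_Price
(
    Set_ID  INTEGER NOT NULL,
    Price   DECIMAL(5,2) NOT NULL
);

INSERT INTO Item_Price VALUES(1, 2.00);
INSERT INTO Item_Price VALUES(1, 2.98);
INSERT INTO Item_Price VALUES(1, 3.00);
INSERT INTO Item_Price VALUES(2, 1.00);
INSERT INTO Item_Price VALUES(2, 3.00);

-- For each set, take the largest price that is strictly below that set's maximum price.
SELECT P.Set_ID, MAX(P.Price) AS Second_Price
  FROM Item_Price AS P
 WHERE P.Price < (SELECT MAX(Q.Price)
                    FROM Item_Price AS Q
                   WHERE Q.Set_ID = P.Set_ID)
 GROUP BY P.Set_ID
 ORDER BY P.Set_ID;
```

With these five rows, set 1 yields 2.98 and set 2 yields 1.00. The steps that follow apply the same idea, but with a joined sub-query instead of a correlated one.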

TDQD — Test-Driven Query Design

Step 1: General look at the joined tables

A first step is to join all the data in the five tables, just to get a feel for the data and how the joins will work:

SELECT S.Set_ID,
       S.SetName,
       S.Node_ID,
       N.Node_Name,
       D.Dset_ID,
       A.Data_ID,
       B.Inst_ID,
       B.Price
  FROM Sets           AS S
  JOIN Node           AS N ON S.Node_ID = N.Node_ID
  JOIN Dset           AS D ON S.Set_ID  = D.Set_ID
  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
 ORDER BY S.Set_ID, B.Price;

Output:

 set_id setname  node_id node_name   dset_id data_id inst_id   price
      1 set1           1 US              101 m1          301    2.00
      1 set1           1 US              101 m2          302    2.15
      1 set1           1 US              102 m1          304    2.25
      1 set1           1 US              103 m1          305    2.50
      1 set1           1 US              104 m1          306    2.85
      1 set1           1 US              104 m2          307    2.98    *
      1 set1           1 US              101 m3          303    3.00
      2 set2           2 Chicago         105 m1          308    1.00
      2 set2           2 Chicago         105 m1          309    2.00    *
      2 set2           2 Chicago         106 m2          310    3.00

We can see that we will want to select the data from the two rows marked with a * at the end.

Step 2: Find the maximum price for each Set ID

SELECT D.Set_ID, MAX(B.Price) AS Price
  FROM Dset           AS D
  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
 GROUP BY D.Set_ID
 ORDER BY D.Set_ID;

Output:

     set_id   price
          1    3.00
          2    3.00

Step 3: Find the second maximum prices for each Set_ID

We need to use the previous query as a sub-query and join that result with a very similar query, leading to:

SELECT D.Set_ID, MAX(B.Price) AS Price
  FROM Dset           AS D
  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
  JOIN (SELECT D.Set_ID, MAX(B.Price) AS Price
          FROM Dset           AS D
          JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
          JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
         GROUP BY D.Set_ID
       ) AS M ON D.Set_ID = M.Set_ID AND B.Price < M.Price
 GROUP BY D.Set_ID
 ORDER BY D.Set_ID;

Output:

     set_id   price
          1    2.98
          2    2.00
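
One caveat with the inner join above: a set whose items all share a single price has no row with B.Price < M.Price, so that set disappears from the result. If such sets should still be listed, one possible variant (a sketch, not part of the original answer) computes the second price with a scalar sub-query, which yields NULL when there is no lower price. It assumes the schema and data from the "SQL to create and load the schema" section.

```sql
-- Sketch: keep every set, showing NULL when there is no price below the maximum.
-- Assumes the Dset, Dset_Data_Asgn and Blz_Tbl tables defined in this answer.
SELECT M.Set_ID,
       M.Price AS Max_Price,
       (SELECT MAX(B.Price)
          FROM Dset           AS D
          JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
          JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
         WHERE D.Set_ID = M.Set_ID
           AND B.Price  < M.Price
       ) AS Second_Price
  FROM (SELECT D.Set_ID, MAX(B.Price) AS Price
          FROM Dset           AS D
          JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
          JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
         GROUP BY D.Set_ID
       ) AS M
 ORDER BY M.Set_ID;
```

On the sample data this gives the same second prices as the query above (2.98 for set 1 and 2.00 for set 2); the difference only shows up for a set with a single distinct price.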

We've now got the Set_ID and the second maximum price; we just have to collect the other information. In fact, we're going to need to treat the previous query as (another) sub-query.

Step 4: Collect the ID values and other data for the result

SELECT S.Set_ID,
       S.SetName,
       S.Node_ID,
       N.Node_Name,
       D.Dset_ID,
       A.Data_ID,
       B.Inst_ID,
       B.Price
  FROM Sets           AS S
  JOIN Node           AS N ON S.Node_ID = N.Node_ID
  JOIN Dset           AS D ON S.Set_ID  = D.Set_ID
  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
  JOIN (SELECT D.Set_ID, MAX(B.Price) AS Price
          FROM Dset           AS D
          JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
          JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
          JOIN (SELECT D.Set_ID, MAX(B.Price) AS Price
                  FROM Dset           AS D
                  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
                  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
                 GROUP BY D.Set_ID
               ) AS M ON D.Set_ID = M.Set_ID AND B.Price < M.Price
         GROUP BY D.Set_ID
        ) AS X ON X.Set_ID = S.Set_ID AND X.Price = B.Price
 ORDER BY S.Set_ID;

Output:

     set_id setname  node_id node_name   dset_id data_id inst_id   price
          1 set1           1 US              104 m2          307    2.98
          2 set2           2 Chicago         105 m1          309    2.00

This data matches the desired output, but includes the various ID columns, which aren't actually wanted. So the final step is to drop those columns from the SELECT list.

The Final Query

SELECT S.SetName,
       N.Node_Name,
       A.Data_ID AS MenuItem,
       B.Price
  FROM Sets           AS S
  JOIN Node           AS N ON S.Node_ID = N.Node_ID
  JOIN Dset           AS D ON S.Set_ID  = D.Set_ID
  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
  JOIN (SELECT D.Set_ID, MAX(B.Price) AS Price
          FROM Dset           AS D
          JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
          JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
          JOIN (SELECT D.Set_ID, MAX(B.Price) AS Price
                  FROM Dset           AS D
                  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
                  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
                 GROUP BY D.Set_ID
               ) AS M ON D.Set_ID = M.Set_ID AND B.Price < M.Price
         GROUP BY D.Set_ID
        ) AS X ON X.Set_ID = S.Set_ID AND X.Price = B.Price
 ORDER BY S.SetName;

Final Output

setname node_name menuitem price
set1    US        m2        2.98
set2    Chicago   m1        2.00

Summary

The key point in the development was to build and test the queries step-by-step. The hardest step was step 3. It takes some practice to be able to devise a query like that, but after a few years (maybe twenty or so) it becomes second nature. But the step-wise refinement — or test-driven development — of complex queries is a necessity. You test each step as shown, to make sure the answer is what you expect. If it isn't correct, you modify the current query, or you go back and modify prior queries if you realize that they were not precisely what you needed after all.

I really did create the final query in those separate stages. I wouldn't consider doing it otherwise. You can search for TDQD (optionally in tag [sql]) and you'll see other examples of developing complex queries step-by-step.

SQL to create and load the schema

CREATE TABLE Node
(
    Node_ID     INTEGER NOT NULL PRIMARY KEY,
    Node_Name   CHAR(7) NOT NULL
);

CREATE TABLE Sets
(
    Set_ID  INTEGER NOT NULL PRIMARY KEY,
    SetName CHAR(4) NOT NULL UNIQUE,
    Mkt_ID  INTEGER NOT NULL,
    Node_ID INTEGER NOT NULL REFERENCES Node(Node_ID)
);

CREATE TABLE Dset
(
    Dset_ID INTEGER NOT NULL PRIMARY KEY,
    Set_ID  INTEGER NOT NULL REFERENCES Sets(Set_ID),
    Dltd_Fl INTEGER NOT NULL
);

CREATE TABLE Blz_Tbl
(
    Inst_ID INTEGER NOT NULL PRIMARY KEY,
    Price   DECIMAL(5,2) NOT NULL
);

CREATE TABLE Dset_Data_Asgn
(
    Dset_ID INTEGER NOT NULL REFERENCES Dset(Dset_ID),
    Inst_ID INTEGER NOT NULL REFERENCES Blz_Tbl(Inst_ID),
    PRIMARY KEY(Dset_ID, Inst_ID),
    Data_ID CHAR(2) NOT NULL
);

INSERT INTO Node VALUES(1, 'US');
INSERT INTO Node VALUES(2, 'Chicago');
INSERT INTO Node VALUES(3, 'Florida');

INSERT INTO Sets VALUES(1, 'set1', 1, 1);
INSERT INTO Sets VALUES(2, 'set2', 1, 2);

INSERT INTO Dset VALUES(101, 1, 0);
INSERT INTO Dset VALUES(102, 1, 0);
INSERT INTO Dset VALUES(103, 1, 0);
INSERT INTO Dset VALUES(104, 1, 0);
INSERT INTO Dset VALUES(105, 2, 0);
INSERT INTO Dset VALUES(106, 2, 0);

INSERT INTO Blz_Tbl VALUES(301, 2.00);
INSERT INTO Blz_Tbl VALUES(302, 2.15);
INSERT INTO Blz_Tbl VALUES(303, 3.00);
INSERT INTO Blz_Tbl VALUES(304, 2.25);
INSERT INTO Blz_Tbl VALUES(305, 2.50);
INSERT INTO Blz_Tbl VALUES(306, 2.85);
INSERT INTO Blz_Tbl VALUES(307, 2.98);
INSERT INTO Blz_Tbl VALUES(308, 1.00);
INSERT INTO Blz_Tbl VALUES(309, 2.00);
INSERT INTO Blz_Tbl VALUES(310, 3.00);

INSERT INTO Dset_Data_Asgn VALUES(101, 301, 'm1');
INSERT INTO Dset_Data_Asgn VALUES(101, 302, 'm2');
INSERT INTO Dset_Data_Asgn VALUES(101, 303, 'm3');
INSERT INTO Dset_Data_Asgn VALUES(102, 304, 'm1');
INSERT INTO Dset_Data_Asgn VALUES(103, 305, 'm1');
INSERT INTO Dset_Data_Asgn VALUES(104, 306, 'm1');
INSERT INTO Dset_Data_Asgn VALUES(104, 307, 'm2');
INSERT INTO Dset_Data_Asgn VALUES(105, 308, 'm1');
INSERT INTO Dset_Data_Asgn VALUES(105, 309, 'm1');
INSERT INTO Dset_Data_Asgn VALUES(106, 310, 'm2');

It would have been nice if the schema and data had been provided by the person asking the question instead of having to be written by the person answering it!

Other tips

If your version of DB2 supports windowing functions, you may be able to greatly simplify @Jonathan's answer:

SELECT Sets.setName, Node.node_name, Ordered.data_id as menuItem, Ordered.price
FROM (SELECT DSet.set_id, DSet_Data_Asgn.data_id, Blz_Tbl.price, 
             ROW_NUMBER() OVER(PARTITION BY DSet.set_id ORDER BY Blz_Tbl.price DESC) as rn
      FROM DSet
      JOIN DSet_Data_Asgn
        ON DSet_Data_Asgn.DSet_id = DSet.DSet_id
      JOIN Blz_Tbl
        ON Blz_Tbl.Inst_id = DSet_Data_Asgn.Inst_id) Ordered
JOIN Sets
  ON Sets.set_id = Ordered.set_id
JOIN Node
  ON Node.node_id = Sets.node_id
WHERE Ordered.rn = 2
ORDER BY Node.node_name DESC

(I have an SQL Fiddle example; it's using SQL Server, but the syntax is the same.)

And results are:

setname node_name menuitem price
set1    US        m2        2.98
set2    Chicago   m1        2
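
One difference from the MAX-based query is worth noting. ROW_NUMBER gives tied prices different row numbers, so if two items in a set tied for the highest price, rn = 2 would pick one of those tied items rather than the next lower price. If the goal is the second highest distinct price, DENSE_RANK may be a closer match. The following is a sketch under that assumption, using the same tables; the aliases Ranked and rnk are only illustrative names.

```sql
-- Sketch: DENSE_RANK gives tied prices the same rank, so rnk = 2 is the
-- second highest DISTINCT price in each set (all items at that price are returned).
SELECT Sets.SetName, Node.Node_Name, Ranked.Data_ID AS MenuItem, Ranked.Price
  FROM (SELECT DSet.Set_ID, DSet_Data_Asgn.Data_ID, Blz_Tbl.Price,
               DENSE_RANK() OVER(PARTITION BY DSet.Set_ID
                                 ORDER BY Blz_Tbl.Price DESC) AS rnk
          FROM DSet
          JOIN DSet_Data_Asgn ON DSet_Data_Asgn.DSet_ID = DSet.DSet_ID
          JOIN Blz_Tbl        ON Blz_Tbl.Inst_ID = DSet_Data_Asgn.Inst_ID
       ) AS Ranked
  JOIN Sets ON Sets.Set_ID  = Ranked.Set_ID
  JOIN Node ON Node.Node_ID = Sets.Node_ID
 WHERE Ranked.rnk = 2
 ORDER BY Sets.SetName;
```

On the sample data there are no ties at the top, so this returns the same two items (m2 at 2.98 for set1 and m1 at 2.00 for set2).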

(Thanks for the schema setup scripts Jonathan - that made my life much easier.)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow