Select newest entry from a joined MySQL table

https://stackoverflow.com/questions/1918827

20-09-2019
|

Question

I have stock quantity information in my database.
1 table, "stock", holds the productid (sku) along with the quantity and the filename from where it came.

The other table, "stockfile", contains all the processed filenames along with dates.

Now I need to get all the products with their latest stock quantity values.

This gives me ALL the products multiple times with all their stock quantity (resulting in 300.000 records)

SELECT stock.stockid, stock.sku, stock.quantity, stockfile.filename, stockfile.date
FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
ORDER BY stock.sku ASC

I already tried this:

SELECT * FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
GROUP BY sku
HAVING stockfile.date = MAX( stockfile.date )
ORDER BY stock.sku ASC

But it did not work

SHOW CREATE TABLE stock:

CREATE TABLE stock (
stockid bigint(20) NOT NULL AUTO_INCREMENT,
sku char(25) NOT NULL,
quantity int(5) NOT NULL,
creationdate datetime NOT NULL,
stockfileid smallint(5) unsigned NOT NULL,
touchdate datetime NOT NULL,
PRIMARY KEY (stockid)
) ENGINE=MyISAM AUTO_INCREMENT=315169 DEFAULT CHARSET=latin1

SHOW CREATE TABLE stockfile:

CREATE TABLE stockfile (
stockfileid smallint(5) unsigned NOT NULL AUTO_INCREMENT,
filename varchar(25) NOT NULL,
creationdate datetime DEFAULT NULL,
touchdate datetime DEFAULT NULL,
date datetime DEFAULT NULL,
begindate datetime DEFAULT NULL,
enddate datetime DEFAULT NULL,
PRIMARY KEY (stockfileid)
) ENGINE=MyISAM AUTO_INCREMENT=265 DEFAULT CHARSET=latin1

Solution

This is an example of the frequently-asked "greatest-n-per-group" question that we see every week on StackOverflow. Follow that tag to see other similar solutions.

SELECT s.*, f1.*
FROM stock s
INNER JOIN stockfile f1
  ON (s.stockfileid = f1.stockfileid)
LEFT OUTER JOIN stockfile f2
  ON (s.stockfileid = f2.stockfileid AND f1.date < f2.date)
WHERE f2.stockfileid IS NULL;

If there are multiple rows in stockfile that have the max date, you'll get them both in the result set. To resolve this, you'd have to add some tie-breaker conditions into the join on f2.

Thanks for adding the CREATE TABLE info. That's very helpful when you're asking SQL questions.

I see from the AUTO_INCREMENT table options that you have 315k rows in stock and only 265 rows in stockfile. Your stockfile table is the parent in the relationship, and the stock table is the child, with a column stockfileid that references the primary key of stockfile.

So your original question was misleading. You want the latest row from stock, not the latest row from stockfile.

SELECT f.*, s1.*
FROM stockfile f
INNER JOIN stock s1
  ON (f.stockfileid = s1.stockfileid)
LEFT OUTER JOIN stock s2
  ON (f.stockfileid = s2.stockfileid AND (s1.touchdate < s2.touchdate
      OR s1.touchdate = s2.touchdate AND s1.stockid < s2.stockid))
WHERE s2.stockid IS NULL;

I'm assuming you want "latest" to be relative to touchdate, so if you want to use creationdate instead, you can do the edit.

I've added a term to the join so that it resolves ties. I know you said the dates are "practically unique" but as the saying goes, "one in a million is next Tuesday."

Okay, I think I understand what you're trying to do now. You want the most recent row per sku, but the date by which to compare them is in the referenced table stockfile.

SELECT s1.*, f1.*
FROM stock s1
JOIN stockfile f1 ON (s1.stockfileid = f1.stockfileid)
LEFT OUTER JOIN (stock s2 JOIN stockfile f2 ON (s2.stockfileid = f2.stockfileid))
  ON (s1.sku = s2.sku AND (f1.date < f2.date OR f1.date = f2.date AND f1.stockfileid < f2.stockfileid))
WHERE s2.sku IS NULL;

This does a self-join of stock to itself, looking for a row with the same sku and a more recent date. When none is found, then s1 contains the most recent row for its sku. And each instance of stock has to join to its stockfile to get the date.

Re comment about optimization: It's hard for me to test because I don't have tables populated with data matching yours, but I'd guess you should have the following indexes:

CREATE INDEX stock_sku ON stock(sku);
CREATE INDEX stock_stockfileid ON stock(stockfileid);
CREATE INDEX stockfile_date ON stockfile(date);

I'd suggest using EXPLAIN to analyze the query without the indexes, and then create one index at a time and re-analyze with EXPLAIN to see which one gives the most direct benefit.

OTHER TIPS

Use:

SELECT DISTINCT s.stockid, 
       s.sku, 
       s.quantity, 
       sf.filename, 
       sf.date
  FROM STOCK s
  JOIN STOCKFILE sf ON sf.stockfileid = s.stockfileid
  JOIN (SELECT t.stockfileid,
               MAX(t.date) 'max_date'
          FROM STOCKFILE t
      GROUP BY t.stockfileid) x ON x.stockfileid = sf.stockfileid
                               AND x.max_date = sf.date

select *
from   stock
where  stockfileid in (
           select top 1 stockfileid
           from   stockfile
           order by date desc
       )

There are two common ways to accomplish this: a sub query or a self-join.

See this example of selecting the group-wise maximum at the MySQL site.

Edit, an example using a subquery:

SELECT stock.stockid, stock.sku, stock.quantity,
       stockfile.filename, stockfile.date
FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
WHERE stockfile.date = (SELECT MAX(date) FROM stockfile);

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow