Question

Please take a look at this SQL Fiddle. I have an author table, and a book table, you know how that works:

create table author (
  id int auto_increment,
  first_name varchar(50),
  last_name varchar(50),
  birthdate date,
  death date null,
  birthplace varchar(50),
  number_of_children int,
  mother_name varchar(100),
  other_meaningless_stuff text,
  PRIMARY KEY (id)
);

create table book (
  id int auto_increment,
  author_id int,
  title varchar(100),
  release_date date,
  PRIMARY KEY (id)
);

In plain English, I want to obtain all the fields in author, the number of books released after the author's death, and the release date of the last book released after the author's death. In MySQL, I could write:

select author.*,
  (select count(*) from book where book.author_id=author.id and book.release_date > author.death) post_mortem_books,
  (select max(release_date) from book where book.author_id=author.id and book.release_date > author.death) last_post_mortem_book
from author;

As you can see, both subqueries have exactly the same WHERE clause, which seems wasteful since both tables are very large. Another solution is to GROUP BY every field in the query, but that can be very slow. How can I get that result in a more optimized way?


Solution

Not with nested subqueries in the select clause. Instead, switch the query to a join with group by, and move the shared condition into conditional aggregates:

select a.*,
       sum(case when b.release_date > a.death then 1 else 0 end) as post_mortem_books,
       max(case when b.release_date > a.death then b.release_date end) as last_post_mortem_book
from author a left outer join
     book b
     on a.id = b.author_id
group by a.id;
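
To check that the join with group by returns the same rows as the two correlated subqueries, here is a minimal sketch. It makes several assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, a trimmed version of the question's schema (table book, key author.id), and a few made-up sample rows. It says nothing about performance on large MySQL tables; it only compares results.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed version of the question's schema; dates are stored as ISO-8601 text,
-- which compares correctly as strings.
create table author (id integer primary key, first_name text, last_name text, death text);
create table book (id integer primary key, author_id int, title text, release_date text);

-- Hypothetical sample rows, not from the original question.
insert into author (id, first_name, last_name, death) values
  (1, 'Ann', 'Deceased', '2000-01-01'),  -- has books after death
  (2, 'Bob', 'Living', null),            -- still alive
  (3, 'Cat', 'Bookless', '1990-05-05');  -- no books at all
insert into book (author_id, title, release_date) values
  (1, 'Early', '1995-03-01'),
  (1, 'Late one', '2001-06-15'),
  (1, 'Late two', '2003-09-30'),
  (2, 'Ongoing', '2010-01-01');
""")

# The question's approach: two correlated subqueries with the same WHERE clause.
correlated = """
select author.id,
  (select count(*) from book
    where book.author_id = author.id and book.release_date > author.death) post_mortem_books,
  (select max(release_date) from book
    where book.author_id = author.id and book.release_date > author.death) last_post_mortem_book
from author
order by author.id
"""

# The answer's approach: one left join, with the condition inside the aggregates.
joined = """
select a.id,
       sum(case when b.release_date > a.death then 1 else 0 end) as post_mortem_books,
       max(case when b.release_date > a.death then b.release_date end) as last_post_mortem_book
from author a left outer join book b on a.id = b.author_id
group by a.id
order by a.id
"""

rows_correlated = conn.execute(correlated).fetchall()
rows_joined = conn.execute(joined).fetchall()

for row in rows_joined:
    print(row)
```

The left outer join keeps authors with no books, and the `else 0` branch makes their count 0 rather than NULL, so living authors and authors without books match the subquery version too.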
Licensed under: CC-BY-SA with attribution