Question

After searching and reading a little bit I came up with the following SQL query for my application:

SELECT
  ROUND(AVG(CASE WHEN gender = 'M' THEN rating END), 1) avgAllM,
  COUNT(CASE WHEN gender = 'M' THEN rating END) countAllM,
  ROUND(AVG(CASE WHEN gender = 'F' THEN rating END), 1) avgAllF,
  COUNT(CASE WHEN gender = 'F' THEN rating END) countAllF,
  ROUND(AVG(CASE WHEN gender = 'M' AND UserAge(birth_date) <= 18 THEN rating END), 1) avgU18M,
  COUNT(CASE WHEN gender = 'M' AND UserAge(birth_date) <= 18 THEN rating END) countU18M,
  ROUND(AVG(CASE WHEN gender = 'F' AND UserAge(birth_date) <= 18 THEN rating END), 1) avgU18F,
  COUNT(CASE WHEN gender = 'F' AND UserAge(birth_date) <= 18 THEN rating END) countU18F
FROM movie_ratings mr INNER JOIN accounts a
  ON mr.aid = a.aid
WHERE mid = 5;

I'm wondering how I can simplify this, if that is possible. The birth_date field is of type DATE, and UserAge is a function that calculates the age from that field.

The table structures are as follows:

[ACCOUNTS]
aid(PK), birth_date, gender

[MOVIE_RATINGS]
mid(PK), aid(PK,FK), rating

I'm looking for two things:

  • General simplifications to the code above that more experienced users know about and I don't.
  • I'm doing this in PHP, and for each record I'll have an associative array with all those variables. I'm looking for a way to group them into a multidimensional array, so the PHP code is easier to read. Of course, I don't want to do the regrouping in PHP itself; that would be pointless.

For instance, something like this:

$info[0]['avgAllM']
$info[0]['countAllM']
$info[1]['avgAllF']
$info[1]['countAllF']
$info[2]['avgU18M']
$info[2]['countU18M']
$info[3]['avgU18F']
$info[3]['countU18F']

Instead of:

$info['avgAllM']
$info['countAllM']
$info['avgAllF']
$info['countAllF']
$info['avgU18M']
$info['countU18M']
$info['avgU18F']
$info['countU18F']

I don't even know if this is possible, so I'm really wondering if it is and how it can be done.

Why do I want all this? Well, the SQL query above is just a fragment of the complete SQL I need to write. I haven't written the rest yet because, before doing all the work, I want to know if there's a more compact SQL query that achieves the same result. Basically, I'll add a few more lines like the ones above but with different conditions, especially on the date.

Solution

You could create a VIEW with the following definition:

SELECT
      CASE WHEN gender = 'M' THEN rating END AS AllM,
      CASE WHEN gender = 'F' THEN rating END AS AllF,
      CASE WHEN gender = 'M' AND UserAge(birth_date) <= 18 THEN rating END AS U18M,
      CASE WHEN gender = 'F' AND UserAge(birth_date) <= 18 THEN rating END AS U18F
      FROM movie_ratings mr INNER JOIN accounts a
        ON mr.aid = a.aid
      WHERE mid = 5

Then SELECT from that view:

SELECT ROUND(AVG(AllM), 1) avgAllM,
       COUNT(AllM)         countAllM,
       ROUND(AVG(AllF), 1) avgAllF,
       COUNT(AllF)         countAllF,
       ROUND(AVG(U18M), 1) avgU18M,
       COUNT(U18M)         countU18M,
       ROUND(AVG(U18F), 1) avgU18F,
       COUNT(U18F)         countU18F
FROM  yourview

That might simplify things slightly.
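
As a rough sketch of what the complete statements could look like: the view name rating_groups is a placeholder of mine, not something from the question, and mid is exposed as a column (my own variation on the definition above) so the view is not tied to movie 5. It assumes the tables and the UserAge function described in the question.

```sql
-- Hypothetical view name; assumes the question's tables and its UserAge function.
CREATE VIEW rating_groups AS
SELECT
  mr.mid,
  CASE WHEN a.gender = 'M' THEN mr.rating END AS AllM,
  CASE WHEN a.gender = 'F' THEN mr.rating END AS AllF,
  CASE WHEN a.gender = 'M' AND UserAge(a.birth_date) <= 18 THEN mr.rating END AS U18M,
  CASE WHEN a.gender = 'F' AND UserAge(a.birth_date) <= 18 THEN mr.rating END AS U18F
FROM movie_ratings mr
INNER JOIN accounts a ON mr.aid = a.aid;

-- The movie filter moves to the query that reads the view.
SELECT ROUND(AVG(AllM), 1) AS avgAllM,
       COUNT(AllM)         AS countAllM,
       ROUND(AVG(AllF), 1) AS avgAllF,
       COUNT(AllF)         AS countAllF,
       ROUND(AVG(U18M), 1) AS avgU18M,
       COUNT(U18M)         AS countU18M,
       ROUND(AVG(U18F), 1) AS avgU18F,
       COUNT(U18F)         AS countU18F
FROM rating_groups
WHERE mid = 5;
```

Each new condition then needs one CASE line in the view plus an AVG/COUNT pair in the outer query, rather than repeating the full CASE expression twice.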

Other tips

This could just be a case of optimizing too early. The query does what you need and only really looks complicated because it is. I'm not sure that there are necessarily any tricks that would help. It probably depends on the characteristics of your data. Is the query slow? Do you think it could be quicker?

It might be worth rearranging it in the following way. All the conditions rely on the ACCOUNTS table, which I assume is going to be significantly smaller than the MOVIE_RATINGS table, so you might be able to do all the calculations on a smaller data set, which might be quicker. If you are only selecting one movie at a time (mid = 5), though, that probably won't be the case.

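I'm not entirely sure that this will work, but I think it should: each flag is either 1 or NULL, multiplying rating by a NULL flag yields NULL, and AVG and COUNT skip NULLs.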

SELECT
  ROUND(AVG(rating * AllM), 1) avgAllM,
  COUNT(rating * AllM) countAllM,
  ROUND(AVG(rating * AllF), 1) avgAllF,
  COUNT(rating * AllF) countAllF,
  ROUND(AVG(rating * AllM * U18), 1) avgU18M,
  COUNT(rating * AllM * U18) countU18M,
  ROUND(AVG(rating * AllF * U18), 1) avgU18F,
  COUNT(rating * AllF * U18) countU18F
FROM 
  movie_ratings mr 
  INNER JOIN (
    SELECT
      aid,
      CASE WHEN gender = 'M' THEN 1 END AS AllM,
      CASE WHEN gender = 'F' THEN 1 END AS AllF,
      CASE WHEN UserAge(birth_date) <= 18 THEN 1 END AS U18
    FROM accounts) a ON mr.aid = a.aid
WHERE mid = 5;
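
The flag trick in the query above depends on NULL propagating through the multiplication and on aggregates ignoring NULLs. A small standalone check, using made-up sample values and the illustrative names demo, flag, counted and averaged (none of which come from the question), might look like this:

```sql
-- Two sample rows: one flagged (rating 4, flag 1), one unflagged (rating 3, flag NULL).
SELECT
  COUNT(rating * flag) AS counted,   -- 3 * NULL is NULL, so that row is not counted
  AVG(rating * flag)   AS averaged   -- AVG also skips the NULL product
FROM (
  SELECT 4 AS rating, 1 AS flag
  UNION ALL
  SELECT 3, NULL
) AS demo;
```

This should return a count of 1 and an average of 4, since only the flagged row contributes.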

On balance though, I would probably just leave the query you have as it is. The query that you have is easy to understand and probably performs fairly well.
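
For the one-row-per-group layout the question asks about, one possible direction is to let GROUP BY split the rows by gender, so each condition is written once instead of once per gender. This is only a sketch under the question's schema and UserAge function; the column aliases avgAll, countAll, avgU18 and countU18 are names I made up for illustration.

```sql
-- One result row per gender instead of one wide row.
SELECT
  a.gender,
  ROUND(AVG(mr.rating), 1) AS avgAll,
  COUNT(mr.rating)         AS countAll,
  ROUND(AVG(CASE WHEN UserAge(a.birth_date) <= 18 THEN mr.rating END), 1) AS avgU18,
  COUNT(CASE WHEN UserAge(a.birth_date) <= 18 THEN mr.rating END)         AS countU18
FROM movie_ratings mr
INNER JOIN accounts a ON mr.aid = a.aid
WHERE mr.mid = 5
GROUP BY a.gender
ORDER BY a.gender;  -- fixes the row order: 'F' sorts before 'M'
```

Fetching all rows in PHP would then give something like $info[0]['avgAll'] for F and $info[1]['avgAll'] for M. One difference from the original query: a gender with no ratings for the movie produces no row at all, rather than a NULL average and a count of 0.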

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow