Question

In my current setup, I have two tables: product and rating.

Product Table

  • product_id
  • rating

The product table contains a lot of additional information, but for this question I am focused on those two fields only.

Rating Table

  • product_id
  • rating
  • user_id (who rated)
  • is_admin_rating - bool indicating whether the user who rated was an admin

The reason we collect admin ratings in the first place is that we want to weight admin ratings slightly higher (60%) than regular user ratings (40%). The rating column in the product table is equal to the AVG of all the admin ratings. Ratings are always between 1 and 5.

So for each product we have to consider four scenarios:

RATINGS BY
USER  ADMIN    TOTAL RATING
----  -----    ------------
 no    no    = 0
 yes   no    = AVG of user ratings (`rating` table)
 yes   yes   = 0.6 * AVG of admin ratings (`product` table) + 0.4 * AVG of user ratings (`rating` table)
 no    yes   = AVG of admin ratings (`product` table)

The SQL query which currently retrieves the datasets looks like this:

$sql = "SELECT p.product_id,
   (COALESCE(p.rating, 0) + COALESCE(j.sum, 0))
      / (COALESCE(p.rating / p.rating, 0) + COALESCE(j.tot, 0)) AS rating
FROM product p
LEFT JOIN
   (SELECT SUM(rating) AS sum,
      COUNT(rating) AS tot,
      product_id
   FROM rating
   WHERE is_admin_rating = FALSE
   GROUP BY product_id) j
   ON (p.product_id = j.product_id)
LEFT JOIN product_description pd
   ON (p.product_id = pd.product_id)
LEFT JOIN product_to_store p2s
   ON (p.product_id = p2s.product_id)";

This query is then extended with one of several sort options (rating being the default), and we also use LIMIT to "paginate" the search results.

Is there a way to incorporate the weighted ratings into the query, or will I have to break it up into several queries?


Solution

Since this looks like a web-based system, I would strongly suggest a slight denormalization: add five columns to the product table for

UserRatings, UserCount, AdminRatings, AdminCount, FinalRating

Whenever entries are added to or updated in the rating table, a simple trigger could run an update, something like

-- Recompute the sums and counts for the one product that was just rated.
-- ProductIDThatCausedThisTrigger is a placeholder (e.g. NEW.product_id inside a MySQL trigger).
update product p,
       ( select r.product_id,
                sum( if( r.is_admin_rating = FALSE, 1, 0 ) ) as UserCount,
                sum( if( r.is_admin_rating = FALSE, r.rating, 0 ) ) as UserRatings,
                sum( if( r.is_admin_rating = TRUE, 1, 0 ) ) as AdminCount,
                sum( if( r.is_admin_rating = TRUE, r.rating, 0 ) ) as AdminRatings
            from rating r
            where r.product_id = ProductIDThatCausedThisTrigger
            group by r.product_id ) as PreSum
   set p.UserCount = PreSum.UserCount,
       p.UserRatings = PreSum.UserRatings,
       p.AdminCount = PreSum.AdminCount,
       p.AdminRatings = PreSum.AdminRatings,
       p.FinalRating = case when PreSum.UserCount = 0 and PreSum.AdminCount = 0
                               then 0
                            when PreSum.UserCount = 0 
                               then PreSum.AdminRatings / PreSum.AdminCount
                            when PreSum.AdminCount = 0 
                               then PreSum.UserRatings / PreSum.UserCount
                            else
                               -- weighted sum: 40% user average + 60% admin average
                               ( PreSum.UserRatings / PreSum.UserCount * .4 )
                             + ( PreSum.AdminRatings / PreSum.AdminCount * .6 )
                            end
   where p.product_id = PreSum.product_id

This way, you never have to join to the rating table and aggregate on every search, which would only get slower as more data accumulates. Your query can then read straight from the product table without any COALESCE: the counts and rating sums for users and admins are already there.
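As a rough illustration of how simple the search query could become, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for the real database, and the trimmed-down product table, the sample values and the fetch_page helper are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical, trimmed-down product table. FinalRating is the denormalized
# column suggested above; SQLite stands in for the real database here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    " product_id INTEGER PRIMARY KEY,"
    " FinalRating REAL NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO product (product_id, FinalRating) VALUES (?, ?)",
    [(1, 3.5), (2, 4.6), (3, 0.0), (4, 4.0), (5, 2.25)],
)


def fetch_page(conn, page, per_page):
    """Return one page of (product_id, FinalRating) rows, best rated first."""
    offset = (page - 1) * per_page
    return conn.execute(
        "SELECT product_id, FinalRating FROM product "
        "ORDER BY FinalRating DESC, product_id ASC "  # product_id breaks ties
        "LIMIT ? OFFSET ?",
        (per_page, offset),
    ).fetchall()


first_page = fetch_page(conn, 1, 2)
second_page = fetch_page(conn, 2, 2)
```

No join against the rating table and no aggregation is needed at read time; sorting and LIMIT work directly on the stored column.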

The case/when for FinalRating wraps it all up, since the combination of user count and admin count can be 0/0, +/0, 0/+ or +/+.

  • If neither has a count, the case/when sets the rating to 0.
  • If only the user count has a value, take the average user rating (UserRatings / UserCount).
  • If only the admin count has a value, take the average admin rating (AdminRatings / AdminCount).
  • If BOTH have counts, add the user average * .4 to the admin average * .6. These weights are the one factor you might want to tweak.
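The four cases can also be written out as a small function, which may be handy for checking the numbers by hand. This is only a sketch in Python; the function name and the weight parameters are made up for the example, with the 40% / 60% split from the question as defaults.

```python
def final_rating(user_sum, user_count, admin_sum, admin_count,
                 user_weight=0.4, admin_weight=0.6):
    """Combine rating sums and counts into one weighted rating."""
    if user_count == 0 and admin_count == 0:
        return 0.0                        # nobody has rated yet
    if user_count == 0:
        return admin_sum / admin_count    # admin ratings only
    if admin_count == 0:
        return user_sum / user_count      # user ratings only
    # Both groups have rated: weighted sum of the two averages.
    return (user_weight * (user_sum / user_count)
            + admin_weight * (admin_sum / admin_count))
```

For example, three user ratings summing to 9 (average 3) and two admin ratings summing to 10 (average 5) give roughly 0.4 * 3 + 0.6 * 5 = 4.2.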

Although the query itself looks somewhat monstrous and confusing, the "PreSum" subquery only runs for the one product that has just been rated, as identified by the trigger. After that, it is a simple update from those results, joined on the single product ID.
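To see the trigger idea working end to end, here is a minimal sketch using Python's standard-library sqlite3 module. SQLite is only a stand-in, so the trigger syntax differs from what MySQL would need, and the trigger name and the reduced table layouts are assumptions for the example. It only covers inserts; updates and deletes on the rating table would presumably need similar triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    UserCount    INTEGER NOT NULL DEFAULT 0,
    UserRatings  INTEGER NOT NULL DEFAULT 0,
    AdminCount   INTEGER NOT NULL DEFAULT 0,
    AdminRatings INTEGER NOT NULL DEFAULT 0,
    FinalRating  REAL    NOT NULL DEFAULT 0
);
CREATE TABLE rating (
    product_id      INTEGER NOT NULL,
    user_id         INTEGER NOT NULL,
    rating          INTEGER NOT NULL,
    is_admin_rating INTEGER NOT NULL   -- 0 = regular user, 1 = admin
);

-- Hypothetical trigger: runs once per inserted rating row.
CREATE TRIGGER rating_after_insert AFTER INSERT ON rating
BEGIN
    -- Step 1: refresh sums and counts for the rated product only.
    UPDATE product SET
        UserCount    = (SELECT COUNT(*) FROM rating
                        WHERE product_id = NEW.product_id AND is_admin_rating = 0),
        UserRatings  = (SELECT COALESCE(SUM(rating), 0) FROM rating
                        WHERE product_id = NEW.product_id AND is_admin_rating = 0),
        AdminCount   = (SELECT COUNT(*) FROM rating
                        WHERE product_id = NEW.product_id AND is_admin_rating = 1),
        AdminRatings = (SELECT COALESCE(SUM(rating), 0) FROM rating
                        WHERE product_id = NEW.product_id AND is_admin_rating = 1)
    WHERE product_id = NEW.product_id;

    -- Step 2: derive FinalRating from the refreshed columns.
    UPDATE product SET FinalRating = CASE
        WHEN UserCount = 0 AND AdminCount = 0 THEN 0
        WHEN UserCount = 0  THEN 1.0 * AdminRatings / AdminCount
        WHEN AdminCount = 0 THEN 1.0 * UserRatings / UserCount
        ELSE 0.4 * UserRatings / UserCount + 0.6 * AdminRatings / AdminCount
    END
    WHERE product_id = NEW.product_id;
END;
""")

conn.execute("INSERT INTO product (product_id) VALUES (1)")
conn.executemany(
    "INSERT INTO rating (product_id, user_id, rating, is_admin_rating) "
    "VALUES (?, ?, ?, ?)",
    [(1, 10, 4, 0), (1, 11, 2, 0), (1, 99, 5, 1)],  # two users, one admin
)
row = conn.execute(
    "SELECT UserCount, UserRatings, AdminCount, AdminRatings, FinalRating "
    "FROM product WHERE product_id = 1"
).fetchone()
```

After the three inserts, the product row holds two user ratings summing to 6 and one admin rating of 5, so FinalRating comes out at about 0.4 * 3 + 0.6 * 5 = 4.2.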

Getting this to work might offer a better long-term solution for you.

Licensed under: CC-BY-SA with attribution