Question

I tried everything but I couldn't overcome this problem.

I have a table-valued function.

When I call this function with

SELECT * FROM Ratings o1 
    CROSS APPLY dbo.FN_RatingSimilarity(50, 497664, 'Cosine') o2 
WHERE o1.trackId = 497664

It takes a while to execute. But when I do this:

SELECT * FROM Ratings o1 
    CROSS APPLY dbo.FN_RatingSimilarity(50, o1.trackId, 'Cosine') o2 
WHERE o1.trackId = 497664

It executes in 32 seconds. I created all the indexes, but it didn't help.

My function by the way:

ALTER FUNCTION [dbo].[FN_RatingSimilarity]
(   
    @trackId    INT,
    @nTrackId   INT,
    @measureType    VARCHAR(100)
)
RETURNS TABLE 
WITH SCHEMABINDING
AS
    RETURN
    (
        SELECT o2.id,
               o2.name,
               o2.releaseDate,
               o2.numberOfRatings,
               o2.averageRating,
               COUNT(1) as numberOfSharedUsers,
          CASE @measureType 
               WHEN 'Cosine' THEN SUM(o3.score*o4.score)/(0.01+SQRT(SUM(POWER(o3.score,2))) * SQRT(SUM(POWER(o4.score,2)))) 
               WHEN 'AdjustedCosine' THEN SUM((o3.score-o5.averageRating)*(o4.score-o5.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o5.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o5.averageRating, 2)))) 
               WHEN 'Pearson' THEN SUM((o3.score-o1.averageRating)*(o4.score-o2.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o1.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o2.averageRating, 2)))) 
           END as similarityRatio
          FROM dbo.Tracks o1
    INNER JOIN dbo.Tracks o2 ON o2.id != @trackId 
    INNER JOIN dbo.Ratings o3 ON o3.trackId = o1.id 
    INNER JOIN dbo.Ratings o4 ON o4.trackId = o2.id AND o4.userId = o3.userId
    INNER JOIN dbo.Users o5 ON o5.id = o4.userId 
         WHERE o1.id = @trackId 
             AND o2.id = ISNULL(@nTrackId, o2.id)
      GROUP BY o2.id, 
               o2.name, 
               o2.releaseDate,
               o2.numberOfRatings, 
               o2.averageRating
    )

Any help would be appreciated.

Thanks. Emrah

Solution

I believe that your bottleneck is the calculations plus your very expensive inner joins.

The way you are joining basically creates a cross join: it returns a result set with all the records linked to all other records, except the one for which the id is supplied. The other inner joins are then applied on top of that result set.

For every inner join, SQL creates a result set with all the matching rows. So the first thing your query does is tell SQL to do what is basically a cross join of the Tracks table with itself. (I am assuming you are still following; the query looks pretty advanced, so I'll take it that you are familiar with advanced SQL syntax and operators.)

Now, in the next inner join, you are joining the Ratings table to this newly created huge result set, and only then filtering out the rows that are not in both tables.
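
One way to check how large these intermediate sets really are is to count them directly. This is only a sketch: the track id 50 is taken from the question, and the optimizer may well push the o1.id filter down so that the real plan never builds the full Tracks-to-Tracks set. The actual execution plan is the authority here.

```sql
-- Assumption: 50 is the seed track id used in the question.
DECLARE @trackId INT = 50;

-- Rows produced by the Tracks-to-Tracks join once o1 is limited to one track.
SELECT COUNT(*) AS candidateTracks
FROM dbo.Tracks o1
INNER JOIN dbo.Tracks o2 ON o2.id != @trackId
WHERE o1.id = @trackId;

-- Rows that survive once both Ratings joins are applied
-- (one row per user who rated the seed track and another track).
SELECT COUNT(*) AS sharedRatingRows
FROM dbo.Ratings o3
INNER JOIN dbo.Ratings o4
        ON o4.userId = o3.userId
       AND o4.trackId != @trackId
WHERE o3.trackId = @trackId;
```

If candidateTracks is far larger than the number of distinct tracks behind sharedRatingRows, most of the Tracks rows are being paired up only to be thrown away later.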

So as a start, see if you can do your joins the other way around. (This really depends on your table record counts and record sizes.) Try to have the smallest result sets first and then join onto those.

The second thing you might want to try is to limit your result set before the joins. Start with a CTE where you filter for o1.id = @trackId. Then select from this CTE, do your joins on the CTE, and filter in your query for o2.id = ISNULL(@nTrackId, o2.id).
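
As a rough illustration of doing the joins the other way around, the query can be driven from the ratings of the seed track and only then joined to Tracks. This is a sketch, not a tested replacement: it covers only the 'Cosine' branch, uses the parameter values from the question, and assumes every Ratings row has a matching Tracks and Users row (so the o1 and o5 joins can be dropped for this measure).

```sql
-- Assumption: values taken from the question's call.
DECLARE @trackId INT = 50, @nTrackId INT = 497664;

SELECT o2.id,
       o2.name,
       COUNT(*) AS numberOfSharedUsers,
       SUM(o3.score * o4.score)
         / (0.01 + SQRT(SUM(POWER(o3.score, 2))) * SQRT(SUM(POWER(o4.score, 2)))) AS similarityRatio
FROM dbo.Ratings o3                        -- smallest set first: ratings of the seed track
INNER JOIN dbo.Ratings o4                  -- the same users' ratings of other tracks
        ON o4.userId = o3.userId
       AND o4.trackId != @trackId
INNER JOIN dbo.Tracks o2                   -- track details are looked up last
        ON o2.id = o4.trackId
WHERE o3.trackId = @trackId
  AND (@nTrackId IS NULL OR o4.trackId = @nTrackId)
GROUP BY o2.id, o2.name;
```

Whether this is actually faster depends on your row counts and indexes, so compare it against the original on real data.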

I will work on an example, stay tuned...

-- OK, I added an example, did a quick test, and the values returned are the same. Run this through your data and let us know if there is any improvement. (Note: this does not address the INNER JOIN order point discussed above, so do still play around with that.)

Example:

ALTER FUNCTION [dbo].[FN_RatingSimilarity_NEW] 
(    
    @trackId    INT, 
    @nTrackId   INT, 
    @measureType    VARCHAR(100) 
) 
RETURNS TABLE  
WITH SCHEMABINDING 
AS 
    RETURN 
    ( 
        WITH CTE_ALL AS 
        (
            SELECT id, 
               name, 
               releaseDate, 
               numberOfRatings, 
               averageRating
            FROM dbo.Tracks
            WHERE  id = @trackId  
        )
        SELECT o2.id, 
               o2.name, 
               o2.releaseDate, 
               o2.numberOfRatings, 
               o2.averageRating, 
               COUNT(1) as numberOfSharedUsers, 
          CASE @measureType  
               WHEN 'Cosine' THEN SUM(o3.score*o4.score)/(0.01+SQRT(SUM(POWER(o3.score,2))) * SQRT(SUM(POWER(o4.score,2))))  
               WHEN 'AdjustedCosine' THEN SUM((o3.score-o5.averageRating)*(o4.score-o5.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o5.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o5.averageRating, 2))))  
               WHEN 'Pearson' THEN SUM((o3.score-o1.averageRating)*(o4.score-o2.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o1.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o2.averageRating, 2))))  
           END as similarityRatio 
          FROM CTE_ALL o1 
    INNER JOIN dbo.Tracks o2 ON o2.id != @trackId  
    INNER JOIN dbo.Ratings o3 ON o3.trackId = o1.id  
    INNER JOIN dbo.Ratings o4 ON o4.trackId = o2.id AND o4.userId = o3.userId 
    INNER JOIN dbo.Users o5 ON o5.id = o4.userId  
         WHERE o2.id = ISNULL(@nTrackId, o2.id) 
      GROUP BY o2.id,  
               o2.name,  
               o2.releaseDate, 
               o2.numberOfRatings,  
               o2.averageRating 
    ) 
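
To check whether the new version helps, both functions can be run side by side with SQL Server's timing and I/O statistics switched on. This is a minimal sketch that reuses the parameter values from the question; the numbers show up in the Messages tab in SSMS.

```sql
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Original function
SELECT * FROM dbo.FN_RatingSimilarity(50, 497664, 'Cosine');

-- CTE-based version from this answer
SELECT * FROM dbo.FN_RatingSimilarity_NEW(50, 497664, 'Cosine');

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Compare the elapsed time and the logical reads on Tracks and Ratings for the two calls. If they are the same, the optimizer was most likely already applying the o1.id filter early.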
Licensed under: CC-BY-SA with attribution