SQL Server, table-valued functions slow processing
21-03-2021
Question
I tried everything but I couldn't overcome this problem.
I have a table-valued function.
When I call this function with
SELECT * FROM Ratings o1
CROSS APPLY dbo.FN_RatingSimilarity(50, 497664, 'Cosine') o2
WHERE o1.trackId = 497664
It takes a while to execute. But when I do this:
SELECT * FROM Ratings o1
CROSS APPLY dbo.FN_RatingSimilarity(50, o1.trackId, 'Cosine') o2
WHERE o1.trackId = 497664
It executes in 32 seconds. I created all the indexes, but it didn't help.
My function, by the way:
ALTER FUNCTION [dbo].[FN_RatingSimilarity]
(
    @trackId INT,
    @nTrackId INT,
    @measureType VARCHAR(100)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
    SELECT o2.id,
           o2.name,
           o2.releaseDate,
           o2.numberOfRatings,
           o2.averageRating,
           COUNT(1) as numberOfSharedUsers,
           CASE @measureType
               WHEN 'Cosine' THEN SUM(o3.score*o4.score)/(0.01+SQRT(SUM(POWER(o3.score,2))) * SQRT(SUM(POWER(o4.score,2))))
               WHEN 'AdjustedCosine' THEN SUM((o3.score-o5.averageRating)*(o4.score-o5.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o5.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o5.averageRating, 2))))
               WHEN 'Pearson' THEN SUM((o3.score-o1.averageRating)*(o4.score-o2.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o1.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o2.averageRating, 2))))
           END as similarityRatio
    FROM dbo.Tracks o1
    INNER JOIN dbo.Tracks o2 ON o2.id != @trackId
    INNER JOIN dbo.Ratings o3 ON o3.trackId = o1.id
    INNER JOIN dbo.Ratings o4 ON o4.trackId = o2.id AND o4.userId = o3.userId
    INNER JOIN dbo.Users o5 ON o5.id = o4.userId
    WHERE o1.id = @trackId
      AND o2.id = ISNULL(@nTrackId, o2.id)
    GROUP BY o2.id,
             o2.name,
             o2.releaseDate,
             o2.numberOfRatings,
             o2.averageRating
)
Any help would be appreciated.
Thanks. Emrah
Solution
I believe your bottleneck is the calculations plus your very expensive inner joins.
The way you are joining basically creates a cross join: the ON condition (o2.id != @trackId) never relates o2 to o1, so the result set links all the records to all other records, except the one for which the id is supplied. Then you go and add to that result set with the other inner joins.
For every inner join, SQL creates a result set with all the matching rows. So the first thing your query does is tell SQL to basically do a cross join of the table with itself. (I am assuming you are still following; this looks pretty advanced, so I'll take it that you are familiar with advanced SQL syntax and operators.)
Now, in the next inner join, you are applying the Ratings table to your newly created huge result set, and only then filtering out the rows that are not in both tables.
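To see the cross-join behaviour in isolation, here is a minimal sketch (the literal 50 is simply the track id used in the question). It assumes Tracks.id is unique and that track 50 exists; under those assumptions both statements should return the number of tracks minus one, because the two forms are logically equivalent for an inner join.

```sql
-- Illustration only: the value 50 is taken from the question's call.
DECLARE @trackId INT = 50;

-- Form used in the function: the ON condition does not mention o1 at all,
-- so every o1 row is paired with every Tracks row whose id is not @trackId.
SELECT COUNT(*) AS pairedRows
FROM dbo.Tracks o1
INNER JOIN dbo.Tracks o2 ON o2.id != @trackId
WHERE o1.id = @trackId;

-- The same thing written as an explicit cross join plus a filter.
SELECT COUNT(*) AS pairedRows
FROM dbo.Tracks o1
CROSS JOIN dbo.Tracks o2
WHERE o1.id = @trackId
  AND o2.id != @trackId;
```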
So as a start, see if you can do your joins the other way around. (This really depends on your table record counts and record sizes.) Try to have the smallest result sets first and then join onto those.
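As a rough sketch of what "the other way around" could look like, the query below starts from the ratings of the one supplied track, finds other ratings by the same users, and only then looks up the track rows. It covers only the 'Cosine' branch and a reduced column list. It also assumes that every Ratings.trackId has a matching Tracks row and every Ratings.userId has a matching Users row (for example through foreign keys), which is what allows o1 and o5 to be dropped. The optimizer is free to reorder inner joins itself, so treat this as something to compare in the actual execution plan rather than a guaranteed win.

```sql
-- Illustration only: parameter values are taken from the question's call.
DECLARE @trackId INT = 50,
        @nTrackId INT = 497664;

SELECT o2.id,
       o2.name,
       COUNT(1) AS numberOfSharedUsers,
       SUM(o3.score * o4.score)
           / (0.01 + SQRT(SUM(POWER(o3.score, 2))) * SQRT(SUM(POWER(o4.score, 2)))) AS similarityRatio
FROM dbo.Ratings o3                                   -- ratings of the supplied track (small set)
INNER JOIN dbo.Ratings o4 ON o4.userId = o3.userId    -- other ratings by the same users
                         AND o4.trackId != @trackId
INNER JOIN dbo.Tracks o2 ON o2.id = o4.trackId        -- look up the track rows last
WHERE o3.trackId = @trackId
  AND o2.id = ISNULL(@nTrackId, o2.id)
GROUP BY o2.id,
         o2.name;
```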
The second thing you might want to try is to limit your result set even before the joins. So start with a CTE where you filter for o1.id = @trackId. Then select from this CTE, do your joins on the CTE, and filter in your query for o2.id = ISNULL(@nTrackId, o2.id).
I will work on an example, stay tuned...
OK, I added an example, did a quick test, and the values returned are the same. Run this against your data and let us know if there is any improvement. (Note: this does not address the INNER JOIN order point discussed above; do still play around with that.)
Example:
CREATE FUNCTION [dbo].[FN_RatingSimilarity_NEW]
(
    @trackId INT,
    @nTrackId INT,
    @measureType VARCHAR(100)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
    WITH CTE_ALL AS
    (
        SELECT id,
               name,
               releaseDate,
               numberOfRatings,
               averageRating
        FROM dbo.Tracks
        WHERE id = @trackId
    )
    SELECT o2.id,
           o2.name,
           o2.releaseDate,
           o2.numberOfRatings,
           o2.averageRating,
           COUNT(1) as numberOfSharedUsers,
           CASE @measureType
               WHEN 'Cosine' THEN SUM(o3.score*o4.score)/(0.01+SQRT(SUM(POWER(o3.score,2))) * SQRT(SUM(POWER(o4.score,2))))
               WHEN 'AdjustedCosine' THEN SUM((o3.score-o5.averageRating)*(o4.score-o5.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o5.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o5.averageRating, 2))))
               WHEN 'Pearson' THEN SUM((o3.score-o1.averageRating)*(o4.score-o2.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o1.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o2.averageRating, 2))))
           END as similarityRatio
    FROM CTE_ALL o1
    INNER JOIN dbo.Tracks o2 ON o2.id != @trackId
    INNER JOIN dbo.Ratings o3 ON o3.trackId = o1.id
    INNER JOIN dbo.Ratings o4 ON o4.trackId = o2.id AND o4.userId = o3.userId
    INNER JOIN dbo.Users o5 ON o5.id = o4.userId
    WHERE o2.id = ISNULL(@nTrackId, o2.id)
    GROUP BY o2.id,
             o2.name,
             o2.releaseDate,
             o2.numberOfRatings,
             o2.averageRating
)
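To check whether the new version helps, something along these lines could be run side by side. It is only a sketch: the parameter values are the ones from the question, and it assumes both functions exist in the same database. SET STATISTICS TIME and SET STATISTICS IO print CPU/elapsed time and page reads to the Messages tab, and the EXCEPT query should come back empty if both functions return the same rows (tiny floating-point differences in similarityRatio could in principle show up there).

```sql
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Direct calls with constant arguments.
SELECT * FROM dbo.FN_RatingSimilarity(50, 497664, 'Cosine');
SELECT * FROM dbo.FN_RatingSimilarity_NEW(50, 497664, 'Cosine');

-- The correlated call from the question, against the new function.
SELECT *
FROM dbo.Ratings o1
CROSS APPLY dbo.FN_RatingSimilarity_NEW(50, o1.trackId, 'Cosine') o2
WHERE o1.trackId = 497664;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;

-- Rows produced by the old function that the new one does not produce.
SELECT id, numberOfSharedUsers, similarityRatio
FROM dbo.FN_RatingSimilarity(50, 497664, 'Cosine')
EXCEPT
SELECT id, numberOfSharedUsers, similarityRatio
FROM dbo.FN_RatingSimilarity_NEW(50, 497664, 'Cosine');
```

Comparing the actual execution plans of the constant call and the CROSS APPLY call is probably the quickest way to see where the time goes.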