Question

I have a rather huge query that is needed in several stored procedures, and I'd like to move it into a UDF to make it easier to maintain (a view won't work, since it takes in a bunch of parameters). However, everyone I've ever talked to has told me that UDFs are incredibly slow.

While I don't know exactly what makes them slow, I'm willing to concede that they might be. But seeing as I'm not using this UDF within a join, but instead just to return a table variable, I think it wouldn't be that bad.

So I guess the question is: should I avoid UDFs at all costs? Can anyone point to concrete evidence stating that they are slower?

Solution

Scalar UDFs are very slow. Inline table-valued UDFs, on the other hand, are in effect macros that the optimizer expands into the calling query, so they are very fast. A few articles:

Reuse Your Code with Table-Valued UDFs

Many nested inline UDFs are very fast

More links on slowness of scalar UDFs:

SQL Server Performance patterns of a UDF with datetime parameters

Not all UDFs are bad for performance
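To make the distinction concrete, here is a minimal sketch (the table, column, and function names are hypothetical). The scalar UDF is invoked once per row; the inline table-valued UDF is expanded into the calling query like a parameterized view:

```sql
-- Slow pattern: a scalar UDF, executed once for every row it is applied to.
CREATE FUNCTION dbo.fnDiscountedPrice (@Price money, @Pct decimal(5,2))
RETURNS money
AS
BEGIN
    RETURN @Price * (1 - @Pct / 100);
END;
GO

-- Fast pattern: an inline (single-statement) table-valued UDF.
-- The optimizer inlines it, so indexes and join conditions still apply.
CREATE FUNCTION dbo.fnDiscountedPrices (@Pct decimal(5,2))
RETURNS TABLE
AS
RETURN
    SELECT OrderID, Price * (1 - @Pct / 100) AS DiscountedPrice
    FROM dbo.Orders;
GO
```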

OTHER TIPS

As you pointed out, the results of the (table) UDF will not be joined to anything, so there should not be much impact on performance.

To try to explain a little about why UDFs can be perceived as slow (when in fact they are just used in the wrong way), consider the following example:

We have table A and table B. Say we have a join like:

SELECT A.col1, A.col2, B.ColWhatever
FROM A
JOIN B ON A.aid = B.fk_aid
WHERE B.someCol = @param1
  AND A.anotherCol = @param2

In this case, SQL Server will do its best to return the results in the most performant way it knows how. A major factor in this is reducing disk reads, so it will use the conditions in the JOIN and WHERE clauses to evaluate (hopefully with an index) how many rows to return.

Now, say we extract some part of the conditions used to restrict the amount of data returned into a UDF. The query optimizer can no longer pull back the minimum number of rows from disk; it can only apply the conditions contained in the UDF's own query. In a nutshell, a (multi-statement) table UDF is always fully evaluated, and its entire result set returned, before the calling sproc can use it. So if some other criterion in the original join could have caused fewer disk reads, it can only be applied after the data has already been pulled into the sproc.

So say we create a UDF to select the rows from table B that match the WHERE clause. If there are 100k rows in table B and 50% of them meet the criteria, then all of those rows are returned to the sproc for comparison with table A. If only 10% of those have matches in table A, we really only wanted 5% of table B, but we have already pulled back 50%, the majority of which we do not want!
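The scenario above can be sketched like this (a multi-statement table UDF built from the hypothetical tables in the example; the numbers are the illustrative ones above):

```sql
-- A multi-statement table UDF that filters table B on its own.
-- Its body is opaque to the optimizer of the calling query.
CREATE FUNCTION dbo.fnFilteredB (@param1 int)
RETURNS @result TABLE (fk_aid int, someCol int, ColWhatever varchar(50))
AS
BEGIN
    INSERT INTO @result
    SELECT fk_aid, someCol, ColWhatever
    FROM B
    WHERE someCol = @param1;   -- may materialize 50% of B
    RETURN;
END;
GO

-- The UDF's full result set is materialized BEFORE the join to A
-- can throw away the rows that have no match:
SELECT A.col1, A.col2, FB.ColWhatever
FROM A
JOIN dbo.fnFilteredB(@param1) AS FB ON A.aid = FB.fk_aid
WHERE A.anotherCol = @param2;
```

Had the same filter been written as an inline UDF (a single RETURN SELECT), the optimizer could have merged it with the outer join and avoided the extra materialization.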

If this comes across as complete gibberish, apologies - please let me know!

Could you post your code? Generally speaking, if you are using a scalar UDF in the SELECT clause of a query, the statements within the UDF will be executed once per row returned by the query. It would be better to join to a table-valued UDF, or to find some way to perform the UDF's logic with a join in your main SQL statement.
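For example (the schema and function are hypothetical), this per-row pattern is the one to avoid, together with an equivalent set-based rewrite:

```sql
-- Anti-pattern: the scalar UDF hides a lookup from the optimizer
-- and is executed once for every row Orders returns.
SELECT o.OrderID,
       dbo.fnCustomerName(o.CustomerID) AS CustomerName
FROM dbo.Orders AS o;

-- Better: express the same lookup as a join,
-- so it is planned and optimized as a single set operation.
SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
JOIN dbo.Customers AS c
  ON c.CustomerID = o.CustomerID;
```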

Is there some reason you don't want to use a stored procedure instead of a UDF?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow