Question

In SQL Server, I have a CLR-integration-based table-valued function, GetArchiveImages. I call it something like this:

SELECT ...
FROM Items
CROSS APPLY GetArchiveImages(Items.ID) AS archiveimages
WHERE ...

The issue is that there is overhead for each individual call to the function.

If it could be joined with the whole table at once, the overhead would be quite minor, but since it's called once for each row, that overhead scales with the number of rows.

I don't use a stored procedure, because a table returned by a stored procedure can't be joined with anything (as far as I know).

Is there an efficient way to join tables with the results of a stored procedure or function in bulk, instead of row by row?


Solution

Because the result of GetArchiveImages depends on Items.ID, SQL Server has to call the function once for each item; otherwise you would not get correct results.

The only kind of function that SQL Server can "break up" (expand into the calling query) is a T-SQL inline table-valued function (ITVF). So if you can rewrite your CLR function as an ITVF, you will likely get better performance.
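A minimal sketch of what such a rewrite could look like, assuming the data the CLR function returns can be expressed as a query over tables (an ITVF cannot read the file system). The dbo.ArchiveImages table, its ImagePath column, the Name column on Items, and the function name GetArchiveImagesInline are all hypothetical stand-ins; run it on a scratch database, since it creates dbo.Items.

```sql
-- Hypothetical tables standing in for the real schema.
CREATE TABLE dbo.Items (
    ID   int           NOT NULL PRIMARY KEY,
    Name nvarchar(100) NOT NULL
);

CREATE TABLE dbo.ArchiveImages (
    ItemID    int           NOT NULL REFERENCES dbo.Items (ID),
    ImagePath nvarchar(400) NOT NULL
);
GO

-- Inline TVF: a single RETURN SELECT with no BEGIN/END body, so the
-- optimizer can expand it into the calling query like a parameterized view.
CREATE FUNCTION dbo.GetArchiveImagesInline (@ItemID int)
RETURNS TABLE
AS
RETURN
    SELECT ai.ImagePath
    FROM dbo.ArchiveImages AS ai
    WHERE ai.ItemID = @ItemID;
GO

INSERT INTO dbo.Items (ID, Name)
VALUES (1, N'First'), (2, N'Second');

INSERT INTO dbo.ArchiveImages (ItemID, ImagePath)
VALUES (1, N'a.png'), (1, N'b.png'), (2, N'c.png');

-- Same CROSS APPLY shape as in the question; only the function changes.
SELECT i.ID, i.Name, archiveimages.ImagePath
FROM dbo.Items AS i
CROSS APPLY dbo.GetArchiveImagesInline(i.ID) AS archiveimages;
```

In the execution plan, the inline function is typically replaced by an ordinary join against dbo.ArchiveImages, which is what removes the per-row call overhead.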

In my experience, however, the overhead of calling a CLR function is not that big. It is much more likely that the problem is somewhere else in the query. For example, SQL Server has no statistics on how many rows that function will return, so it falls back to a fixed guess for each call. That guess can be far off and lead to misinformed decisions elsewhere in the optimization process.


UPDATE:

SQL Server does not allow you to keep static, non-constant data in a CLR class: in SAFE and EXTERNAL_ACCESS assemblies, static fields must be marked readonly. There are ways to trick the system, e.g. by creating a static readonly collection object (the reference is read-only, but you can still add items to and remove items from the collection); however, I would advise against that for stability reasons.

In your case it might make sense to create a cache table that is refreshed either automatically, with some sort of database or file-system trigger, or on a schedule. Instead of calling the function, you can then just join with that table.
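A minimal sketch of the cache-table approach, assuming dbo.Items and the CLR dbo.GetArchiveImages from the question already exist. The table, index, and procedure names are hypothetical, and so is the ImagePath column: the cache should mirror whatever columns the real function returns.

```sql
-- Hypothetical cache table for the output of dbo.GetArchiveImages.
CREATE TABLE dbo.ArchiveImageCache (
    ItemID    int           NOT NULL,
    ImagePath nvarchar(400) NOT NULL  -- hypothetical column
);
CREATE CLUSTERED INDEX IX_ArchiveImageCache_ItemID
    ON dbo.ArchiveImageCache (ItemID);
GO

-- Refresh procedure: pays the per-row CLR cost once per refresh instead of
-- once per query. Call it from a SQL Server Agent job (schedule) or from
-- whatever trigger notices that the underlying files changed.
CREATE PROCEDURE dbo.RefreshArchiveImageCache
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- roll back the whole refresh on any error

    BEGIN TRANSACTION;
        DELETE FROM dbo.ArchiveImageCache;

        INSERT INTO dbo.ArchiveImageCache (ItemID, ImagePath)
        SELECT i.ID, archiveimages.ImagePath
        FROM dbo.Items AS i
        CROSS APPLY dbo.GetArchiveImages(i.ID) AS archiveimages;
    COMMIT TRANSACTION;
END;
GO

EXEC dbo.RefreshArchiveImageCache;

-- Queries then use an ordinary set-based join instead of the function.
SELECT i.ID, c.ImagePath
FROM dbo.Items AS i
JOIN dbo.ArchiveImageCache AS c
    ON c.ItemID = i.ID;
```

Doing the DELETE and INSERT in one transaction means readers see either the old or the new cache contents, never an empty table midway through a refresh.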

OTHER TIPS

If the GetArchiveImages() function does not need to be used in multiple queries (or at least not outside of similar queries), you can switch the outer and inner parts of this query: do the main SELECT fields FROM [Items] WHERE ... inside the SQLCLR TVF, and make it a streaming TVF.

The basic structure needed would be:

  1. Define a variable of type SqlDataRecord to hold all of the fields you want to return from [Items], plus the others being returned by the current GetArchiveImages() function.

  2. Read the "several files in the file system" (taken from the first comment on @Sebastian Meine's answer)

  3. Open a SqlConnection using "Trusted_Connection = true; Enlist = false;" as the ConnectionString.

  4. Execute the main SELECT fields FROM [Items] {optional WHERE}. If it is possible at this point to narrow down some of the rows, then fill out the WHERE. You can even pass in values to the function to pass along to the WHERE clause.

  5. Loop through the SqlDataReader:

    1. Fill out the SqlDataRecord variable for this row
    2. Get the related items that the current GetArchiveImages() function gets, based on [Items].[ID]
    3. Call yield return with the filled-out SqlDataRecord variable
  6. Close the SqlConnection

  7. Dispose of the SqlDataReader, SqlCommand, and SqlConnection.

  8. Close any files opened in Step 2 (if they can't be closed earlier in the process).
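On the T-SQL side, the restructured function could be registered and called roughly like this. This is a sketch only: the assembly name ArchiveClr, the DLL path, the class ArchiveFunctions, the method GetItemsWithArchiveImages, the @MinItemID parameter, and the column list are all hypothetical, and the C# method itself (the streaming TVF with its fill-row method) is not shown.

```sql
-- Hypothetical assembly. EXTERNAL_ACCESS is needed for the file reads in
-- step 2 and for the regular (non-context) SqlConnection in step 3.
CREATE ASSEMBLY ArchiveClr
FROM N'C:\clr\ArchiveClr.dll'
WITH PERMISSION_SET = EXTERNAL_ACCESS;
GO

-- The result columns are the [Items] fields plus the archive-image fields,
-- i.e. the shape of the SqlDataRecord defined in step 1 (names hypothetical).
CREATE FUNCTION dbo.GetItemsWithArchiveImages (@MinItemID int)
RETURNS TABLE (
    ItemID    int,
    Name      nvarchar(100),
    ImagePath nvarchar(400)
)
AS EXTERNAL NAME ArchiveClr.ArchiveFunctions.GetItemsWithArchiveImages;
GO

-- One call returns the whole result set instead of one call per Items row.
SELECT f.ItemID, f.Name, f.ImagePath
FROM dbo.GetItemsWithArchiveImages(1) AS f
WHERE f.ImagePath LIKE N'%.png';
```

Depending on the SQL Server version and security settings, an EXTERNAL_ACCESS assembly may also need to be signed (or the database marked TRUSTWORTHY) before CREATE ASSEMBLY succeeds.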

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow