Question

At work we have two servers; one runs an application a lot of people use, which has a SQL Server 2000 back end. I have been free to query this for a long time, but I can't add anything to it, such as stored procedures or extra tables.

This has led to us having a second SQL Server linked to the first, and to me building up a library of stored procedures that query data from both sides across the linked server. Some of these queries are taking longer than I would like.

Can someone point me to some good articles about using linked servers? I am particularly interested in finding out what data is being transferred between the two, as usually the majority of the SQL statement could be performed remotely, but I have the feeling it may be transferring the full tables; it is usually just a join to a small final table locally.

Also, what do the linked server options do? I currently have:

  • Collation Compatible: True
  • Data Access: True
  • Rpc: True
  • Rpc Out: True
  • Use Remote Collation: False
  • Collation Name: (Blank)
  • Connection Timeout: 0
  • Query Timeout: 0

EDIT:

Just thought I would update this post. I used OPENQUERYs with dynamic parameters for a while to boost performance; thanks for the tip. However, doing this can make queries messier, as you end up dealing with strings. Finally, this summer we upgraded SQL Server to 2008 and implemented live data mirroring. To be honest, the OPENQUERYs were approaching the speed of local queries for my tasks, but the mirroring has certainly made the SQL easier to deal with.


Solution

I would advise dynamic OPENQUERYs in a cursor loop instead of linked joins. This is the only way I've been able to replicate MS Access' linked join performance (at least for single remote tables).

Regular linked joins in MS SQL are too inefficient because they pull everything across, especially with humongous tables.

-- I would like to know what is so bad about OPENQUERYs inside cursor loops? If done correctly, there are no locking issues.
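For illustration, here is a minimal sketch of the pattern, with hypothetical server, table, and column names (RemoteServer, dbo.LocalCustomers, Sales.dbo.Orders). Since OPENQUERY() only accepts a literal string, the statement has to be assembled dynamically and run with EXEC so that the WHERE clause is evaluated on the remote server:

CREATE TABLE #Results (OrderId int, Total money)

DECLARE @id int, @sql nvarchar(4000)

DECLARE id_cursor CURSOR FAST_FORWARD FOR
    SELECT CustomerId FROM dbo.LocalCustomers  -- hypothetical local table

OPEN id_cursor
FETCH NEXT FROM id_cursor INTO @id

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Build the whole statement as a string; OPENQUERY cannot take
    -- variables, so the current ID is spliced into the remote query text.
    SET @sql = N'INSERT INTO #Results ' +
               N'SELECT * FROM OPENQUERY(RemoteServer, ' +
               N'''SELECT OrderId, Total FROM Sales.dbo.Orders ' +
               N'WHERE CustomerId = ' + CAST(@id AS nvarchar(10)) + N''')'
    EXEC (@sql)

    FETCH NEXT FROM id_cursor INTO @id
END

CLOSE id_cursor
DEALLOCATE id_cursor

-- The collected rows can now be joined or reported on locally
SELECT * FROM #Results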

OTHER TIPS

Avoid joins to linked server tables.

Four-part naming in your join works but is more expensive. Your join should contain criteria that limit the data set coming from the linked server and make use of the indexed columns.

Example:

SELECT loc.field1, lnk.field1
FROM MyTable loc
INNER JOIN RemoteServer.Database.Schema.SomeTable lnk
  ON loc.id = lnk.id
  -- criterion on the remote table: the linked server can apply it
  -- before rows are sent across
  AND lnk.RecordDate = GETDATE()
WHERE loc.SalesDate = GETDATE()

This query is also applying a criterion in the join that the linked server can use before the join is calculated.

The recommended method is the use of OPENQUERY.

By avoiding the join with the use of OPENQUERY, the local server only sends the query to be executed remotely, instead of sending a set of IDs for the join.
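A sketch of the idea, reusing the hypothetical names from the example above. The text inside OPENQUERY() is executed entirely on the remote server, so only the filtered rows travel back for the local join; note the fixed date literal, since OPENQUERY() cannot reference local variables or functions:

SELECT loc.field1, lnk.field1
FROM MyTable loc
INNER JOIN OPENQUERY(RemoteServer,
    'SELECT id, field1
     FROM Database.Schema.SomeTable
     WHERE RecordDate >= ''20090101''') lnk
  ON loc.id = lnk.id
WHERE loc.SalesDate = GETDATE()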

Use the link to retrieve a set of data and perform the calculations locally. Either use a temporary table (for ad hoc queries) or insert the rows into a permanent table in a nightly job.
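A sketch of the temporary-table variant under the same assumptions; the remote side slims the data down, and the join then runs entirely locally:

-- Pull only the pre-filtered remote rows into a local temp table
SELECT *
INTO #RemoteData
FROM OPENQUERY(RemoteServer,
    'SELECT id, field1
     FROM Database.Schema.SomeTable
     WHERE RecordDate >= ''20090101''')

-- Join against local data without touching the link again
SELECT loc.field1, r.field1
FROM MyTable loc
INNER JOIN #RemoteData r
  ON loc.id = r.id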

Beginning transactions may fail, depending on whether the remote transaction coordinator is set up on the linked server. Using it will consume more resources.

Also consider that you are hitting a production server running an application; while you do not specify it, I think it is safe to assume that it uses heavy transactions and does inserts and updates. You are taking resources away from the application.

Your purpose appears to be the use of the data for reporting. Your reporting server can be set to the SIMPLE recovery model instead of FULL, making its logging more efficient.
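If the reporting database is yours to configure, that is a single statement (database name here is hypothetical):

-- Switch from full to simple recovery; less log overhead,
-- at the cost of point-in-time restore
ALTER DATABASE MyReportingDB SET RECOVERY SIMPLE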

You will also avoid having your queries canceled due to data movement on the linked server. Always be mindful of setting the proper isolation level for your queries, and of table hints like NOLOCK.
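For example, either per session or per table; both accept possibly dirty reads in exchange for not blocking on the application's writes:

-- Per session
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

-- Or per table, as a hint
SELECT field1 FROM MyTable WITH (NOLOCK)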

And PLEASE! Never place an OPENQUERY (or any linked server) inside a loop!

When you use linked servers for joins like this, it is important that the server you are immediately connected to (the "local" one) holds most of the data, with the linked server providing only a small part; otherwise, yes, it will pull across as much data as it needs to perform the join.

Alternatives include copying a subset of the data across to a temporary table, with as much slimming-down of the results and pre-processing as possible performed on the linked server, and then doing the join on the "local" side.

You may find you can easily boost performance by reversing the way you do it: connect to the server you have no control over (they'll need to create a linked server for you) and then reach back to your server over the link. If you need to do major work with the data that would require creating sprocs, then push the data onto your server and use your sprocs there.

In some cases, I simply had the linked server perform a nightly creation of this kind of summary which it pushed to the local server, and then the local server performed its work with the join.

Royal pain

We used to have several linked servers at our shop and it turned out to be such a PITA.

First of all, there were severe performance problems similar to what you describe. I was shocked when I saw the network I/O stats. Despite all efforts, we failed to hint SQL Server into reasonable behavior.

Another problem was that stored procs had these linked server names hardcoded everywhere, with no way to override them. So developers couldn't easily test on their development sandboxes any functionality that touched linked servers. This was a major obstacle for creating a universally usable unit-test suite.

In the end we ditched linked servers completely and moved data synchronization to web-services.

Queries involving semi-joins across a linked server tend not to be very efficient. You might be better off using OPENQUERY to populate a local temporary table and then working on it from there.

I wrote a remote Linked Server application in SQL 2000 a couple of years ago and came across the same performance issues you describe. I ended up rewriting my stored procedures several times in order to obtain the best performance.

I used temporary tables extensively. I found that it was less expensive to retrieve large amounts of remote data into a temp table, then join to it, manipulate it, etc. Joining local to remote tables was very slow, as you describe.

Display Execution Plan and Display Estimated Execution Plan tended to help, although I did not understand a lot of what I was looking at.

I don't know if there really is an efficient way to do these queries against a remote server, because it seems like SQL Server cannot take advantage of its normal optimizations when going against a linked server. It may feel like you are transferring the entire table because, in fact, that is what is happening.

I am wondering if a replication scenario might work for you. By having the data on your local server, you should be able to write normal queries that will perform as desired.

I do not know of any good articles to point you towards. As I wrote more complicated SQL Server applications, I started to think that I needed a better understanding of how SQL Server worked underneath. To that end, we bought the MS Press Inside Microsoft SQL Server 2005 series, edited by Kalen Delaney, here at work. Volume 1: The Storage Engine is definitely the place to start, but I have not gotten that far into it. Since my last few projects have not involved SQL Server, my study of it has gotten lax.

Is there a possibility that you could set up a separate database on the server rather than using a linked server?

It's a very general problem, which may have many solutions. But as we have witnessed, many users say that they have tried everything.

What solved my problem is this:

I upgraded SQL Server 2000 from SP2 to SP4; if you already have SP4 on SQL Server 2000, then run Instcat.sql. From my experience I can assure you this will work, if you have exhausted all the other workarounds.

Thanks, Mithalesh mithalesh.gupta@gmail.com

Dynamic SQL and a function can be used to get around the hard-coded name problem. For instance, I'm trying an implementation where a function ufn_linkedDatabase(@purpose nvarchar(255)), given the input 'cpi.cpi' (purpose CPI, sub-purpose default), returns '[SERVER-NAME.DOMAIN.LCL,2000].[CPI]' in the production environment (where we use an alternate port number for SQL Server, I don't know why, including in the linked server name). Then a SQL command is assembled in @template varchar(max), with the expression @{cpi.cpi} representing the linked server and database, and then @workstring = REPLACE(@template, N'@{cpi.cpi}', ...). How the function actually gets the database name is kept separate from the procedures - a lookup table is nice.
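A minimal sketch of that arrangement, assuming a hypothetical lookup table dbo.LinkedDatabase(purpose, linkRef) behind the function and a hypothetical remote table SomeCpiTable:

CREATE FUNCTION dbo.ufn_linkedDatabase (@purpose nvarchar(255))
RETURNS nvarchar(255)
AS
BEGIN
    -- Map a purpose such as 'cpi.cpi' to the environment-specific
    -- linked server and database, e.g. '[SERVER-NAME.DOMAIN.LCL,2000].[CPI]'
    DECLARE @linkRef nvarchar(255)
    SELECT @linkRef = linkRef
    FROM dbo.LinkedDatabase
    WHERE purpose = @purpose
    RETURN @linkRef
END
GO

-- Assemble and run a command from a template
DECLARE @template nvarchar(max), @workstring nvarchar(max)
SET @template = N'SELECT * FROM @{cpi.cpi}.dbo.SomeCpiTable'
SET @workstring = REPLACE(@template, N'@{cpi.cpi}',
                          dbo.ufn_linkedDatabase(N'cpi.cpi'))
EXEC (@workstring)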

Issues: to use OPENQUERY() - which is probably still better, at least unless the linked server option "collation compatible" is set to "true", so that more of the task can be executed on the linked server (important even on a fast network, and our server room's internal network is respectably fast) - I would probably need to handle 'cpi.cpi.server', 'cpi.cpi.database' and 'cpi.cpi.server.database' separately. And I may end up writing exactly one application using this design, in which case it's over-designed. Still, that means the function itself doesn't have to be any kind of fancy work.

Throwing fast network hardware at the problem may be the cheaper answer, anyway.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow