Question

The question from quite a long time boiling in my head, that out of the following two stored procedures which one would perform better.

Proc 1

CREATE PROCEDURE GetEmployeeDetails @EmployeeId uniqueidentifier,
@IncludeDepartmentInfo bit

AS
BEGIN

    SELECT * FROM Employees 
    WHERE Employees.EmployeeId = @EmployeeId

    IF (@IncludeDepartmentInfo = 1)
    BEGIN
        SELECT Departments.* FROM Departments, Employees
        WHERE Departments.DepartmentId = Employees.DepartmentId 
        AND Employees.EmployeeId = @EmployeeId
    END
END

Proc 2

CREATE PROCEDURE GetEmployeeDetails @EmployeeId uniqueidentifier,
 @IncludeDepartmentInfo bit
AS
BEGIN

    SELECT * FROM Employees 
    WHERE Employees.EmployeeId = @EmployeeId

    SELECT Departments.* FROM Departments, Employees
    WHERE Departments.DepartmentId = Employees.DepartmentId 
    AND Employees.EmployeeId = @EmployeeId 
    AND @IncludeDepartmentInfo = 1

END

the only difference between the two is use of 'if statment'.

if proc 1/proc 2 are called with alternating values of @IncludeDepartmentInfo then from my understanding proc 2 would perform better, because it will retain the same query plan irrespective of the value of @IncludeDepartmentInfo, whereas proc1 will change query plan in each call

answers are really appericated

PS: this is just a scenario, please don't go to the explicit query results but the essence of example. I am really particular about the query optimizer result (in both cases of 'if and where' and their difference), there are many aspects which I know could affect the performance which I want to avoid in this question.

Was it helpful?

Solution

SELECT Departments.* FROM Departments, Employees
    WHERE Departments.DepartmentId = Employees.DepartmentId 
    AND Employees.EmployeeId = @EmployeeId 
    AND @IncludeDepartmentInfo = 1

When SQL compiles a query like this it must be compiled for any value of @IncludeDepartmentInfo. The resulted plan can well be one that scans the tables and performs the join and after that checks the variable, resulting in unnecessary I/O. The optimizer may be smart and move the check for the variable ahead of the actual I/O operations in the execution plan, but this is never guaranteed. This is why I always recommend to use explicit IFs in the T-SQL for queries that need to perform very differently based on a variable value (the typical example being OR conditions).

gbn's observation is also an important one: from an API design point of view is better to have a consistent return type (ie. always return the same shaped and number of result sets).

OTHER TIPS

From a consistency perspective, number 2 will always return 2 datasets. Overloading aside, you wouldn't have a client code method that may be returns a result, maybe not.

If you reuse this code, the other calling client will have to know this flag too.

If the code does 2 different things, then why not 2 different stored procs?

Finally, it's far better practice to use modern JOIN syntax and separate joining from filtering. In this case, personally I'd use EXISTS too.

SELECT
    D.*
FROM
    Departments D
    JOIN
    Employees E ON D.DepartmentId = E.DepartmentId
WHERE  
    E.EmployeeId = @EmployeeId 
    AND
    @IncludeDepartmentInfo = 1

When you use the 'if' statement, you may run only one query instead of two. I would think that one query would almost always be faster than two. Your point about query plans may be valid if the first query were complex and took a long time to run, and the second was trivial. However, the first query looks like it retrieves a single row based on a primary key - probably pretty fast every time. So, I would keep the 'if' - but I would test to verify.

The performance difference would be too small for anyone to notice.

Premature optimization is the root of all evil. Stop worrying about performance and start implementing features that make your customers smile.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top