Performance Tuning of SQL Stored Procedures

I am currently working on performance tuning of a complex, data-intensive application. Most of the business logic has been implemented in stored procedures, resulting in heavily loaded stored procedures, with some SPs going beyond 2000+ lines!

This in itself, in my opinion, is bad design. Ideally, an atomic unit of code (SP/function/method) should not go beyond 500 lines. Moreover, it is not good to have so much business logic implemented in SPs in the database. It should have been implemented in the business layer – a separate component hosted on a secure server.
However, given that the entire application has already been built and is performing poorly, and that the poor performance has been established to be mainly due to stored procedures, the immediate action was to review all the SPs (approx. 500) and remediate them.

Here, I have listed some checks that help improve the performance of SQL stored procedures. The list can be used as a checklist of best practices for writing good stored procedures as well as for reviewing them.

The sources of these tips and tricks are MSDN and various helpful blogs and websites which, unfortunately, I haven't noted down.
1.      Use proper Datatype and Data Size for variables
If the variable is populated from a database table, refer to the column's data type and size and set the variable's data type and size accordingly. Use NVARCHAR for multilingual compliance. Using proper data types avoids the extra overhead of data type conversion in the SP.
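For example, if the column being read is defined as NVARCHAR(50), declare the variable with the same type and size (the Employee table and column here are illustrative):

DECLARE @EmployeeName NVARCHAR(50)
SELECT @EmployeeName = EmployeeName FROM Employee WHERE EmployeeID = @EmpID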
2.      SET NOCOUNT ON
By default, every time an SQL statement is executed, a message is sent from the server to the client indicating the number of rows that were affected by the SQL statement. By turning off this default behaviour, one can reduce network traffic between the server and the client.
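A minimal sketch of where the statement goes – first thing in the procedure body (procedure name illustrative):

CREATE PROCEDURE dbo.myProcedure
AS
BEGIN
    SET NOCOUNT ON
    -- procedure logic here; 'N rows affected' messages are suppressed
END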
 
3.      Use fully qualified name while invoking stored procedures from application as well as from within other stored procedures
This allows SQL Server to locate the stored procedure's execution plan more directly, in turn speeding up the performance of the stored procedure.
For Example
use,
exec dbo.myProcedure
instead of,
exec myProcedure
4.      Do not create any stored procedures using sp_ as a prefix
This special prefix is reserved for system stored procedures. Using it slows down execution because SQL Server first attempts to resolve any stored procedure whose name begins with the prefix "sp_" in the master database; that much time is wasted looking for the stored procedure.
5.      Select only the required columns
Usage of SELECT * can prevent the available indexes on the table from being used for the query. Also, selecting more data means more IO.
6.      Minimize the use of not equal operations, <> or !=.
SQL Server has to scan a table or index to find all values to see if they are not equal to the value given in the expression.
Incorrect Usage:
SELECT Name, Description, OrderDate FROM Orders WHERE OrderDate <> '2005-01-01'
Correct Usage:
SELECT Name, Description, OrderDate FROM Orders WHERE OrderDate < '2005-01-01' OR OrderDate > '2005-01-01'
7.      Avoid Mixing-up DML and DDL statement on a temp table inside sp
When you CREATE a temp table (#table) and later ALTER the same temp table within the same stored procedure, it causes the stored procedure to be recompiled, adversely affecting performance.
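A sketch of the pattern to avoid, with illustrative table names – define the complete structure up front instead of altering it after use:

-- Avoid: DDL after DML on the same temp table forces a recompile
CREATE TABLE #emp (EmpID INT)
INSERT INTO #emp (EmpID) SELECT EmployeeID FROM Employee
ALTER TABLE #emp ADD EmpName NVARCHAR(50)

-- Prefer: create all columns once, then perform only DML
CREATE TABLE #emp (EmpID INT, EmpName NVARCHAR(50))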
8.      Avoid using DISTINCT when not required.
SQL Server performs a sort operation for distinct calculation, which is costly.
9.      Ensure columns with similar data types are used when using JOINS
For best performance, the columns used in joins should be of the same data types. And if possible, they should be numeric data types rather than character data types.
10.  Use EXISTS clause instead of Count(*) for checking existence
Use EXISTS instead of COUNT(*) when looking for the existence of one or more rows in a sub query. EXISTS cancels the sub query once the first existence of a record is found, while COUNT(*) forces processing of the entire query.
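A minimal comparison, assuming a hypothetical Orders table:

-- Preferred: EXISTS stops at the first matching row
IF EXISTS (SELECT 1 FROM Orders WHERE CustomerID = @CustID)
    PRINT 'Orders found'

-- Slower: COUNT(*) touches every matching row before the comparison
IF (SELECT COUNT(*) FROM Orders WHERE CustomerID = @CustID) > 0
    PRINT 'Orders found'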
11.  Don’t do SELECT max(ID) from a master table when inserting into a details table.
This will fail when concurrent users are inserting records to the tables at the same time.
Use SCOPE_IDENTITY() or IDENT_CURRENT() instead.
@@IDENTITY returns the last IDENTITY value produced in the current session, regardless of the table that produced the value and regardless of the scope of the statement that produced it. So if a trigger on the table you insert into creates an identity value in another table, @@IDENTITY returns the identity created last, even though it was the trigger that created it.
SCOPE_IDENTITY() also returns the last identity value created in the current session, but limits it to the current scope as well. In other words, it returns the last identity value that your statement explicitly created, rather than any identity created by a trigger or a user-defined function.
IDENT_CURRENT('tablename') returns the last IDENTITY value produced for the specified table, regardless of the connection that created the value and regardless of the scope of the statement that produced it. It is limited only to the specified table, not by session or scope.
To avoid the potential problems associated with a trigger being added later on, always use SCOPE_IDENTITY() to return the identity of the recently added row in your T-SQL statement or stored procedure.
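A minimal sketch, assuming a hypothetical Orders table with an IDENTITY key:

DECLARE @NewOrderID INT
INSERT INTO Orders (CustomerID, OrderDate) VALUES (@CustID, GETDATE())
SET @NewOrderID = SCOPE_IDENTITY()  -- unaffected by identities created in triggers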
12.  Remove PRINT statements from stored procedures.
Normally, PRINT statements are used for debugging SPs. PRINT statements add overhead because SQL Server sends their output to the client.
For debugging purposes, wrap PRINT in a conditional check on a global debug flag.
For Example
IF (@PrintFlag = 1)
BEGIN
    PRINT 'Step 1 Completed'
END

13.  Do not use Dynamic SQL inside stored procedures.
Every query you run in SQL Server requires a query plan. When you run a query the first time, SQL Server builds a query plan for it, saves the plan in cache, and reuses it the next time you run the query. However, execution plans for ad hoc dynamic SQL strings are far less likely to be cached and reused.
14.  Keep all transactions as short as possible.
This helps to reduce the number of locks (of all types), helping to speed up the overall performance of SQL Server.
15.  Use SP_EXECUTESQL rather than EXEC() for better performance and improved security
sp_executesql offers two major advantages over EXECUTE:
1. First, it supports parameter substitution. Parameterised statements help in avoiding the risk of SQL injection.
2. Second, it creates query execution plans that are more likely to be reused by SQL Server, which in turn reduces overhead on the server, boosting performance.
For Example:
DECLARE @SQLString nvarchar(500);
DECLARE @ParmDefinition nvarchar(500);
DECLARE @FieldName nvarchar(25);
SET @FieldName = N'Integer';
/* Build the SQL string one time. */
SET @SQLString = N'SELECT * FROM Emp WHERE name = @ParamValue';
SET @ParmDefinition = N'@ParamValue nvarchar(25)';
EXECUTE sp_executesql @SQLString, @ParmDefinition, @ParamValue = @FieldName;
Ref: http://msdn.microsoft.com/en-us/library/ms188001.aspx
16.  Always put the DECLARE statements at the start of the stored procedure.
This enables the query optimizer to reuse query plans and also increases readability.
17.  Use UNION and UNION ALL properly based on requirement.
A UNION statement effectively does a SELECT DISTINCT on the result set. If you know that all the records returned are unique from your union, use UNION ALL instead, it gives faster results.
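A quick illustration, with illustrative table names:

-- Removes duplicates via a costly sort/distinct step
SELECT City FROM Customers
UNION
SELECT City FROM Suppliers

-- Faster: simply appends the result sets when duplicates are acceptable
SELECT City FROM Customers
UNION ALL
SELECT City FROM Suppliers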
18.  Do not use Cursors when they are not required.
Cursors of any kind slow down SP performance. While in some cases they cannot be avoided, in many cases they can.
If you need to perform row-by-row operations, consider using one or more of these options instead of a cursor (a set-based example follows the list):
1. Use temp tables
2. Use WHILE loops
3. Use derived tables
4. Use correlated sub-queries
5. Use CASE statements
6. Use multiple queries
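For instance, a cursor that loops over customers to refresh a per-customer total can often be replaced by one set-based statement. A sketch with illustrative tables and columns:

UPDATE c
SET c.TotalOrders = o.OrderCount
FROM Customers c
JOIN (SELECT CustomerID, COUNT(*) AS OrderCount
      FROM Orders GROUP BY CustomerID) o
    ON c.CustomerID = o.CustomerID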
19.  Do not use non-ANSI joins
The use of the JOIN clause separates the relationship logic from the filter logic (the WHERE) and is thus cleaner and easier to understand.
ANSI:
select * from a join b on a.id = b.id
Non-ANSI:
select * from a, b where a.id = b.id
20.  Transaction usage should include error checking as the basis for rollback
Use transactions only where necessary. If you are doing single insert/update or temp tables manipulations then transactions are not required. Transactions are required when you are doing multiple inserts/updates and either all statements need to be successful or none.
Transaction handling should involve proper rollbacks in case of errors.
Use @@error, @@rowcount variables instead of running through tables to find out last rows or status of a transaction.
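A minimal sketch of the @@error-based rollback pattern, with hypothetical tables and variables:

BEGIN TRANSACTION
UPDATE Accounts SET Balance = Balance - @Amount WHERE AccountID = @FromID
IF @@ERROR <> 0
BEGIN
    ROLLBACK TRANSACTION
    RETURN 1
END
UPDATE Accounts SET Balance = Balance + @Amount WHERE AccountID = @ToID
IF @@ERROR <> 0
BEGIN
    ROLLBACK TRANSACTION
    RETURN 1
END
COMMIT TRANSACTION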
21.  Indexes on Temp Tables
Add proper indexes to your temp tables in the SPs to allow faster retrieval of data from these temp tables.
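For example, if a temp table is repeatedly filtered or joined on one column, an index on that column helps (names illustrative):

CREATE TABLE #emp (EmpID INT, DeptID INT, EmpName NVARCHAR(50))
CREATE CLUSTERED INDEX IX_emp_DeptID ON #emp (DeptID)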
22.  Use @@rowcount, @@error  variables to find out last rows or status of a execution
For Example,
Do not use the following after inserting the data into the table:
select @noofemp = count(*) from #emp
Instead, immediately after inserting the rows, use:
select @noofemp = @@rowcount
23.  Recommend Usage of Project Specific Reusable Functions
Have project specific functions for common tasks that are repeated across SPs.
24.  If a stored procedure needs to return only a single value, and not a record set, consider returning the single value using RETURN statement instead of returning it as part of record set
While RETURN statements are generally used for error checking, returning a single value as a return value is faster than returning it as part of a record set.
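A minimal sketch (procedure and table names illustrative):

CREATE PROCEDURE dbo.GetOrderCount @CustID INT
AS
BEGIN
    DECLARE @cnt INT
    SELECT @cnt = COUNT(*) FROM Orders WHERE CustomerID = @CustID
    RETURN @cnt
END

-- Caller reads the value without a record set:
DECLARE @OrderCount INT
EXEC @OrderCount = dbo.GetOrderCount @CustID = 42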
25. Create proper indexes and avoid misuse of indexes – Index the table/ temp table for the most commonly used columns in the WHERE clause of SELECT/UPDATE queries
Performance suffers if you do not create indexes that support the queries issued against the table. However, if you have too many indexes, the insert and update performance of the table suffers. Find a balance between the indexing needs of the writes and reads for the tables/temp tables used in the SP.
26. Retrieve just as many columns and rows as are required for the given operation. Avoid subqueries that return more columns/ rows than is required
There is no point in retrieving more data than is currently required. Retrieving too much data leads to increased network traffic and increased server and client resource use. This applies to both columns and rows.
27. Check the performance of important queries in the SP using SQL Profiler. Optimise queries that take a long time
Although it is not realistic to monitor every query, try to measure the performance of the most commonly used queries and see how they can be optimised.
28. Ensure correct usage of Joins
Ensure that all parts of the keys are included in the ON clause. Pay extra attention to ensure that no cross products result from missing ON or WHERE clauses for joined tables.
29. Avoid costly operators such as LIKE, NOT LIKE
The LIKE operator with a value enclosed in wildcards ('%a value%') causes a table scan most of the time.
Negative operations, such as <> or NOT LIKE, are also very difficult to resolve efficiently. Try to rewrite them another way if you can. If you are only checking for existence, use the IF EXISTS or IF NOT EXISTS construct instead: these can use an index, and even when they scan, they can stop the scan at the first occurrence.
30. Avoid functions in WHERE clause if possible, especially for datetime columns

The optimizer cannot always select an index by using columns in a WHERE clause that are inside functions. Columns in a WHERE clause are seen as an expression rather than a column. Therefore, the columns are not used in the execution plan optimization. A common problem is date functions around datetime columns.
The following query with a function on the datetime column causes a table scan even if there is an index on the OrderDate column:

SELECT OrderID FROM Orders WHERE DATEADD(day, 15, OrderDate) = '07/23/1996'

However, by moving the function to the other side of the WHERE equation, an index can be used on the datetime column, as shown below:

SELECT OrderID FROM Orders WHERE OrderDate = DATEADD(day, -15, '07/23/1996')

31. Avoid implicit conversions

Implicit conversions also cause table and index scans, often because of data type mismatches. Be especially wary of nvarchar/varchar and nchar/char data type mismatches, which cause an implicit conversion. The following example uses a local variable of type char against an nchar column in the Customers table. The type mismatch causes an implicit conversion and a scan:

DECLARE @CustID CHAR(5)
SET @CustID = 'FOLKO'
SELECT CompanyName FROM Customers WHERE CustomerID = @CustID

32. Use Locking and Isolation Level Hints in queries to Minimize Locking if required

Instead of accepting the default SQL Server isolation level of read committed, you can explicitly select the appropriate isolation level.
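A sketch of both styles – note that lower isolation levels trade read consistency for concurrency, so use them only where dirty reads are acceptable:

-- Statement-level hint on a single table
SELECT OrderID FROM Orders WITH (NOLOCK) WHERE OrderDate > @Since

-- Session-level setting
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED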

33. Use table variables cautiously

 Table variables are not considered by the optimizer when the optimizer generates execution plans and parallel queries. Therefore, table variables may cause decreased performance. Also, table variables cannot be indexed as flexibly as temporary tables.

34. Use temp tables cautiously – Drop the temp table after usage

If your application frequently creates temporary tables, consider using table variables instead. Table variables are cleaned up automatically at the end of the function, stored procedure, or batch in which they are defined. Many requests to create temporary tables may cause contention both in the tempdb database and in the system tables. Very large temporary tables are also problematic. If you find that you are creating many large temporary tables, you may want to consider a permanent table that can be truncated between uses.

You have to test temporary table and table variable usage for performance. Test with many users for scalability to determine the approach that is best for each situation. Also, there may be concurrency issues when many temporary tables and variables are requesting resources in the tempdb database.

35. TOP is not an array function and its use to perform array-like activities is not acceptable

Do not use TOP to loop through all the rows of a temp table. Instead, add an identity column to the temp table and loop on it, as sketched below.
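A minimal sketch of the identity-based loop, with illustrative names:

CREATE TABLE #emp (RowID INT IDENTITY(1,1), EmpID INT)
DECLARE @i INT, @max INT, @EmpID INT

INSERT INTO #emp (EmpID) SELECT EmployeeID FROM Employee
SET @max = @@ROWCOUNT  -- capture the row count immediately (see tip 22)
SET @i = 1
WHILE @i <= @max
BEGIN
    SELECT @EmpID = EmpID FROM #emp WHERE RowID = @i
    -- process @EmpID here
    SET @i = @i + 1
END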

36. Use COALESCE to replace null with blank string / default value
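For example, assuming a hypothetical Employee table with a nullable MiddleName column:

SELECT FirstName, COALESCE(MiddleName, '') AS MiddleName FROM Employee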

37. If you need to fetch the current date/datetime in many places within the SP, fetch it once at the beginning and store it in a variable. Use the variable subsequently instead of calling the GETDATE() function repeatedly
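A minimal sketch:

DECLARE @Now DATETIME
SET @Now = GETDATE()
-- reuse @Now in subsequent statements instead of calling GETDATE() each time
UPDATE Orders SET ModifiedDate = @Now WHERE OrderID = @OrderID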

June 24, 2012

Reducing Windows Azure Costs

With Windows Azure's pay-as-you-go model, it becomes increasingly important to monitor our Azure spend and cut costs wherever possible to get the optimum benefits. Listed below are some steps that would help reduce unnecessary Azure costs.
1.      Go for the VM size that is best suited for your application
Most people start with a small instance and then increase the VM size as required. In contrast, it works better to profile your application first and decide on the VM size that best suits your deployment. In certain scenarios, scaling out with multiple smaller VM sizes works out cheaper than scaling up the existing VM size.
2.      Identify and Delete suspended deployments
Often, people in an organization create Azure deployments for demo or testing purposes and then totally forget to delete them. Billing is on a per-hour basis for every VM allocated, regardless of whether the deployed application is used or not. It makes a lot of sense to keep a watch for suspended and obsolete deployments and get rid of them to save costs.
3.      Use Affinity Groups
If your Roles use other Azure services like Azure Storage, SQL Azure, etc., it is best to assign these services to an Affinity Group and have them co-located. Basically, an Affinity Group tells the Fabric Controller that its Azure services should always be together and close to one another. The Fabric Controller then tries to deploy the services in the same cluster, reducing latency and lowering costs.
4.      Monitor and modify SQL Azure size as required
It is possible to increase/reduce the SQL Azure database size by using a simple ALTER DATABASE command. With this facility, it becomes very convenient to begin with a small database (1 GB) and increase the database size when required.
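A sketch of the command, assuming the SQL Azure syntax of the time (database name and sizes illustrative):

ALTER DATABASE MyAppDb MODIFY (EDITION = 'business', MAXSIZE = 10GB)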
5.      Use scheduled/dynamic auto scaling based on the load
You can use load test tools (e.g. the WCAT stress tool) to determine what load your role instance can handle. Based on this, and your estimation of load, you can auto scale your applications using frameworks like WASABi that allow scheduled and dynamic auto scaling of applications.
6.      Use Table Storage wherever non-relational database would do
The Table service provides much cheaper (and more scalable) storage for scenarios where the data does not require a relational database.
7.      Review your bills
Last but not least, it makes a lot of sense to review your Azure bills and identify the line items that could be optimized.

June 24, 2012

Hello world!

Hello World .. here I am .. this is my first post on WordPress. I started this blog to keep myself inspired, to learn and share my learnings, to say what I know and know what you say of that.

Looking forward to a lot of followers … 🙂

Thanks,

Vaishali

 

June 24, 2012