Manish Banga: SQL best practices for query optimization and increasing performance

1. Avoid unnecessary columns in the SELECT list and unnecessary tables in join conditions

Selecting unnecessary columns in a select query adds overhead to the actual query, specially if the unnecessary columns are of LOB types.
Including unnecessary tables in join conditions forces the database engine to retrieve and fetch unnecessary data and increases the query execution time.

2. Don't use "SELECT *" in a SQL query
Unnecessary columns may get fetched and may add expense to the data retrieval time.

3. Do not use the COUNT() aggregate in a subquery to do an existence check
Do not use:

SELECT column_list FROM table WHERE 0 < (SELECT count(*) FROM table2 WHERE ..)

Instead, use:

SELECT column_list FROM table WHERE EXISTS (SELECT * FROM table2 WHERE ...)

When you use COUNT(), SQL Server does not know that you are doing an existence check. It counts all matching values, either by doing a table scan or by scanning the smallest non-clustered index.
When you use EXISTS, SQL Server knows you are doing an existence check. When it finds the first matching value, it returns TRUE and stops looking. The same applies to using COUNT() instead of IN or ANY.

4. Try to avoid joining between two different types of columns

When joining between two columns of different data types, one of the columns must be converted to the type of the other. The column whose type is lower is the one that is converted.
If you are joining tables with incompatible types, one of them can use an index, but the query optimizer cannot choose an index on the column that it converts. For example:

SELECT column_list FROM small_table, large_table WHERE

smalltable.float_column = large_table.int_column

In this case, SQL Server converts the integer column to float, because int is lower in the hierarchy than float. It cannot use an index on large_table.int_column, although it can use an index on smalltable.float_column.

5. Try to avoid dynamic SQL
Unless really required, try to avoid the use of dynamic SQL because:

Dynamic SQL is hard to debug and troubleshoot.
If the user provides the input to the dynamic SQL, then there is possibility of SQL injection attacks.

6. Try to avoid the use of temporary tables

Unless really required, try to avoid the use of temporary tables. Try to use table variables in place of temporary tables.
In 99% of cases, table variables reside in memory, hence it is a lot faster. Temporary tables reside in the TempDb database. So operating on temporary tables require inter database communication and hence will be slower.

6. Try to avoid the use of NOT IN and IN

When you are writing queries containing NOT IN, then this is going to offer poor performance as the optimizer need to use nested table scan to perform this activity. This can be avoided by using EXISTS or NOT EXISTS.
When there is a choice to use IN or EXIST, we should go with EXIST clause for better performance.

7. Try to use UNION to implement an "OR" operation

Try not to use "OR" in a query. Instead use "UNION" to combine the result set of two distinguished queries. This will improve query performance.

Better use UNION ALL if a distinguished result is not required. UNION ALL is faster than UNION as it does not have to sort the result set to find out the distinguished values.

8. Try to use WITH(NOLOCK) in SELECT statements

WITH(NOLOCK) hint is an explicit command directed at a specific table or view used to set the transaction isolation level against the table or tables within a view for a query. Once issued, locks will not be used against the data within the table.

The advantage to this is there is no chance a deadlock will occur against any other queries running against the table. The other indirect advantage is that less memory will be used in order to hold locks against that data.

9. Implement the following good practices in User Defined Functions

Do not call functions repeatedly within your Stored Procedures, triggers, functions, and batches. For example, you might need the length of a string variable in many places of your procedure, but don't call the LEN function whenever it's needed; instead, call the LEN function once, and store the result in a variable for later use.

9. Implement the following good practices in Stored Procedures

Do not use "SP_XXX" as a naming convention. It causes additional searches and added I/O (because the system Stored Procedure names start with "SP_"). Using "SP_XXX" as the naming convention also increases the possibility of conflicting with an existing system Stored Procedure.

Use "Set Nocount On" to eliminate extra network trip.

10. Implement the following good practices in Triggers

Try to avoid the use of triggers. Firing a trigger and executing the triggering event is an expensive process.

Never use triggers that can be implemented using constraints.

Do not use the same trigger for different triggering events (Insert, Update, Delete).

Do not use transactional code inside a trigger. The trigger always runs within the transactional scope of the code that fires the trigger.

11. Implement the following good practices in Views

Use views for re-using complex TSQL blocks and to enable it for indexed views.

Do not use views that retrieve data from a single table only (that will be an unnecessary overhead). Use views for writing queries that access columns from multiple tables.

12. Implement the following good practices in Transactions

Start a transaction as late as possible and commit/rollback the transaction as fast as possible to reduce the time period of resource locking.

13. Implement a lazy loading strategy for large objects

Store Large Object columns (like VARCHAR(MAX), Image, Text etc.) in a different table than the main table, and put a reference to the large object in the main table.

Retrieve all the main table data in a query, and if a large object is required to be loaded, retrieve the large object data from the large object table only when it is required.

14. Use VARCHAR(MAX), VARBINARY(MAX), and NVARCHAR(MAX)

In SQL Server 2000, a row cannot exceed 8000 bytes in size. This limitation is due to the 8 KB internal page size of SQL Server. So to store more data in a single column, you need to use TEXT, NTEXT, or IMAGE data types (BLOBs) which are stored in a collection of 8 KB data pages.

These are unlike the data pages that store other data in the same table. These pages are arranged in a B-tree structure. These data cannot be used as variables in a procedure or a function, and they cannot be used inside string functions such as REPLACE, CHARINDEX, or SUBSTRING. In most cases, you have to use READTEXT, WRITETEXT, and UPDATETEXT.

To solve this problem, VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(MAX) were introduced in SQL Server 2005. These data types can hold the same amount of data BLOBs can hold (2 GB), and they are stored in the same type of data pages used for other data types.

When data in a MAX data type exceeds 8 KB, an over-flow page is used (in the ROW_OVERFLOW allocation unit), and a pointer to the page is left in the original data page in the IN_ROW allocation unit.

Manish Banga

Monday, February 23, 2015

SQL best practices for query optimization and increasing performance

No comments:

Post a Comment