Wednesday, February 20, 2013

Types of User-Defined Functions And Performance Considerations

Microsoft SQL Server supports user-defined functions (UDFs) which provide for code encapsulation and reuse, but come with design and very significant performance considerations. Application developers new to T-SQL like UDFs because they align to a procedural programming design pattern familiar to that skill set. However unless understood and used properly, UDFs will greatly reduce database performance.

 In this post I will discuss the following:

1.    Types of UDFs and examples
2.    Performance comparisons and explanations

1. Types of UDFs and examples

UDFs in SQL Server can return Tables or Scalar values. Additionally UDFs can be Multi-Statement or Single-Statement (labeled In-Line). So logically there are four types – although Scalar In-Line UDFs currently are not supported.

                                    Multi-Statement        In-Line
Scalar-Valued            Supported                   Not Supported
Table-Valued             Supported                   Supported

In the SSMS GUI you can see the UDFs in the Programmability -> Functions section.









For our examples, we will be using a simple City, State database. We will be reviewing In-Line vs. Multi-Statement versions of a function to return the first city (alphabetically) for a given state and letter.

Although Scalar functions typically are not explicitly qualified as Multi-Statement, I have named the function with “Multi” in this post to be clear that in this regard Scalar functions align to Multi-Statement Table-Valued functions. As will be explained below, the difference between Multi-Statement and Single-Statement gets to the essential difference among the various types of UDFs and corresponding performance considerations. Below are code samples for the three UDFs.


CREATE FUNCTION [dbo].[GetFirstCity_Scalar_Multi](
        @CDState            VARCHAR(2)
       ,@CityFirstLetter    VARCHAR(1)
)

RETURNS VARCHAR(100)
AS

BEGIN
       DECLARE @TXCity VARCHAR(100)

            --other T-SQL statements allowed

       SELECT TOP 1 
              @TXCity = TXCity
       FROM  
              dbo.GeoData
       WHERE
              CDState = @CDState
       AND    TXCity LIKE @CityFirstLetter + '%'
       ORDER BY
              TXCity

            --other T-SQL statements allowed

       RETURN @TXCity
END

CREATE FUNCTION [dbo].[GetFirstCity_Table_Multi](
        @CDState            VARCHAR(2)
       ,@CityFirstLetter    VARCHAR(1)
)

RETURNS @City TABLE
    (
        TXCity       VARCHAR(100)
    )
AS

BEGIN
   --other T-SQL statements allowed

    INSERT
       INTO
              @City
                     SELECT TOP 1 
                            TXCity
                     FROM  
                           dbo.GeoData
                     WHERE
                           CDState = @CDState
                     AND    TXCity LIKE @CityFirstLetter + '%'
                     ORDER BY
                            TXCity

   --other T-SQL statements allowed
    RETURN
END

CREATE FUNCTION  [dbo].[GetFirstCity_Table_InLine](
        @CDState            VARCHAR(2)
       ,@CityFirstLetter    VARCHAR(1)
) RETURNS TABLE

AS
RETURN
      --only one SELECT allowed
       SELECT TOP 1 
               TXCity
       FROM  
              dbo.GeoData
       WHERE
              CDState = @CDState
       AND    TXCity LIKE @CityFirstLetter + '%'
       ORDER BY
               TXCity

Note that the essential work of each UDF – the data access SQL – is identical. What distinguishes each is “only” the interface and implementation wrapping it. But we will see next that this has significant performance implications.

2. Performance comparisons and explanations

Something I always do when comparing various syntax and techniques in T-SQL is to run all options in a single request and compare the Execution Plans and relative costs. Often times this is a good approach but not always. Taking this at face value, the Scalar UDF would be by far the best and the two Table-Valued versions equal.

However taking a closer look, the first thing to notice is that only the Table_InLine version shows any actual data access. The Scalar and Table_Multi show only scans of data - somehow already available. This gets to the essence of the matter. The Scalar and Table_Multi UDFs are black boxes to the caller and query optimizer; the data access in these two UDFs is not transparent to the caller. These behave much like a VB or C# function where the caller simply passes parameters and gets a result – knowing nothing of the implementation of how the result was produced. Likewise, the optimizer does not have visibility into these functions so the cost of these functions is not reported in the Execution Plans.

Looking to the io stats, we see something similar. Only the Table_InLine version shows the real data access.

Table_Multi
Table '#ABDEF001'. Scan count 1, logical reads 1,

Table_InLine
Table 'GEOData'. Scan count 1, logical reads 2

Scalar
(1 row(s) affected)


Additionally running each UDF a single time shows no real difference in wall clock execution time. So I created a performance test which ran each UDF for a distinct list of states, for each for the 26 letters of the alphabet - and iterated this 200 times. Note that I turned on the Discard Results option to better compare just processing time. The results are below.

Table_Multi     0:25
Scalar              0:07
Table_InLine   0:04

This demonstrates and quantifies that the black box nature of the Scalar and Table-Valued Multi UDFs – causing the data access not to show in execution plans – also causes performance issues. To state the obvious, if the optimizer cannot see the SQL, it cannot be optimized for the overall request.

Additionally, use of Multi-Statement UDFs (Table-Valued or Scalar) compounds other inefficiencies. For example, if we remove the IX_StateAndCity index (which will result in table scans), and repeat the test above, the results are as follows.

Table_Multi     31:49 (7,546% Increase)
Scalar              31:09 (26,600% Increase)
Table_InLine   1:04 (1,500% Increase)

In terms of percent increase, the In-Line UDF incurred the least impact and in terms of absolute execution time, it runs in a 30th of the time of the other UDFs.  Multi-Statement UDFs magnify other inefficiencies.

What this all underscores is that coding T-SQL properly is a different skill set from procedural application design patterns. Except in rare and atypical scenarios, set-based T-SQL will outperform an iterative solution. Multi-Statement UDFs cannot be substituted into the query calling them and so force the optimizer to use iteration – calling the UDF for each row. In T-SQL world, this is referred to as RBAR (Row By Agonizing Row)  which was coined by Jeff Moden.

By contract InLine UDFs use a form of substitution whereby the UDF code is transparent to the caller and considered as part of the Execution Plan - hence the term In-Line. Below are two queries which (1) call the InLine UDF and (2) explicitly dups the UDF code in the query. The highlighted sections show the different implementations of the CROSS APPLY.

Query #1
SELECT
        CDState
       ,FirstCityForLetterApply.TXCity AS FirstCityForLetter
FROM
       StatesCTE
       CROSS JOIN Letters
       CROSS APPLY [dbo].[GetFirstCity_Table_InLine](CDState,Letter) FirstCityForLetterApply

Query #2
SELECT
        CDState
       ,FirstCityForLetterApply.TXCity AS FirstCityForLetter
FROM
       StatesCTE
       CROSS JOIN Letters
       CROSS APPLY
              (
              SELECT TOP 1 
                      TXCity
              FROM  
                     dbo.GeoData GeoDataApply
              WHERE
                     GeoDataApply.CDState = StatesCTE.CDState
              AND    GeoDataApply.TXCity LIKE Letters.Letter  + '%'
              ORDER BY
                      TXCity
              ) FirstCityForLetterApply

And we can see that the Execution Plans are identical. So with an In-Line UDF we  are able to benefit from code encapsulation and reuse while retaining the performance of a fully optimized query with all parts visible to the optimizer.









Finally, Multi-Statement UDFs impact the ability to parallelize query plans. For more information on this, I refer you to this Dec 2011 post of Paul White: http://sqlblog.com/blogs/paul_white/archive/2011/12/23/forcing-a-parallel-query-execution-plan.aspx

Later

-Regan

Thursday, January 10, 2013

Identifying and Eliminating Key Lookups in Query Execution Plans



There are four sections in this post
1.    Definition of Key Lookup
2.    How to eliminate a Key Lookup
3.    Further aspects
4.    DMV Query to identify and prioritize Key Lookups

1.    Definition of Key Lookup


A Key Lookup occurs when the optimizer elects to use a non-clustered index but must reference back to the clustered index to read data for columns not in the non-clustered index.


Consider a Customer table as follows which I have populated with 1M records.


CREATE TABLE [dbo].[Customer](
       [IDCustomer] [int] IDENTITY(1,1) NOT NULL,
       [TXFirstName] [varchar](50) NOT NULL,
       [TXLastName] [varchar](50) NOT NULL,
       [TXAddress] [varchar](100) NOT NULL,
       [TXPhone] [varchar](50) NOT NULL,
       [DTCreated] [datetime] NOT NULL,
CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED
(
       [IDCustomer] ASC
)
) ON [PRIMARY]
GO


We want to query this table for all customers’ First Name and Last Name who have a DTCreated in the first week of January 2010.


CREATE PROCEDURE dbo.GetData
AS
       SELECT
              TXFirstName
              ,TXLastName
       FROM
              dbo.Customer
       WHERE
              DTCreated >= '2010-01-01'
       AND    DTCreated < '2010-01-08';


We can see that this performs a Clustered Index Scan. Without an index on DTCreated, it has no choice but to scan the entire base table picking off records that match the DTCreated criteria.




Next, we create an index on DTCreated. This will improve the query by allowing for a seek operation on the DTCreated values and thereby eliminating the full table scan of all 1M records.


 CREATE NONCLUSTERED INDEX [IX_DTCreated] ON [dbo].[Customer]
(
       [DTCreated] ASC
) ON [PRIMARY]
GO


But this is where we see the introduction of the Key Lookup operation. Because the index does not specify TXFirstName and TXLastName (which are in the SELECT clause) the plan must reference the base table to lookup those values.


 Key Lookup Properties


The Output List specifies the columns not in the index which are causing the Key Lookup to occur. This is essential information for eliminating the Key Lookup.

2.    How to eliminate a Key Lookup


While in most cases better than a full table scan, a Key Lookup still involves additional reads, and so if possible Key Lookups should be eliminated. We can eliminate the Key Lookup which references the base table by INCLUDE-ing TXFirstName and TXLastName (detailed in the Key Lookup properties) in the index thereby covering the query. Essentially, the index is functioning as a separate table which contains all the columns required for the query and is sorted appropriately for the search criteria.


CREATE NONCLUSTERED INDEX [IX_DTCreatedCovering] ON [dbo].[Customer]
(
       [DTCreated] ASC
)
INCLUDE ([TXFirstName],[TXLastName])  ON [PRIMARY]
GO


Index Seek Properties














3.    Further aspects

This section covers 2 additional aspects of covering indexes and Key Lookup elimination: (a) Index Key versus INCLUDEd columns and (b) index benefit versus cost.


Prior to SQL Server 2005, the only way to have an index cover a query was to append the needed columns to the index key. Using our example, such an index would look like this.


 CREATE NONCLUSTERED INDEX [IX_DTCreatedTXFirstNameTXLastName] ON [dbo].[Customer]
(
       [DTCreated] ASC,
       [TXFirstName] ASC,
       [TXLastName] ASC
) ON [PRIMARY]
GO


The problem with this approach is a wide index key reduces the number of records that can fit on intermediate pages within the index b-tree, and so the depth of the tree likely will increase requiring additional reads to get to the same records.


Beginning with SQL Server 2005, indexes support an INCLUDE clause which specifies column values that should be written only to leaf-level pages. This is having it both ways: keeping the index key narrow while being able to cover queries with columns not relevant to the index key itself. The IX_DTCreated index has a depth of 3 whereas the wide index IX_ DTCreatedTXFirstNameTXLastName has a depth of 4. However, the IX_DTCreatedCovering has a depth of 3 (same as the IX_DTCreated) because the additional columns are INCLUDEd only on the leaf-level pages of the index.


The lesson here is to put search-interesting columns (aligned to the WHERE clause) in the index key and other columns referenced (SELECT and ORDER clauses) in the included section. Depending on the situation it may make sense to widen the index key with ORDER clause columns.


The second further aspect of indexes to consider is cost versus benefit. An index provides benefit for reads but costs on inserts, updates, and deletes – because these operations must update all affected indexes. So whereas IX_DTCreated is not updated when a value of TXFirstName has been changed, IX_DTCreatedCovering is updated. It is always wise to review Key Lookups for possible elimination, but it will not make sense to implement an index if the cost outweighs the benefit.

4.    DMV Query to identify and prioritize Key Lookups


Even the best maintained system will end up having unnecessary Key Lookups as sprocs are updated to return additional columns not accounted for when indexes were originally created. And as we all know, many databases exhibit no evidence that tuning items such as covering indexes have been even considered.


Fortunately, the dynamic management views (DMVs) offer visibility into Key Lookups. The DMV query below returns data for database objects with Key Lookup operators in their execution plan. One of the columns returned is the XML representation of the query plan, which when selected within an SSMS query window will display the graphical query plan. I do not claim that this is best DMV query for this data, but it is simple and works well.


WITH DataCTE
AS
(
SELECT                                                                                     
        CAST(ISNULL(db_name(QueryText.dbid),'') AS NVARCHAR(128)) AS [Database]
       ,CAST(ISNULL(object_name(QueryText.objectid, QueryText.dbid),'') AS NVARCHAR(128))AS Object
       ,sys.dm_exec_cached_plans.plan_handle                                                                             
       ,SUM(QueryStats.execution_count) AS ExecutionCount
       ,MAX(QueryStats.plan_generation_num) AS Recompilation_Total
       ,SUM(QueryStats.total_elapsed_time)      AS WallClock_Total
       ,SUM(QueryStats.total_worker_time) AS CPU_Total
       ,SUM(QueryStats.total_logical_reads) AS LogicalReads_Total
       ,SUM(QueryStats.total_logical_writes) AS LogicalWrites_Total
       ,SUM(QueryStats.total_physical_reads) AS PhysicalReads_Total
FROM
       sys.dm_exec_cached_plans
       INNER JOIN sys.dm_exec_query_stats  QueryStats
              ON QueryStats.plan_handle = sys.dm_exec_cached_plans.plan_handle
       CROSS APPLY sys.dm_exec_sql_text (sql_handle) QueryText
WHERE
       ISNULL(db_name(QueryText.dbid),'')  NOT IN ('','master','msdb')
GROUP BY
        CAST(ISNULL(db_name(QueryText.dbid),'') AS NVARCHAR(128))                        
       ,CAST(ISNULL(object_name(QueryText.objectid, QueryText.dbid),'') AS NVARCHAR(128))
      ,sys.dm_exec_cached_plans.plan_handle                                                                       
)
SELECT
        [Database] 
       ,Object
       ,query_plan AS QueryPlan
       ,plan_handle
       ,ExecutionCount
       ,Recompilation_Total
       ,WallClock_Total
       ,CPU_Total
       ,LogicalReads_Total
       ,LogicalWrites_Total
       ,PhysicalReads_Total
FROM
       DataCTE
       CROSS APPLY sys.dm_exec_query_plan(plan_handle)
WHERE
       CAST(query_plan AS NVARCHAR(MAX)) LIKE '%Lookup=%'
ORDER BY
       (WallClock_Total / ExecutionCount) DESC --Avg Wall Clock
       --(CPU_Total / ExecutionCount) DESC --Avg CPU
       --(LogicalReads_Total / ExecutionCount) DESC --Avg Logical Reads
       --(LogicalWrites_Total / ExecutionCount) DESC --Avg Logical Writes
       --(PhysicalReads_Total / ExecutionCount) DESC --Avg Physical Reads

Having removed from our example all non-clustered indexes except IX_DTCreated which we know uses the Key Lookup, we can see that this DMV query returns a record for the GetData sproc.





Upon selecting the QueryPlan XML link, we see the graphical representation with a Key Lookup operator (same as from the end of section 1) and can investigate and update the index to eliminate the Key Lookup - again if it makes sense.

It should be noted that this DMV covers only sprocs and UDFs. In-line SQL queries that use Key Lookups would not be reported here.

Hope this is of some help.


 Later


 -Regan