There are
four sections in this post
1. Definition of Key Lookup
2. How to eliminate a Key Lookup1. Definition of Key Lookup
3. Further aspects
4. DMV Query to identify and prioritize Key Lookups
1. Definition of Key Lookup
A Key
Lookup occurs when the optimizer elects to use a non-clustered index but must
reference back to the clustered index to read data for columns not in the
non-clustered index.
Consider a
Customer table as follows which I have populated with 1M records.
CREATE TABLE [dbo].[Customer](
[IDCustomer] [int] IDENTITY(1,1) NOT NULL,[TXFirstName] [varchar](50) NOT NULL,
[TXLastName] [varchar](50) NOT NULL,
[TXAddress] [varchar](100) NOT NULL,
[TXPhone] [varchar](50) NOT NULL,
[DTCreated] [datetime] NOT NULL,
CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED
(
[IDCustomer] ASC
)
) ON [PRIMARY]
GO
We want to
query this table for all customers’ First Name and Last Name who have a DTCreated
in the first week of January 2010.
CREATE PROCEDURE dbo.GetData
ASSELECT
TXFirstName
,TXLastName
FROM
dbo.Customer
WHERE
DTCreated >= '2010-01-01'
AND DTCreated < '2010-01-08';
We can see
that this performs a Clustered Index Scan. Without an index on DTCreated, it
has no choice but to scan the entire base table picking off records that match the
DTCreated criteria.
Next, we
create an index on DTCreated. This will improve the query by allowing for a
seek operation on the DTCreated values and thereby eliminating the full table
scan of all 1M records.
[DTCreated] ASC
) ON [PRIMARY]
GO
But this
is where we see the introduction of the Key Lookup operation. Because the index
does not specify TXFirstName and TXLastName (which are in the SELECT clause)
the plan must reference the base table to lookup those values.
Key Lookup
Properties
The Output List specifies the columns not in the index which are causing the Key Lookup to occur. This is essential information for eliminating the Key Lookup.
2.
How
to eliminate a Key Lookup
While in most cases better than a full table scan, a Key Lookup still involves additional reads, and so if possible Key Lookups should be eliminated. We can eliminate
the Key Lookup which references the base table by INCLUDE-ing TXFirstName and
TXLastName (detailed in the Key Lookup properties) in the index thereby covering the query. Essentially, the index is functioning
as a separate table which contains all the columns required for the query and is sorted
appropriately for the search criteria.
CREATE NONCLUSTERED INDEX [IX_DTCreatedCovering]
ON [dbo].[Customer]
([DTCreated] ASC
)
INCLUDE ([TXFirstName],[TXLastName]) ON [PRIMARY]
GO
Index Seek
Properties
3. Further aspects
This section covers 2 additional aspects of covering indexes and Key Lookup elimination: (a) Index Key versus INCLUDEd columns and (b) index benefit versus cost.
Prior to
SQL Server 2005, the only way to have an index cover a query was to append the
needed columns to the index key. Using our example, such an index would look
like this.
[DTCreated] ASC,
[TXFirstName] ASC,
[TXLastName] ASC
) ON [PRIMARY]
GO
The
problem with this approach is a wide index key reduces the number of records
that can fit on intermediate pages within the index b-tree, and so the depth of
the tree likely will increase requiring additional reads to get to the same
records.
Beginning
with SQL Server 2005, indexes support an INCLUDE clause which specifies column
values that should be written only to leaf-level pages. This is having it both
ways: keeping the index key narrow while being able to cover queries with
columns not relevant to the index key itself. The IX_DTCreated index has a depth
of 3 whereas the wide index IX_ DTCreatedTXFirstNameTXLastName has a depth of
4. However, the IX_DTCreatedCovering has a depth of 3 (same as the IX_DTCreated)
because the additional columns are INCLUDEd only on the leaf-level pages of the
index.
The lesson
here is to put search-interesting columns (aligned to the WHERE clause) in the
index key and other columns referenced (SELECT and ORDER clauses) in the
included section. Depending on the situation it may make sense to widen the index
key with ORDER clause columns.
The second
further aspect of indexes to consider is cost versus benefit. An index provides benefit
for reads but costs on inserts, updates, and deletes – because these operations
must update all affected indexes. So whereas IX_DTCreated is not updated when a
value of TXFirstName has been changed, IX_DTCreatedCovering is updated. It is
always wise to review Key Lookups for possible elimination, but it will not make sense to implement an index if the
cost outweighs the benefit.
Even the
best maintained system will end up having unnecessary Key Lookups as sprocs are
updated to return additional columns not accounted for when indexes were
originally created. And as we all know, many databases exhibit no evidence that
tuning items such as covering indexes have been even considered.
Fortunately,
the dynamic management views (DMVs) offer visibility into Key Lookups. The DMV query
below returns data for database objects with Key Lookup operators in their
execution plan. One of the columns returned is the XML representation of the query
plan, which when selected within an SSMS query window will display the graphical
query plan. I do not claim that this is best DMV query for this data, but it is
simple and works well.
WITH DataCTE
AS
(
SELECT
CAST(ISNULL(db_name(QueryText.dbid),'') AS NVARCHAR(128)) AS [Database]
,CAST(ISNULL(object_name(QueryText.objectid, QueryText.dbid),'') AS NVARCHAR(128))AS Object
,sys.dm_exec_cached_plans.plan_handle
,SUM(QueryStats.execution_count) AS ExecutionCount
,MAX(QueryStats.plan_generation_num) AS Recompilation_Total
,SUM(QueryStats.total_elapsed_time) AS WallClock_Total
,SUM(QueryStats.total_worker_time) AS CPU_Total
,SUM(QueryStats.total_logical_reads) AS LogicalReads_Total
,SUM(QueryStats.total_logical_writes) AS LogicalWrites_Total
,SUM(QueryStats.total_physical_reads) AS PhysicalReads_Total
FROM
sys.dm_exec_cached_plans
INNER JOIN sys.dm_exec_query_stats
QueryStats
ON QueryStats.plan_handle = sys.dm_exec_cached_plans.plan_handle
CROSS APPLY sys.dm_exec_sql_text (sql_handle) QueryText
WHERE
ISNULL(db_name(QueryText.dbid),'') NOT IN ('','master','msdb')
GROUP BY
CAST(ISNULL(db_name(QueryText.dbid),'') AS NVARCHAR(128))
,CAST(ISNULL(object_name(QueryText.objectid, QueryText.dbid),'') AS NVARCHAR(128))
,sys.dm_exec_cached_plans.plan_handle
)
SELECT
[Database]
,Object
,query_plan AS QueryPlan
,plan_handle
,ExecutionCount
,Recompilation_Total
,WallClock_Total
,CPU_Total
,LogicalReads_Total
,LogicalWrites_Total
,PhysicalReads_Total
FROM
DataCTE
CROSS APPLY sys.dm_exec_query_plan(plan_handle)
WHERE
CAST(query_plan AS NVARCHAR(MAX)) LIKE '%Lookup=%'
ORDER BY
(WallClock_Total /
ExecutionCount)
DESC --Avg Wall Clock
--(CPU_Total / ExecutionCount) DESC --Avg CPU
--(LogicalReads_Total / ExecutionCount) DESC --Avg Logical
Reads
--(LogicalWrites_Total / ExecutionCount) DESC --Avg Logical
Writes
--(PhysicalReads_Total / ExecutionCount) DESC --Avg
Physical Reads
Having removed from our example all non-clustered indexes except IX_DTCreated which we know uses the Key Lookup, we can see that this DMV query returns a record for the GetData sproc.
It should be noted that this DMV covers only sprocs and UDFs. In-line SQL queries that use Key Lookups would not be reported here.
Hope this is of some help.