There is a common problem of converting a delimited list
of values to a table format to join to other tables within a database. For
example, a session database might have a stored procedure that accepts two
parameters: (a) a SessionID and (b) a comma-delimited list of attributes to
retrieve. Below are 6 techniques to accomplish this within SQL Server and a 7th
which essentially pushes the processing to the application tier.
We will work through concrete stored procedure examples
using the following table. The stored procedures each accept the same
parameters – a single SessionID and a deliminted list of attributes requested
for the given SessionID.
CREATE TABLE [dbo].[Sessions](
[SessionID]
[bigint] NOT NULL,
[AttributeID]
[bigint] NOT NULL,
[Value] [varchar](50) NOT NULL,
CONSTRAINT [PK_Sessions] PRIMARY
KEY CLUSTERED
(
[SessionID]
ASC,
[AttributeID]
ASC
) ON [PRIMARY];
GO
New-comers to T-SQL often try the SQL below - not
realizing that the parameter is an atomic value and does not materialize into
the SQL statement.
--solution 0 - intuitive, but not supported
CREATE PROCEDURE uspGetAttributes
@SessionID BIGINT
,@AttributeIDs VARCHAR(1000)
AS
SELECT
SessionID
,AttributeID
,Value
FROM
dbo.Sessions
WHERE
SessionID = @SessionID
AND AttributeID IN (@AttributeIDs);
GO
Solution 1 below works, but is using an iterative parsing
approach – adopted from application development but foreign to set-based T-SQL.
It is unnecessarily complex and is not recommended.
--solution 1 - iterative split to table variable
CREATE PROCEDURE uspGetAttributes
@SessionID BIGINT
,@AttributeIDs VARCHAR(1000)
AS
DECLARE @SplitValues TABLE (SplitVal VARCHAR(100))
DECLARE @pos INT;
DECLARE @tmpval VARCHAR(20);
SET @pos = CHARINDEX(',', @AttributeIDs);
WHILE @pos > 0
BEGIN
SET @tmpval = LEFT(@AttributeIDs, CHARINDEX(',', @AttributeIDs) - 1)
INSERT @SplitValues (SplitVal) VALUES (@tmpval)
SET @AttributeIDs = SUBSTRING(@AttributeIDs, @pos + 1, DATALENGTH(@AttributeIDs))
SET @pos = CHARINDEX(',', @AttributeIDs)
END
INSERT @SplitValues (SplitVal) VALUES (@AttributeIDs);
SELECT
SessionID
,AttributeID
,Value
FROM
dbo.Sessions
INNER JOIN @SplitValues AttributeSplit
ON AttributeSplit.SplitVal = dbo.Sessions.AttributeID
WHERE
SessionID
= @SessionID;
GO
The second solution uses a recursive Common Table
Expression (CTE). Although by some definitions technically set-based, this
essentially still is an iterative approach to parse out the values into a
derived table. This is interesting to understand but there are better
solutions.
--solution 2 - Recursive CTE
CREATE PROCEDURE uspGetAttributes
@SessionID BIGINT
,@AttributeIDs VARCHAR(1000)
AS
SET @AttributeIDs += ',';
WITH SplitValsCTE
AS
(
SELECT
SUBSTRING(@AttributeIDs,1,CHARINDEX(',', @AttributeIDs)-1) AS SplitVal
,SUBSTRING(@AttributeIDs,CHARINDEX(',', @AttributeIDs)+1, DATALENGTH(@AttributeIDs)) AS RemainingString
UNION ALL
SELECT
SUBSTRING(RemainingString,1,CHARINDEX(',', RemainingString)-1)
,SUBSTRING(RemainingString,CHARINDEX(',', RemainingString )+1, DATALENGTH(RemainingString))
FROM
SplitValsCTE
WHERE
CHARINDEX(',', RemainingString) > 0
AND DATALENGTH(SplitVal) > 0
)
SELECT
SessionID
,AttributeID
,Value
FROM
dbo.Sessions
INNER JOIN SplitValsCTE
ON SplitValsCTE.SplitVal = dbo.Sessions.AttributeID
WHERE
SessionID = @SessionID;
GO
The third solution uses a numbers or tally table. This uses
a table of integers along with string functions to parse out the values. If the
length of delimited list string exceeds 2,047, you would need to
implement a different numbers table – either a static base table or a virtual
table. See Itzik Ben-Gan for this: http://www.sqlmag.com/article/sql-server/virtual-auxiliary-table-of-numbers
--solution 3 - numbers table
CREATE PROCEDURE uspGetAttributes
@SessionID BIGINT
,@AttributeIDs VARCHAR(1000)
AS
SET @AttributeIDs = ',' + @AttributeIDs + ',';
SELECT
SessionID
,AttributeID
,Value
FROM
[master].[dbo].[spt_values] Nums
INNER JOIN dbo.Sessions
ON SUBSTRING(@AttributeIDs,(Nums.number+1),CHARINDEX(',',@AttributeIDs,(Nums.number+1))-Nums.number-1) = dbo.Sessions.AttributeID
WHERE
SessionID = @SessionID
AND type = 'P'
AND Nums.number BETWEEN 1 AND DATALENGTH(@AttributeIDs)
AND SUBSTRING(@AttributeIDs,Nums.number,1) = ','
AND CHARINDEX(',',@AttributeIDs,(Nums.number+1)) > Nums.number;
GO
The fourth solution converts the input string to XML
which then allows for using XML functions to parse out the values.
--solution 4 - xml
CREATE PROCEDURE uspGetAttributes
@SessionID BIGINT
,@AttributeIDs VARCHAR(1000)
AS
DECLARE @AttributeIDsXML AS XML
SET @AttributeIDs = LEFT(@AttributeIDs, DATALENGTH(@AttributeIDs)-1) ;
SET @AttributeIDsXML = CAST('<root><x>' +
REPLACE(@AttributeIDs,',','</x><x>')
+ '</x></root>'
AS XML) ;
SELECT
dbo.Sessions.SessionID
,dbo.Sessions.AttributeID
,dbo.Sessions.Value
FROM
dbo.Sessions
INNER JOIN
(
SELECT
root.x.value('.', 'BIGINT') AS AttributeID
FROM
@AttributeIDsXML.nodes('(/root/x)') as root(x)
) AttributeIDs
ON AttributeIDs.AttributeID = dbo.Sessions.AttributeID
WHERE
SessionID
= @SessionID;
GO
The fifth option is dynamic SQL. This is a heavy-handed
way of forcing what originally was tried in the non-workable solution 0. It
dynamically builds the SQL statement into a string variable - materializing the
list into the statement string; and then EXECUTEs that statement. It works but is not
recommended for performance reasons as well as possible SQL injection exposure.
--solution 5 dynamic SQL
CREATE PROCEDURE uspGetAttributes
@SessionID BIGINT
,@AttributeIDs VARCHAR(1000)
AS
SET @AttributeIDs = LEFT(@AttributeIDs, DATALENGTH(@AttributeIDs)-1)
SET @AttributeIDs = REPLACE(@AttributeIDs, '''', '''''')
EXECUTE
('
SELECT
SessionID
,AttributeID
,Value
FROM
dbo.Sessions
WHERE
SessionID = ''' +
@SessionID + '''
AND AttributeID IN (' + @AttributeIDs
+ ')
');
GO
The final T-SQL option performs a counter-intuitive
reverse LIKE operation where the logic compares each AttributeID from the base
table to see if it is LIKE the list passed in. This works and is efficient but
depends on there being a relatively low volume of AttributeID values for each
SessionID. This is because it is not SARGable and so within the SessionID
values it must scan all AttributeID values. From the execution plan you can see
it is scanning a range looking for matches based on the predicate.
--solution 6 reverse LIKE
CREATE PROCEDURE uspGetAttributes
@SessionID BIGINT
,@AttributeIDs VARCHAR(1000)
AS
SET @AttributeIDs = ',' + @AttributeIDs + ',';
SELECT
SessionID
,AttributeID
,Value
FROM
dbo.Sessions
WHERE
SessionID = @SessionID
AND @AttributeIDs LIKE '%,' + CAST(AttributeID AS VARCHAR(20)) + ',%';
GO
If you are at .NET 4.0 for the application tier and SQL
Server 2008+ you can take advantage of Table-Valued Parameters (TVP). For this you create a table type object – essentially
just a schema for a table variable and then you can have the application load a
table object instead of a delimited list and pass that object to the stored
procedure. This is not a T-SQL solution to convert a delimited list to a table,
but I wanted to include it because it is an excellent solution to this type of
problem. It pushes the conversion to the application tier – or more than likely
would never use a delimited list to begin with. It also should be noted that
even at SQL Server 2005, it is possible to take a similar approach by passing
XML instead of the delimited list and then processing it in like manner to
solution 4.
--solution 7 table-valued parameter (TVP) - requires interface
change to sproc
CREATE TYPE [dbo].[AttributesType] AS TABLE(
[AttributeID]
[bigint] NOT NULL,
PRIMARY KEY CLUSTERED
(
[AttributeID]
ASC
)WITH (IGNORE_DUP_KEY
= OFF)
);
GO
CREATE PROCEDURE uspGetAttributes
@SessionID BIGINT
,@AttributeIDsTVP dbo.AttributesType READONLY
AS
SELECT
dbo.Sessions.SessionID
,dbo.Sessions.AttributeID
,dbo.Sessions.Value
FROM
dbo.Sessions
INNER JOIN @AttributeIDsTVP
AttributeIDsTVP
ON AttributeIDsTVP.AttributeID = dbo.Sessions.AttributeID
WHERE
SessionID
= @SessionID;
GO
Regarding which solution is best to implement, it depends
on the characteristics of your data. If you know that there will always be a
relatively low volume of values for the column associated with the delimited
list, and there is an index within which these can be scanned – e.g. low volume of AttributeIDs for each SessionID
– then the reverse LIKE works well and avoids some hits to the database by way
of index searches. Below is index search data for a 30 minute load test of the
various solutions. Reverse LIKE is lowest by a factor of 8 compared to all of
the solutions except CTE, and a factor of 34 for CTE. But again, this solution
should be considered only in cases where the characteristics of the data are
known to be compatible.
A further consideration is code isolation and re-use. The
numbers-table solution is well-suited to abstraction to an in-line table
UDF. This is a topic unto itself, but
for the present purpose it is sufficient to emphasize that the UDF must be
in-line so that it is materialized into the calling object and thereby
available to the optimizer. For a UDF to be in-line, it must be comprised of a
single query. Below is an example of
abstracting the Delimited-List-To-Table processing to an in-line UDF. You could
also implement this with the CTE and XML solutions, but not with the other
solutions including reverse LIKE.
CREATE FUNCTION dbo.ParseValues_InLine(
@AttributeIDs VARCHAR(1000)
) RETURNS TABLE
AS
RETURN
SELECT
SUBSTRING(',' + @AttributeIDs + ',',(Nums.number+1),CHARINDEX(',',',' + @AttributeIDs + ',',(Nums.number+1))-Nums.number-1) AS Item
FROM
[master].[dbo].[spt_values] Nums
WHERE
Type = 'P'
AND Nums.number BETWEEN 1 AND DATALENGTH(',' + @AttributeIDs + ',')
AND SUBSTRING(',' + @AttributeIDs + ',',Nums.number,1) = ','
AND CHARINDEX(',',',' + @AttributeIDs + ',',(Nums.number+1)) > Nums.number;
GO
--solution 3a - numbers table using in-line UDF
CREATE PROCEDURE uspGetAttributes
@SessionID BIGINT
,@AttributeIDs VARCHAR(1000)
AS
SELECT
SessionID
,AttributeID
,Value
FROM
dbo.Sessions
INNER JOIN dbo.ParseValues_InLine(@AttributeIDs) AttributeIDsTable
ON AttributeIDsTable.Item = dbo.Sessions.AttributeID
WHERE
SessionID
= @SessionID;
GO
Comparing the original numbers table solution to the
updated version using an in-line UDF, the execution plans are identical. So we
were able to abstract the Delimited-List-To-Table processing to a function
avoiding code duplications, and without loss of performance. Naturally, the UDF
could be expanded to pass a delimiter to make it more flexible.
Original stored procedure using the numbers table
solution
Updated stored procedure using the numbers table solution
within an in-line UDF
12/28/2012 Update:
I just read a great article by Jeff Moden which explains further tuning opportunities of the Tally Table solution avoiding scale issues resulting from the concationation of delimiters.
http://www.sqlservercentral.com/articles/Tally+Table/72993/
Based heavily on Moden's article this is a good encapsulated function for this need. It uses a materialized numbers table.
CREATE FUNCTION [dbo].[fnListToTable8K](
@List VARCHAR(8000),@Delimiter CHAR(1)
) RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
--this UNIONs a "1" (first element) with n+1 values (subsequent elements) relative to delimiter positions
--eliminates need to prepend a delimiter
WITH ElementStartCTE
AS
(
SELECT 1 AS Start
UNION ALL
SELECT
n+1
FROM
dbo.Numbers WITH (NOLOCK)
WHERE
n <= DATALENGTH(@List)
AND SUBSTRING(@List,n,1) = @Delimiter
)
--ISNULL\NULLIF logic handles last element eliminating need
to append delimiter
,ElementStartAndLengthCTEAS
(
SELECT
Start
,ISNULL(NULLIF(CHARINDEX(@Delimiter,@List,Start),0)-Start,DATALENGTH(@List)) AS Length
FROM
ElementStartCTE
)
SELECT
SUBSTRING(@List,Start,Length) AS ItemFROM
ElementStartAndLengthCTE;