Tuesday, January 28, 2020

Remove Repeating Characters From a String

Below is a solution to remove a random number of repeating characters from a string. I created a string with a randomly repeating letter q. A common real-world need is removing repeated spaces. I did not create this solution, but wanted to document and explain it in this blog for the benefit of visitors. It is a single-pass solution for multiple and variable repeating characters within the string.


--string with repeating characters to remove
DECLARE @String VARCHAR(1000) = 'My qqquery was bad qquality and returned data with randomly repeating qq letters and so we needed a way to reduce multiple qqqq''s to a single qqqqqqqqqqq.';

--repeating character to remove
DECLARE @TrimCharacter VARCHAR(1) = 'q';

SELECT
@String AS OriginalString
,REPLACE(@String,@TrimCharacter,'<>') AS FirstReplace
,REPLACE(REPLACE(@String,@TrimCharacter,'<>'),'><','') AS SecondReplace
,REPLACE(REPLACE(REPLACE(@String,@TrimCharacter,'<>'),'><',''),'<>',@TrimCharacter) AS FinalResult;



ORIGINAL STRING – with runs of the letter 'q'
My qqquery was bad qquality and returned data with randomly repeating qq letters and so we needed a way to reduce multiple qqqq's to a single qqqqqqqqqqq.

FIRST (INNER) REPLACE – replaces each 'q' with '<>'
My <><><>uery was bad <><>uality and returned data with randomly repeating <><> letters and so we needed a way to reduce multiple <><><><>'s to a single <><><><><><><><><><><>.

SECOND (MIDDLE) REPLACE – replaces each '><' with an empty string
My <>uery was bad <>uality and returned data with randomly repeating <> letters and so we needed a way to reduce multiple <>'s to a single <>.

THIRD (OUTER) REPLACE – replaces each remaining '<>' with a 'q'
My query was bad quality and returned data with randomly repeating q letters and so we needed a way to reduce multiple q's to a single q.


If the string happens to contain either '<' or '>', the solution would need to be modified to avoid the conflict. Instead of '<>', a generic solution can use a pair of non-printable characters: CHAR(17)+CHAR(18).

REPLACE(REPLACE(REPLACE(@String,@TrimCharacter,CHAR(17)+CHAR(18)),CHAR(18)+CHAR(17),''),CHAR(17)+CHAR(18),@TrimCharacter) AS FinalResult
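
For instance, a minimal sketch applying the generic version to collapse repeated spaces; the sample string here is illustrative:

--collapse repeated spaces using the non-printable delimiter pair
DECLARE @String VARCHAR(1000) = 'Text   with     randomly  repeating      spaces.';
DECLARE @TrimCharacter VARCHAR(1) = ' ';

SELECT REPLACE(REPLACE(REPLACE(@String,@TrimCharacter,CHAR(17)+CHAR(18)),CHAR(18)+CHAR(17),''),CHAR(17)+CHAR(18),@TrimCharacter) AS FinalResult;
--returns: Text with randomly repeating spaces.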






Thursday, January 23, 2020

Data Islands for Start Date and End Date Items

Below is a solution for data islands of items with overlapping Start Date and End Date values, adapted from this article by Itzik Ben-Gan.

WITH StartIslandCTE
AS
(
SELECT
ID
,Task
,StartDate
,EndDate
,IIF(StartDate BETWEEN LAG(StartDate) OVER (ORDER BY StartDate, EndDate) AND LAG(EndDate) OVER (ORDER BY StartDate, EndDate),0,1) AS StartGroup
FROM
dbo.Tasks
)

,IslandIDsCTE
AS
(
SELECT
ID
,Task
,StartDate
,EndDate
,StartGroup
,SUM(StartGroup) OVER (ORDER BY StartDate, EndDate ROWS UNBOUNDED PRECEDING) AS GroupID
FROM
StartIslandCTE
)

,ReportDataCTE
AS
(
SELECT
GroupID
,MIN(TaskApply.TaskList) AS TaskList
,MIN(StartDate) AS GroupStartDate
,MAX(EndDate) AS GroupEndDate
FROM
IslandIDsCTE
CROSS APPLY
(
SELECT
STUFF((

--query with column to make a delimited string
SELECT
',' + Task
FROM 
IslandIDsCTE tasks
WHERE
tasks.GroupID = IslandIDsCTE.GroupID
ORDER BY
tasks.Task
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, '')

--column name of delimited string
AS TaskList
) TaskApply
GROUP BY
GroupID
)

SELECT
TaskList
,GroupStartDate
,GroupEndDate
FROM
ReportDataCTE
ORDER BY
GroupStartDate
,GroupEndDate;
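
On SQL Server 2017 and later, the STUFF/FOR XML PATH pattern inside ReportDataCTE can be replaced with STRING_AGG. A minimal sketch of the simpler CTE body:

--assuming SQL Server 2017+; replaces the CROSS APPLY/STUFF/FOR XML PATH aggregation
SELECT
GroupID
,STRING_AGG(Task,',') WITHIN GROUP (ORDER BY Task) AS TaskList
,MIN(StartDate) AS GroupStartDate
,MAX(EndDate) AS GroupEndDate
FROM
IslandIDsCTE
GROUP BY
GroupID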


Sample Data 
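
A minimal sketch to create test data: the dbo.Tasks schema is inferred from the query above, and the tasks and dates are illustrative assumptions.

--schema inferred from the query; rows are illustrative
DROP TABLE IF EXISTS dbo.Tasks;

CREATE TABLE dbo.Tasks
(
ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY
,Task VARCHAR(50) NOT NULL
,StartDate DATE NOT NULL
,EndDate DATE NOT NULL
);

INSERT INTO dbo.Tasks (Task, StartDate, EndDate)
VALUES
('Task A','2020-01-01','2020-01-05')
,('Task B','2020-01-03','2020-01-08') --overlaps Task A: same island
,('Task C','2020-01-10','2020-01-12') --no overlap: starts a new island
,('Task D','2020-01-11','2020-01-15'); --overlaps Task C: same island
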
The result is one row per island: a comma-delimited TaskList with the island's GroupStartDate and GroupEndDate. With the illustrative rows above, that is 'Task A,Task B' (2020-01-01 through 2020-01-08) and 'Task C,Task D' (2020-01-10 through 2020-01-15).

Friday, September 6, 2019

Lightweight SQL Server Health Monitoring & Visualization


Even with the introduction of Query Store in SQL Server 2016 and of automatic tuning in Azure SQL Database, there is benefit in capturing and retaining diagnostic performance metrics. This post explains a lightweight way to use well-known DMV queries to capture key SQL Server performance data for alerting and long-term reporting. It involves creating a database on the server(s) to be monitored. The script for the database is here.

This system captures data for (1) wait statistics, (2) SQL I/O and (3) stored procedure performance. Below are a few reports to illustrate what sort of visualizations are possible with this approach. After that is a brief explanation of how it works, including everything you need to try it out yourself.

(1)  Wait statistics by date – wait tasks/sec (gray bars), average resource wait time (blue) and average signal wait time (green). For more information on wait statistics, see this article by Paul Randal.

(2)  SQL I/O by date – read/write counts (bars), average write wait time (blue) and average read wait time (orange).

(3)  Stored procedure performance – three samples below
(a)  A graph showing sproc execution rate (gray bars) and a tuning benefit that reduced CPU and wall clock time from 20ms to 5ms.

(b)  Two graphs (CPU/wall clock and logical reads) showing sproc degradation (resulting from an uplift of compatibility level to SQL Server 2014) and the benefit after resolution. A weekly subscription to these graphs provided quick alerts and timely resolution, along with the visibility to quantify and confirm the benefit.

(c)  A sample illustrating performance degradation and the long-term benefit of resolution.

These three metrics (wait stats, I/O, sproc performance) have corresponding tables in the database, each joining to common dimension tables. Below is a diagram of the sproc performance tables.

[diagram image]

The SQL Server dynamic management views (DMVs) that are the source of this data are cumulative, so capturing data for a slice of time involves taking a data sample, waiting some time, taking a second data sample and then calculating deltas. In the database there is a sproc named uspCaptureMetrics which does all of this. For wait stats and I/O, it captures the two data samples with a 10 minute interval in between. Due to the complexities of sproc (re)compilation and the procedure cache, the sproc performance capture instead takes 120 samples, waiting 5 seconds between each, and then calculates the performance deltas by sproc for the greatest delta duration.
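
uspCaptureMetrics in the linked script does the real work; conceptually, the two-sample delta for wait stats looks like this sketch (the temp table name and the inline 10 minute delay are illustrative assumptions):

--sample 1
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
INTO #WaitSample
FROM sys.dm_os_wait_stats;

--wait the sampling interval
WAITFOR DELAY '00:10:00';

--sample 2 minus sample 1 = activity during the interval
SELECT
s2.wait_type
,s2.waiting_tasks_count - s1.waiting_tasks_count AS waiting_tasks_delta
,s2.wait_time_ms - s1.wait_time_ms AS wait_time_ms_delta
,s2.signal_wait_time_ms - s1.signal_wait_time_ms AS signal_wait_time_ms_delta
FROM sys.dm_os_wait_stats s2
INNER JOIN #WaitSample s1
ON s1.wait_type = s2.wait_type
WHERE s2.waiting_tasks_count > s1.waiting_tasks_count;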

uspCaptureMetrics takes 10 minutes to run, with almost all of that time being WAITFOR delays in between execution of the DMV queries, so it is very lightweight. The recommendation is to create an agent job to execute uspCaptureMetrics every 15 minutes, resulting in the ability to report down to the quarter-hour. That makes detailed visualizations possible – such as one showing a CPU/exec reduction specifically between 5:00pm and 5:15pm on 8/1/2018. Likewise, this level of detail greatly helps identify the root cause of a performance degradation.
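
A hedged sketch of such an agent job, assuming the monitoring database is named DBMonitor:

USE msdb;
GO

EXEC dbo.sp_add_job @job_name = N'Capture Performance Metrics';

EXEC dbo.sp_add_jobstep
@job_name = N'Capture Performance Metrics'
,@step_name = N'Run uspCaptureMetrics'
,@subsystem = N'TSQL'
,@database_name = N'DBMonitor' --assumed database name
,@command = N'EXEC dbo.uspCaptureMetrics;';

EXEC dbo.sp_add_jobschedule
@job_name = N'Capture Performance Metrics'
,@name = N'Every 15 Minutes'
,@freq_type = 4 --daily
,@freq_interval = 1
,@freq_subday_type = 4 --units of minutes
,@freq_subday_interval = 15;

EXEC dbo.sp_add_jobserver @job_name = N'Capture Performance Metrics';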




To retain data long-term, you can implement a nightly ETL to a warehouse. This enables identifying longer-term cyclical patterns, and it also makes possible long-term visualizations that emphasize for management the ongoing benefits of specific tuning actions.

Running this system in development and test environments provides a shift-left: performance issues can be identified and resolved before they reach production.

The visualizations are SSRS reports, which are beyond the scope of this post. Once you have the data, writing the queries and reports is a necessary but not difficult task for someone with that skill set.

Query Store and Automatic Tuning have their place, but as a complement to those tools, capturing and retaining system performance metrics provides invaluable insights.

Hope this is of some use.




Friday, August 24, 2018

QUOTED_IDENTIFIER And PERSISTED Computed Columns

Many of us are familiar with this error message:

INSERT failed because the following SET options have incorrect settings: 'QUOTED_IDENTIFIER'. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or filtered indexes and/or query notifications and/or XML data type methods and/or spatial index operations.

What is missing from that list, however, is persisted computed columns. Below are two examples with the only difference being whether the computed column is persisted.

Example 1 - Creates a table with a non-persisted computed column. QUOTED_IDENTIFIER is ON for the table create.

But we are able to INSERT (and SELECT) data with QUOTED_IDENTIFIER OFF.


DROP TABLE IF EXISTS [dbo].[Numbers];
GO

SET QUOTED_IDENTIFIER ON;
GO

CREATE TABLE [dbo].[Numbers](
       [ID] [int] NOT NULL,
       [Number1] [int] NOT NULL,
       [Number2] [int] NOT NULL,
       [SumOfNumbers]  AS ([Number1]+[Number2]),
 CONSTRAINT [PK_Numbers] PRIMARY KEY CLUSTERED
(
       [ID] ASC
));
GO

SET QUOTED_IDENTIFIER OFF;
GO

INSERT
INTO dbo.Numbers (ID,Number1,Number2)
VALUES (1,5,6);

SELECT * FROM dbo.Numbers;


Results... the INSERT succeeds and the SELECT returns one row: ID 1, Number1 5, Number2 6 and SumOfNumbers 11.


Example 2 - Creates a table with a persisted computed column. QUOTED_IDENTIFIER is ON for the table create.

But we are not able to INSERT data with QUOTED_IDENTIFIER OFF.


DROP TABLE IF EXISTS [dbo].[Numbers];
GO

SET QUOTED_IDENTIFIER ON;
GO

CREATE TABLE [dbo].[Numbers](
       [ID] [int] NOT NULL,
       [Number1] [int] NOT NULL,
       [Number2] [int] NOT NULL,
       [SumOfNumbers]  AS ([Number1]+[Number2]) PERSISTED,
 CONSTRAINT [PK_Numbers] PRIMARY KEY CLUSTERED
(
       [ID] ASC
));
GO

SET QUOTED_IDENTIFIER OFF;
GO

INSERT
INTO dbo.Numbers (ID,Number1,Number2)
VALUES (1,5,6);

SELECT * FROM dbo.Numbers;


Results...

INSERT failed because the following SET options have incorrect settings: 'QUOTED_IDENTIFIER'. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or filtered indexes and/or query notifications and/or XML data type methods and/or spatial index operations.

...with no mention of persisted computed columns. The message does mention indexes on computed columns, but this example only persisted the computed column and did not index it.

If you change QUOTED_IDENTIFIER to ON, the INSERT works even when the computed column is persisted.
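
Incidentally, a quick way to see which computed columns in a database are persisted is the sys.computed_columns catalog view:

SELECT
OBJECT_SCHEMA_NAME(object_id) AS SchemaName
,OBJECT_NAME(object_id) AS TableName
,name AS ColumnName
,definition
,is_persisted
FROM sys.computed_columns
ORDER BY SchemaName, TableName, ColumnName;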

I found some old documentation here, but nothing current. 

Anyway, it took me some time to figure it out and so I decided to post.

Later...

Monday, March 5, 2018

OPEN SYMMETRIC KEY & the Plan Cache


SQL Server natively supports column-level encryption. You can visit here to learn how to do it. But the basics are:

OPEN SYMMETRIC KEY
Do encryption and/or decryption stuff
CLOSE SYMMETRIC KEY
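
For example, a minimal sketch of the pattern, assuming a symmetric key MyKey protected by a certificate MyCert, plus a hypothetical dbo.Customer table with an encrypted column:

OPEN SYMMETRIC KEY MyKey DECRYPTION BY CERTIFICATE MyCert;

SELECT CONVERT(VARCHAR(11), DECRYPTBYKEY(SSNEncrypted)) AS SSN
FROM dbo.Customer;

CLOSE SYMMETRIC KEY MyKey;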

In order to use OPEN SYMMETRIC KEY, you have to specify either a password or a certificate that protects the symmetric key. If you use a password, it is clear text in the sproc code, and so SQL Server does not put the sproc in the plan cache. No execution plan reuse, and no usage and performance data in the sys.dm_exec_procedure_stats DMV.

Thing is… even if, instead of a password, you use a certificate which presents no security exposure, SQL Server still does not put the sproc in the plan cache. If it sees OPEN SYMMETRIC KEY, no plan cache.

Guilt by association.  

SQL Server 2008 introduced another option for decryption only – DECRYPTBYKEYAUTOCERT. This function accepts the certificate as a parameter and works on its own, without OPEN SYMMETRIC KEY, and so the sproc can go in the plan cache. Currently, there is no corresponding ENCRYPTBYKEYAUTOCERT.
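
A hedged sketch of the same decryption via DECRYPTBYKEYAUTOCERT, with no OPEN/CLOSE required (names reused from the sketch above):

SELECT CONVERT(VARCHAR(11), DECRYPTBYKEYAUTOCERT(CERT_ID('MyCert'), NULL, SSNEncrypted)) AS SSN
FROM dbo.Customer;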

Here is the big picture:

OPEN SYMMETRIC KEY with a password – sproc not plan cached
OPEN SYMMETRIC KEY with a certificate – sproc not plan cached
DECRYPTBYKEYAUTOCERT (decryption only) – sproc plan cached

So why not a corresponding ENCRYPTBYKEYAUTOCERT?

I asked Microsoft at SQL PASS 2017. They took the question back to the mother ship and here is the email reply I received.


Hi Regan,

So the reason we support decryptByAutoCert but not the encryption part is twofold:

1. Performance: We had received a steady flow of complaints that select operations with decryption occur frequently and should be made as performant as possible; hence we invested in multi-threading of decryption.
2. Unlike decryption, which is deterministic and can pretty much execute in parallel without much code change, encryption is much harder to parallelize and even harder to test and ensure it works correctly.

So unfortunately, we have not funded parallelizing the encryption operation yet.
Thanks for bringing this to the forefront again. I will put this on our list for planning items.

There is a Connect UserVoice item here. Please vote!

In the meantime, there are two workarounds.

(1)   To at least get the sproc into the DMVs so you can monitor usage and performance, move the actual T-SQL code to a “private” sproc and have the application-called sproc just be a wrapper that calls the “private” version. This does not get the real code into the plan cache; it still has to be compiled on every execution. But at least you get all the metrics from the sys.dm_exec_procedure_stats DMV for the wrapper sproc.
(2)   This 2010 MSDN blog post suggests using EXECUTE to fire dynamic SQL for the OPEN SYMMETRIC KEY command (whether using a password or a certificate). This enables the sproc to go in the plan cache. You still get an SP:CacheMiss for the OPEN SYMMETRIC KEY – but at least now it is only for that one command, as the sketch below shows.
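
A minimal sketch of workaround (2), again using the hypothetical MyKey/MyCert names:

--OPEN SYMMETRIC KEY fired via dynamic SQL; the key remains open for the session
EXEC ('OPEN SYMMETRIC KEY MyKey DECRYPTION BY CERTIFICATE MyCert;');

SELECT CONVERT(VARCHAR(11), DECRYPTBYKEY(SSNEncrypted)) AS SSN
FROM dbo.Customer;

EXEC ('CLOSE SYMMETRIC KEY MyKey;');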

We do what we can.

Later



Thursday, February 2, 2017

NULLs and the T-SQL EXCEPT Clause

NULL is the absence of a value, which makes it an unknown value. But it is not unknown the way “x” is unknown in algebra: “x” stands for an actual value, and so x = x. NULL is not only unknown; it is the absence of any value at all. Consequently, NULL does not equal NULL; yet at the same time, any calculation involving NULL yields NULL.

So…
2 + 2 + x is equal to 4 + x

But…
2 + 2 + NULL is not equal to 4 + NULL

And T-SQL functions align to this.
2 + 2 = 4
2 + 2 != 5
And anything with a NULL in it is Not Equal. 


All is well in the world so far.


Enter EXCEPT, which outputs the distinct rows from a first dataset that are unmatched by any row in a second dataset. A simple example is below: 2 is in the top dataset but not in the bottom dataset, and so it is returned as the result of the EXCEPT clause. Even though the top dataset has two 2s, only one 2 is returned, because the EXCEPT clause inherently produces a distinct result.
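
A reconstruction of that example under stated assumptions (the two 2s in the top dataset are from the discussion; the 1 and 3 values are illustrative):

SELECT n FROM (VALUES (1),(2),(2),(3)) AS TopSet(n)
EXCEPT
SELECT n FROM (VALUES (1),(3)) AS BottomSet(n);
--returns a single row: 2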


We will replace the 2s in the top dataset with NULLs. Notice that although T-SQL functions treat NULLs as not equal to each other, EXCEPT returns a single NULL. Both NULLs are taken to be "equal". 
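
The same example with the 2s replaced by NULLs:

SELECT n FROM (VALUES (1),(NULL),(NULL),(3)) AS TopSet(n)
EXCEPT
SELECT n FROM (VALUES (1),(3)) AS BottomSet(n);
--returns a single NULL row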



This conforms to the general functionality of DISTINCT.
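
DISTINCT shows the same collapsing behavior on the illustrative top dataset:

SELECT DISTINCT n FROM (VALUES (1),(NULL),(NULL),(3)) AS TopSet(n);
--returns 1, 3 and a single NULL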



Finally, moving one of the NULLs to the bottom set, the result of the EXCEPT is empty – thereby treating NULL in the top set (which could be 2+2) as matching NULL in the bottom set (which could be 5).
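
The same illustrative datasets with one NULL moved to the bottom set:

SELECT n FROM (VALUES (1),(NULL),(NULL),(3)) AS TopSet(n)
EXCEPT
SELECT n FROM (VALUES (1),(NULL),(3)) AS BottomSet(n);
--returns an empty result: the NULLs are treated as matching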


The MSDN article goes so far as to describe the two NULL values as “equal” although the SQL standard uses the language “IS NOT DISTINCT FROM”.


So although T-SQL functions treat NULLs as not equal to each other, the T-SQL EXCEPT clause treats NULLs as equal. In the case below, each SELECT separately conforms to the logic that anything plus a missing value nets a missing value, but the EXCEPT then treats those resulting missing values as matching. The result of the EXCEPT clause is empty.
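
A reconstruction of that case:

SELECT 2 + 2 + NULL
EXCEPT
SELECT 5 + NULL;
--each SELECT yields NULL, and EXCEPT matches them: the result is empty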



Might 2 + 2 actually equal 5?

Happy trails....


Monday, August 1, 2016

SELECT Clause Aliases Affect the ORDER BY Clause

If a SELECT clause transforms a column and aliases the result with the same name as the original base table column, the ORDER BY clause is affected and cannot make use of the sort order of the base table. The SELECT clause alias changes what the ORDER BY clause refers to.

For an example, I am using a simple two-column table with an auto-incrementing clustered primary key (n) and a string column.
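
A hedged reconstruction of the table: the PK column name n comes from the discussion below, while the table name and the string column name s are assumptions.

CREATE TABLE dbo.T
(
n INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_T PRIMARY KEY CLUSTERED
,s VARCHAR(50) NOT NULL
);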

  



CASTing the PK as a VARCHAR and aliasing it with a different name (n1), with ORDER BY on the base table PK column, performs an Index Seek and leverages the base table sort.
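
A hedged reconstruction of that query against the assumed dbo.T table (the WHERE predicate is an assumption, implied by the Index Seek):

SELECT
CAST(n AS VARCHAR(10)) AS n1 --aliased with a different name
,s
FROM dbo.T
WHERE n > 100
ORDER BY n; --resolves to the base INT column: index order reused, no Sort operator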



But CASTing the PK as a VARCHAR and aliasing it with the base table column name, again with ORDER BY on the base PK column, performs an Index Seek but must sort the data – because the result must now be ordered as a VARCHAR.
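
The same query, but with the alias shadowing the base column name:

SELECT
CAST(n AS VARCHAR(10)) AS n --aliased with the base column's own name
,s
FROM dbo.T
WHERE n > 100 --the WHERE clause still sees the base INT column
ORDER BY n; --now resolves to the VARCHAR alias: a Sort operator is required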

 


The column 'n' has its meaning changed between execution of the WHERE clause and execution of the ORDER BY clause: the WHERE clause sees the base column, while the ORDER BY sorts the transformed data.



To return transformed data named the same as the base table column but ordered as the base table PK (a weird requirement I recently had), you need to fully qualify the ORDER BY column reference.
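
A sketch of the fix against the assumed dbo.T table:

SELECT
CAST(t.n AS VARCHAR(10)) AS n --still aliased with the base column name
,t.s
FROM dbo.T t
WHERE t.n > 100
ORDER BY t.n; --fully qualified: resolves to the base INT column and reuses the index order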