Trim whitespace (spaces, tabs, newlines)

https://dba.stackexchange.com/questions/132966

30-09-2020
|

Question

I'm on SQL Server 2014 and I need to clean whitespace from start and end of a column's content, where whitespace could be simple spaces, tabs or newlines (both \n and \r\n); e.g.

'    this content    '                          should become 'this content'
'  \r\n   \t\t\t this \r\n content \t  \r\n   ' should become 'this \r\n content'

and so on.

I was able to only achieve the first case with

UPDATE table t SET t.column = LTRIM(RTRIM(t.column))

but for the other cases it doesn't work.

Solution

For anyone using SQL Server 2017 or newer

you can use the TRIM built-in function. For example:

DECLARE @Test NVARCHAR(4000);
SET @Test = N'  
    ' + NCHAR(0x09) + N'  ' + NCHAR(0x09) + N' this 
 ' + NCHAR(0x09) + NCHAR(0x09) + N'  content' + NCHAR(0x09) + NCHAR(0x09) + N'  
' + NCHAR(0x09) + N' ' + NCHAR(0x09) + NCHAR(0x09) + N'     ';

SELECT N'~'
        + TRIM(NCHAR(0x09) + NCHAR(0x20) + NCHAR(0x0D) + NCHAR(0x0A) FROM @Test)
        + N'~';

Please note that the default behavior of TRIM is to remove only spaces, so in order to also remove the tabs and newlines (CR + LFs), you need to specify the characters FROM clause.

Also, I used NCHAR(0x09) for the tab characters in the @Test variable so that the example code can be copied-and-pasted and retain the correct characters. Otherwise, tabs get converted into spaces when this page is rendered.

For anyone using SQL Server 2016 or older

You can create a function, either as a SQLCLR Scalar UDF or a T-SQL Inline TVF (iTVF). The T-SQL Inline TVF would be as follows:

CREATE
--ALTER
FUNCTION dbo.TrimChars(@OriginalString NVARCHAR(4000), @CharsToTrim NVARCHAR(50))
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN
WITH cte AS
(
  SELECT PATINDEX(N'%[^' + @CharsToTrim + N']%', @OriginalString) AS [FirstChar],
         PATINDEX(N'%[^' + @CharsToTrim + N']%', REVERSE(@OriginalString)) AS [LastChar],
        LEN(@OriginalString + N'~') - 1 AS [ActualLength]
)
SELECT cte.[ActualLength],
       [FirstChar],
       ((cte.[ActualLength] - [LastChar]) + 1) AS [LastChar],
       SUBSTRING(@OriginalString, [FirstChar],
                 ((cte.[ActualLength] - [LastChar]) - [FirstChar] + 2)) AS [FixedString]
FROM   cte;
GO

And running it as follows:

DECLARE @Test NVARCHAR(4000);
SET @Test = N'  
    ' + NCHAR(0x09) + N'  ' + NCHAR(0x09) + N' this 
 ' + NCHAR(0x09) + NCHAR(0x09) + N'  content' + NCHAR(0x09) + NCHAR(0x09) + N'  
' + NCHAR(0x09) + N' ' + NCHAR(0x09) + NCHAR(0x09) + N'     ';

SELECT N'~' + tc.[FixedString] + N'~' AS [proof]
FROM   dbo.TrimChars(@Test, NCHAR(0x09) + NCHAR(0x20) + NCHAR(0x0D) + NCHAR(0x0A)) tc;

Returns:

proof
----
~this 
              content~

And you can use that in an UPDATE using CROSS APPLY:

UPDATE tbl
SET    tbl.[Column] = itvf.[FixedString]
FROM   SchemaName.TableName tbl
CROSS APPLY  dbo.TrimChars(tbl.[Column],
                           NCHAR(0x09) + NCHAR(0x20) + NCHAR(0x0D) + NCHAR(0x0A)) itvf

As mentioned at the beginning, this is also really easy via SQLCLR since .NET includes a Trim() method that does exactly the operation you are wanting. You can either code your own to call SqlString.Value.Trim(), or you can just install the Free version of the SQL# library (which I created, but this function is in the Free version) and use either String_Trim (which does just white space) or String_TrimChars where you pass in the characters to trim from both sides (just like the iTVF shown above).

DECLARE @Test NVARCHAR(4000);
SET @Test = N'  
    ' + NCHAR(0x09) + N'  ' + NCHAR(0x09) + N' this 
 ' + NCHAR(0x09) + NCHAR(0x09) + N'  content' + NCHAR(0x09) + NCHAR(0x09) + N'  
' + NCHAR(0x09) + N' ' + NCHAR(0x09) + NCHAR(0x09) + N'     ';

SELECT N'~' + SQL#.String_Trim(@Test) + N'~' AS [proof];

And it returns the exact same string as shown above in the iTVF example output. But being a scalar UDF, you would use it as follows in an UPDATE:

UPDATE tbl
SET    tbl.[Column] = SQL#.String_Trim(itvf.[Column])
FROM   SchemaName.TableName tbl

Either one of the above should be efficient for using across millions of rows. Inline TVFs are optimizible unlike Multi-statement TVFs and T-SQL scalar UDFs. And, SQLCLR Scalar UDFs have the potential to be used in parallel plans, as long as they are marked as IsDeterministic=true and do not set either type of DataAccess to Read (the default for both User and System data access is None), and both of those conditions are true for both of the SQLCLR functions noted above.

OTHER TIPS

You might want to consider using a TVF (table-valued-function) to remove the offending characters from the start and end of your data.

Create a table to hold test data:

IF COALESCE(OBJECT_ID('dbo.TrimTest'), 0) <> 0
BEGIN
    DROP TABLE dbo.TrimTest;
END
CREATE TABLE dbo.TrimTest
(
    SampleData VARCHAR(50) NOT NULL
);

INSERT INTO dbo.TrimTest (SampleData)
SELECT CHAR(13) + CHAR(10) + CHAR(9) + 'this is ' + CHAR(13) + CHAR(10) + ' a test' + CHAR(13) + CHAR(10);
GO

Create the TVF:

IF COALESCE(OBJECT_ID('dbo.StripCrLfTab'), 0) <> 0
BEGIN
    DROP FUNCTION dbo.StripCrLfTab;
END
GO
CREATE FUNCTION dbo.StripCrLfTab
(
    @val NVARCHAR(1000)
)
RETURNS @Results TABLE
(
    TrimmedVal NVARCHAR(1000) NULL
)
AS
BEGIN
    DECLARE @TrimmedVal NVARCHAR(1000);
    SET @TrimmedVal = CASE WHEN RIGHT(@val, 1) = CHAR(13) OR RIGHT(@val, 1) = CHAR(10) OR RIGHT(@val, 1) = CHAR(9)
            THEN LEFT(
                CASE WHEN LEFT(@val, 1) = CHAR(13) OR LEFT(@val, 1) = CHAR(10) OR LEFT(@val, 1) = CHAR(9)
                THEN RIGHT(@val, LEN(@val) - 1)
                ELSE @val
                END
                , LEN(@val) -1 )
            ELSE
                CASE WHEN LEFT(@val, 1) = CHAR(13) OR LEFT(@val, 1) = CHAR(10) OR LEFT(@val, 1) = CHAR(9)
                THEN RIGHT(@val, LEN(@val) - 1)
                ELSE @val
                END
            END;
    IF @TrimmedVal LIKE (CHAR(13) + '%')
        OR @TrimmedVal LIKE (CHAR(10) + '%')
        OR @TrimmedVal LIKE (CHAR(9) + '%')
        OR @TrimmedVal LIKE ('%' + CHAR(13))
        OR @TrimmedVal LIKE ('%' + CHAR(10))
        OR @TrimmedVal LIKE ('%' + CHAR(9))
        SELECT @TrimmedVal = tv.TrimmedVal
        FROM dbo.StripCrLfTab(@TrimmedVal) tv;
    INSERT INTO @Results (TrimmedVal)
    VALUES (@TrimmedVal);
    RETURN;
END;
GO

Run the TVF to show the results:

SELECT tt.SampleData
    , stt.TrimmedVal
FROM dbo.TrimTest tt
CROSS APPLY dbo.StripCrLfTab(tt.SampleData) stt;

Results:

The TVF calls itself recursively until there are no remaining offending characters at the start and end of the string passed into the function. This is unlikely to perform well over a large number of rows, but would probably work ok if you are using this to fix data as it is inserted into the database.

You could use this in an update statement:

UPDATE dbo.TrimTest
SET TrimTest.SampleData = stt.TrimmedVal
FROM dbo.TrimTest tt
CROSS APPLY dbo.StripCrLfTab(tt.SampleData) stt;


SELECT *
FROM dbo.TrimTest;

Results (as text):

I just had a problem with this particular situation, i needed to find and clean every field with white spaces, but i found 4 types of possibles white space in my database fields (Reference to ASCII code table):

Horizontal Tab(char(9))
New Line(char(10))
Vertical Tab(char(9))
Space(char(32))

Maybe this query can help you.

UPDATE @TABLE SET @COLUMN = replace(replace(replace(replace(@COLUMN,CHAR(9),''),CHAR(10),''),CHAR(13),''),CHAR(32),'')

You would have to parse the second example because LTRIM/RTRIM only trim spaces. You actually want to trim what SQL considers data (/r,/t, etc). If you know the values you are looking for, just use REPLACE to replace them. Better yet, write a function and call it.

If you like, use my elegant function:

CREATE FUNCTION s_Trim
(
    @s nvarchar(max)
)
RETURNS nvarchar(max)
AS
BEGIN
    -- Create comparators for LIKE operator
    DECLARE @whitespaces nvarchar(50) = CONCAT('[ ', CHAR(9), CHAR(10), CHAR(13), ']'); -- Concat chars that you consider as whitespaces
    DECLARE @leftComparator nvarchar(50) = @whitespaces + '%',
            @rightComparator nvarchar(50) = '%' + @whitespaces;
    -- LTRIM
    WHILE @s LIKE @leftComparator AND LEN(@s + 'x') > 1 SET @s = RIGHT(@s, LEN(@s + 'x') - 2)
    -- RTRIM
    WHILE @s LIKE @rightComparator AND LEN(@s + 'x') > 1 SET @s = LEFT(@s, LEN(@s + 'x') - 2)

    RETURN @s;
END
GO

Using function on large data can take long execution time. I have a dataset of 8million rows, usingfunction took more than 30 minutes to execute. replace(replace(replace(replace(@COLUMN,CHAR(9),''),CHAR(10),''),CHAR(13),''),CHAR(32),'') took only 5 sec. Thanks all. I see you @sami.almasagedi and @Colin 't Hart

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange