Question

I wonder if there is an easy and efficient way in SQL Server 2005 to eliminate replicated characters in a string. Like converting

'ABBBCDEEFFFFG' to 'ABCDEFG'

It really sucks that SQL Server has such a poor string library and no ready-to-use regexp feature...

No correct solution

OTHER TIPS

You can use the CLR functionality built into SQL Server 2005/2008 to get this done by .NET code.

MSDN magazine wrote about it in their February 2007 issue.

If this is not an acceptable solution, here is a UDF that will do the same, mind you this is about two orders of magnitude slower than the CLR solution.

YMMV. This appears to work for your string above. But not ABBBCDEEBBBBG

DECLARE @Numbers TABLE (Num smallint NOT NULL PRIMARY KEY)

INSERT @Numbers (Num)
SELECT TOP 8000
    ROW_NUMBER() OVER (ORDER BY c1.NAME)
FROM
    sys.columns c1

DECLARE @STuff TABLE (Seq varchar(100) NOT NULL PRIMARY KEY)

INSERT @STuff (Seq) VALUES ('ABBBCDEEFFFFG') --works

SELECT
    Single
FROM
    (
    SELECT DISTINCT
        CAST(Single AS varchar(100))
    FROM
       @Numbers N
       CROSS APPLY 
       (SELECT Seq, SUBSTRING(Seq, Num, 1) AS Single FROM @Stuff) S
    WHERE
       Num <= LEN(Seq)
    FOR XML PATH ('')
    ) foo(Single)

I know about the CLR solution, but as I said, I am neither responsible nor authorized to implement it in the DB of question.

For this particular problem, I decided to write a very simple and kinda silly loop. I am afraid it won't be fast enough for millions of records, but anyways... I wish I could do this stuff in the application layer but I am bound to T-SQL here..

  DECLARE @i int ;  -- counter
  DECLARE @input varchar(200) ;
  SET @input = 'AAABCDEEFFBBBXYZSSSWWWNT'

  IF LEN(@input) > 1
  BEGIN
    DECLARE @unduplicated varchar(200) ;
    SET @unduplicated = SUBSTRING(@input,1,1) ;

    SET @i = 2 ;
    WHILE @i <= LEN(@input)
    BEGIN
      -- If current char is different from the last char, concatenate, else not
      IF SUBSTRING(@unduplicated, LEN(@unduplicated), 1) <> SUBSTRING(@input, @i, 1)
        SET @unduplicated = @unduplicated + SUBSTRING(@input, @i, 1) ;
      SET @i = @i + 1;
    END
  END  
  SELECT @unduplicated AS unduplicated;

Result:

unduplicated 
ABCDEFBXYZSWNT
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top