Question

What is the best way to replace all '&lt' with < in a given database column? Basically perform s/&lt[^;]/</gi

Notes:

  • must work in MS SQL Server 2000
  • Must be repeatable (and not end up with <;;;;;;;;;)
Was it helpful?

Solution

Some hacking required but we can do this with LIKE, PATINDEX, LEFT AND RIGHT and good old string concatenation.

create table test
(
    id int identity(1, 1) not null,
    val varchar(25) not null
)

insert into test values ('&lt; <- ok, &lt <- nok')

while 1 = 1
begin
    update test
        set val = left(val, patindex('%&lt[^;]%', val) - 1) +
                      '&lt;' +
                      right(val, len(val) - patindex('%&lt[^;]%', val) - 2)
    from test
    where val like '%&lt[^;]%'

    IF @@ROWCOUNT = 0 BREAK
end

select * from test

Better is that this is SQL Server version agnostic and should work just fine.

OTHER TIPS

I think this can be done much cleaner if you use different STUFF :)

create table test
(
    id int identity(1, 1) not null,
    val varchar(25) not null
)

insert into test values ('&lt; <- ok, &lt <- nok')

WHILE 1 = 1
BEGIN
    UPDATE test SET
        val = STUFF( val , PATINDEX('%&lt[^;]%', val) + 3 , 0 , ';' )
    FROM test
    WHERE val LIKE '%&lt[^;]%'

    IF @@ROWCOUNT = 0 BREAK
END

select * from test

How about:

    UPDATE tableName
    SET columName = REPLACE(columName , '&lt', '&lt;')
    WHERE columnName LIKE '%lt%'
    AND columnName NOT LIKE '%lt;%'

Edit:

I just realized this will ignore columns with partially correct &lt; strings.

In that case you can ignore the second part of the where clause and call this afterward:

    UPDATE tableName
    SET columName = REPLACE(columName , '&lt;;', '&lt;')

This article covers how to create a simple Regex Replace function that you can use in SQL 2000 (and 2005 with simple tweak) that can assist you.

If MSSQL's regex flavor supports negative lookahead, that would be The Right Way to approach this.

s/&lt(?!;)/&lt;/gi

will catch all instances of &lt which are not followed by a ; (even if they're followed by nothing, which [^;] would miss) and does not capture the following non-; character as part of the match, eliminating the issue mentioned in the comments on the original question of that character being lost in the replacement.

Unfortunately, I don't use MSSQL, so I have no idea whether it supports negative lookahead or not...

Very specific to this pattern, but I have done similar to this in the past:

REPLACE(REPLACE(columName, '&lt;', '&lt'), '&lt', '&lt;')

broader example (encode characters which may be inappropriate in a TITLE attribute)

REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    columName
    -- Remove existing encoding:
    , '&amp;', '&')
    , '&#34;', '"')
    , '&#39;', '''')
    -- Reinstate/Encode:
    , '&', '&amp;')
    -- Encode:
    , '"', '&#34;')
    , '''', '&#39;')
    , ' ', '%20')
    , '<', '%3C')
    , '>', '%3E')
    , '/', '%2F')
    , '\', '%5C')
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top