Question

I'm using this function to decode a URL-encoded string:

ALTER FUNCTION [dbo].[UrlDecode](@URL varchar(3072)) 
RETURNS varchar(3072) 
AS 
BEGIN 
    DECLARE @Position INT,
        @Base CHAR(16),
        @High TINYINT,
        @Low TINYINT,
        @Pattern CHAR(21)

    SELECT  @Base = '0123456789abcdef',
        @Pattern = '%[%][0-9a-f][0-9a-f]%',
        @URL = REPLACE(@URL, '+', ' '),
        @Position = PATINDEX(@Pattern, @URL)

    WHILE @Position > 0
        SELECT  @High = CHARINDEX(SUBSTRING(@URL, @Position + 1, 1), @Base COLLATE Latin1_General_CI_AS),
            @Low = CHARINDEX(SUBSTRING(@URL, @Position + 2, 1), @Base COLLATE Latin1_General_CI_AS),
            @URL = STUFF(@URL, @Position, 3, CHAR(16 * @High + @Low - 17)),
            @Position = PATINDEX(@Pattern, @URL)

    RETURN  @URL
END 

This works fine until it reaches characters outside the ASCII range. For example, Wil+SG+1 returns Wil SG 1, which is correct, while Gen%C3%A8ve+11 returns GenÃ¨ve 11, which is not what I expect (Genève 11 is the expected result in this case).

Another example:

select 'Gen%C3%A8ve+2+D%C3%A9p%C3%B4t', dbo.UrlDecode('Gen%C3%A8ve+2+D%C3%A9p%C3%B4t')

returns:

Gen%C3%A8ve+2+D%C3%A9p%C3%B4t   GenÃ¨ve 2 DÃ©pÃ´t

I've tried using NCHAR instead of CHAR, but the result is the same. Do you know what I can do to support these extended (non-ASCII) characters?

Solution 2

I found this function, which achieves exactly what I want:

ALTER FUNCTION [dbo].[UrlDecodeUTF8](@URL varchar(3072))
RETURNS varchar(3072)
AS
BEGIN 
    DECLARE @Position INT,
        @Base CHAR(16),
        @Code INT,
        @Pattern CHAR(21)

    SELECT @URL = REPLACE(@URL, '%c3', '')

    SELECT  @Base = '0123456789abcdef',
        @Pattern = '%[%][0-9a-f][0-9a-f]%',
        @Position = PATINDEX(@Pattern, @URL)

    WHILE @Position > 0
        SELECT @Code = Cast(CONVERT(varbinary(4), '0x' + SUBSTRING(@URL, @Position + 1, 2), 1) As int),
            @URL = STUFF(@URL, @Position, 3, NCHAR(@Code + 64)),
            @Position = PATINDEX(@Pattern, @URL)

    RETURN REPLACE(@URL, '+', ' ')

END
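
A note on why the + 64 in this function works, and where it stops working. A two-byte UTF-8 sequence has the bit layout 110xxxxx 10yyyyyy and encodes the code point x * 64 + y. When the lead byte is C3, x is 3, so the code point is 192 + (second byte - 128), which is the second byte + 64. The shortcut is therefore only valid for characters in the range U+00C0 to U+00FF, and every other escape is shifted by 64 as well. The statements below are only an illustration of that arithmetic, using the bytes of %C3%A8 from the question:

```sql
-- %C3%A8: lead byte 0xC3 = 195, continuation byte 0xA8 = 168
SELECT (195 & 31) * 64 + (168 & 63) AS general_formula,  -- 232
       168 + 64                     AS shortcut,         -- 232
       NCHAR(168 + 64)              AS decoded;          -- è (U+00E8)

-- An escape that is not preceded by %C3 gets shifted too:
SELECT NCHAR(32 + 64) AS decoded_percent_20;             -- ` (U+0060), not a space
```

So the function is fine for input like Gen%C3%A8ve, but it would mangle %20, characters whose lead byte is not C3 (for example %C5%93 for œ), and three-byte sequences such as %E2%82%AC for €.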

OTHER TIPS

URLs are percent-encoded as UTF-8 bytes. What your function does is simply replace each %XX escape with the single character whose code is XX, so a character that takes two bytes in UTF-8 (such as è, encoded as C3 A8) comes out as two separate characters.

What you really need is a function that converts URL-encoded UTF-8 to the UCS-2 encoding SQL Server uses for Unicode strings, as posted in this answer on Social.MSDN.
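
As a rough idea of what such a conversion involves, here is a minimal sketch. It is not the Social.MSDN function; the name UrlDecodeUtf8Sketch and all variable names are made up for illustration. It assumes SQL Server 2008 or later, treats + as a space, returns nvarchar, and only handles UTF-8 sequences of one to three bytes (four-byte sequences and malformed input come out as U+FFFD):

```sql
-- Hypothetical helper: name and structure are assumptions, not an existing API.
CREATE FUNCTION dbo.UrlDecodeUtf8Sketch (@URL varchar(3072))
RETURNS nvarchar(3072)
AS
BEGIN
    DECLARE @Out  nvarchar(3072) = N'',
            @Pos  int = 1,
            @Len  int = DATALENGTH(@URL),
            @Char char(1),
            @Byte int,
            @Code int = 0,   -- code point being assembled
            @Need int = 0;   -- continuation bytes still expected

    WHILE @Pos <= @Len
    BEGIN
        SET @Char = SUBSTRING(@URL, @Pos, 1);

        IF @Char = '%'
           AND SUBSTRING(@URL, @Pos + 1, 2) COLLATE Latin1_General_BIN
               LIKE '[0-9a-fA-F][0-9a-fA-F]'
        BEGIN
            -- CONVERT style 2 reads a hex string that has no 0x prefix.
            SET @Byte = CONVERT(int, CONVERT(varbinary(1), SUBSTRING(@URL, @Pos + 1, 2), 2));
            SET @Pos = @Pos + 3;

            IF @Byte < 128                         -- 0xxxxxxx: plain ASCII
                SELECT @Out = @Out + NCHAR(@Byte), @Need = 0;
            ELSE IF @Byte >= 192 AND @Byte < 224   -- 110xxxxx: starts a 2-byte sequence
                SELECT @Code = @Byte & 31, @Need = 1;
            ELSE IF @Byte >= 224 AND @Byte < 240   -- 1110xxxx: starts a 3-byte sequence
                SELECT @Code = @Byte & 15, @Need = 2;
            ELSE IF @Byte < 192 AND @Need > 0      -- 10xxxxxx: continuation byte
            BEGIN
                SELECT @Code = @Code * 64 + (@Byte & 63), @Need = @Need - 1;
                IF @Need = 0 SET @Out = @Out + NCHAR(@Code);
            END
            ELSE                                   -- 4-byte lead or malformed input
                SELECT @Out = @Out + NCHAR(65533), @Need = 0;
        END
        ELSE
        BEGIN
            -- Literal character; '+' stands for a space in form-encoded data.
            SELECT @Out = @Out + CASE WHEN @Char = '+' THEN N' ' ELSE @Char END,
                   @Need = 0,
                   @Pos = @Pos + 1;
        END
    END

    RETURN @Out;
END
GO

SELECT dbo.UrlDecodeUtf8Sketch('Gen%C3%A8ve+2+D%C3%A9p%C3%B4t');  -- Genève 2 Dépôt
```

The return type matters here: returning nvarchar keeps the decoded characters intact regardless of the database's code page, whereas a varchar result can only hold characters that exist in that code page.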

I suspect you'll need to play around with the collation, replacing the UTF-8 codes with some ASCII equivalent. Here's an example I had in my code library:

SELECT REPLACE(CHAR(228) COLLATE Latin1_General_BIN, CHAR(196), 'Y')
Licensed under: CC-BY-SA with attribution