سؤال

Disclaimer: not my code, not my database design!

I have a column of censusblocks(varbinary(max), null) in a MS SQL Server 2008 db table (call it foo for simplicity).

This column is actually a null or 1 to n long list of int. The ints are actually foreign keys to another table (call it censusblock with a pk id of type of int), numbering from 1 to ~9600000.

I want to query to extract the censusblocks list from foo, and use the extracted list of int from each row to look up the corresponding censusblock row. There's a long, boring rest of the query that will be used from there, but it needs to start with the census blocks pulled from the foo table's censusblocks column.

This conversion-and-look-up is currently handled on the middle tier, with a small .NET utility class to convert from List<int> to byte[] (and vice versa), which is then written into/read from the db as varbinary. I would like to do the same thing, purely in SQL.

The desired query would go something along the lines of

SELECT f.id, c.id
FROM foo f 
LEFT OUTER JOIN censusblock c ON 
c.id IN f.censusblocks --this is where the magic happens
where f.id in (1,2)

Which would result in:

f.id   |   c.id

 1         8437314
 1         8438819
 1         8439744
 1         8441795
 1         8442741
 1         8444984
 1         8445568
 1         8445641
 1         8447953
 2         5860657
 2         5866881
 2         5866881
 2         5866858
 2         5862557
 2         5870475
 2         5868983
 2         5865207
 2         5863465
 2         5867301
 2         5864057
 2         5862256

NB: the 7-digit results are coincidental. The range is, as stated above, 1-7 digits.

The actual censusblocks column looks like

SELECT TOP 2 censusblocks FROM foo

which results in

censublocks

0x80BE4280C42380C7C080CFC380D37580DC3880DE8080DEC980E7D1
0x596D3159858159856A59749D59938B598DB7597EF7597829598725597A79597370

For further clarification, here's the guts of the .NET utility classes conversion methods:

    public static List<int> getIntegersFromBytes(byte[] data)
    {
        List<int> values = new List<int>();
        if (data != null && data.Length > 2)
        {
            long ids = data.Length / 3;
            byte[] oneId = new byte[4];
            oneId[0] = 0;
            for (long i = 0; i < ids; i++)
            {
                oneId[0] = 0;
                Array.Copy(data, i * 3, oneId, 1, 3);
                if (BitConverter.IsLittleEndian)
                { Array.Reverse(oneId); }
                values.Add(BitConverter.ToInt32(oneId, 0));
            }}
        return values;
    }

    public static byte[] getBytesFromIntegers(List<int> values)
    {
        byte[] data = null;
        if (values != null && values.Count > 0)
        {
            data = new byte[values.Count * 3];
            int count = 0;
            byte[] idBytes = null;
            foreach (int id in values)
            {
                idBytes = BitConverter.GetBytes(id);
                if (BitConverter.IsLittleEndian)
                { Array.Reverse(idBytes); }
                Array.Copy(idBytes, 1, data, count * 3, 3);
                count++;
            } }
        return data;
    }
هل كانت مفيدة؟

المحلول

An example of how this might be done. It is unlikely to scale brilliantly. If you have a numbers table in your database it should be used in place of nums_cte.

This works by converting the binary value to a literal hex string, then reading it in 8-character chunks

-- create test data
DECLARE @foo TABLE
(id int ,
 censusblocks varbinary(max)
)

DECLARE @censusblock TABLE
(id int)

INSERT @censusblock (id)
VALUES(1),(2),(1003),(5030),(5031),(2),(6)


INSERT @foo (id,censusblocks)
VALUES (1,0x0000000100000002000003EB),
(2,0x000013A6000013A7)


--query
DECLARE @biMaxLen bigint
SELECT @biMaxLen = MAX(LEN(CONVERT(varchar(max),censusblocks,2))) FROM @foo

;with nums_cte
AS
(
    SELECT TOP (@biMaxLen) ((ROW_NUMBER() OVER (ORDER BY a.type) - 1) * 8) AS n
    FROM master..spt_values as a
    CROSS JOIN master..spt_values as b
)
,binCTE
AS
(
    SELECT d.id, CAST(CONVERT(binary(4),SUBSTRING(s,n + 1,8),2) AS int) as cblock
    FROM (SELECT Id, CONVERT(varchar(max),censusblocks,2) AS s FROM @foo) AS d
    JOIN nums_cte
    ON n < LEN(d.s)
)
SELECT *
FROM    binCTE as b
LEFT
JOIN    @censusblock c
ON      c.id = b.cblock
ORDER BY b.id, b.cblock

You could also consider adding your existing .Net conversion methods into the database as an assembly and accessing them through CLR functions.

نصائح أخرى

This is off-topic, but I couldn't resist writing these conversions so they use IEnumerables instead of arrays and Lists. This might not be faster per se, but is more general and would allow you to perform the conversion without loading the whole array at once, which may be helpful if the arrays you are dealing with are large.

Here it is, for what it's worth:

static IEnumerable<int> BytesToInts(IEnumerable<byte> bytes) {

    var buff = new byte[4];

    using (var en = bytes.GetEnumerator()) {

        while (en.MoveNext()) {

            buff[0] = en.Current;
            if (en.MoveNext()) {
                buff[1] = en.Current;
                if (en.MoveNext()) {
                    buff[2] = en.Current;
                    if (en.MoveNext()) {
                        buff[3] = en.Current;
                        if (BitConverter.IsLittleEndian)
                            Array.Reverse(buff);
                        yield return BitConverter.ToInt32(buff, 0);
                        continue;
                    }
                }
            }

            throw new ArgumentException("Wrong number of bytes.", "bytes");

        }

    }

}

static IEnumerable<byte> IntsToBytes(IEnumerable<int> ints) {

    if (BitConverter.IsLittleEndian)
        return ints.SelectMany(
            b => {
                var buff = BitConverter.GetBytes(b);
                Array.Reverse(buff);
                return buff;
            }
        );

    return ints.SelectMany(BitConverter.GetBytes);

}

Your code seems to like encoding an int into 3 bytes instead of 4, which would cause problems with values that don't fit into 3 bytes (including negatives) - is that intentional?

BTW, you should be able to adapt this (or your) code for execution in SQL Server CLR. This is not exactly "in SQL", but is "in DBMS".

you can use Convert(int, censusBlock) to convert the varchar value to int value.
the you can join on that column. Or have i misunderstood the question?

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top