Manipulating a Column Containing Key/Value Pairs
-
14-02-2021 - |
Question
I'm accessing and creating reports from a vendor via a replicated SQL Server database. They've done some absolutely insane things that I've been trying to solve for, but this one takes the cake.
They have a table that has many standard columns. But this table also has a column called "Data". The column is a legacy "text" data type, and it contains a giant (hundreds) list of key/value pairs. Each pair is separated by a CRLF, and the key and value are separated by an equal sign. Example:
select myTable.[data] from myTable where tblKey = 123
Result:
Key 1=Value 1
Key 2=Value 2
Key 3=Value 3
...
Key 500=Value 500
I'm trying to determine the most efficient way to break that column out into a usable table of data. The end goal would be to be able to query the table in a way that returns the table key along with specified key/values as column/fields as such:
tblKey | [Key 1] | [Key 3] | [Key 243]
-------|---------|---------|-----------
123 Value 1 Value 3 Value 243
124 Value 1 Value 3 Value 243
125 Value 1 Value 3 Value 243
Is there a way to mold that column into a View? I can't imagine that a Function would be particularly efficient, but I'm sure I could parse things out that way using a string_split or something of that sort. Has anyone run into this type of atrocity before and found a good way to manipulate it into usable data?
Edit to add dbfiddle sample data.
The data is replicated from a vendor's source, so I can't create new tables. I can create views, procedures and functions. That's what I'm looking for advice for a decent way to accomplish.
Solution
UPDATE
If as you posted in your own answer you're able to use a UDF to get specific key values let me suggest this one: (You don't need to split all key/values and you don't need to read the table again, you can get it by using text functions.)
CREATE FUNCTION fnGetKey(@Data text, @Key varchar(20))
RETURNS varchar(100)
AS
BEGIN
RETURN
(
SELECT
SUBSTRING (
@Data,
/* Position of first '=' after key + 1 */
CHARINDEX('=', @Data, PATINDEX('%' + @key + '%', @Data)) + 1,
/* Lenght, Position of first chr(13) after key less previuos value - 1 */
(CHARINDEX(CHAR(13), @Data, PATINDEX('%' + @key + '%', @Data))
-
CHARINDEX('=', @Data, PATINDEX('%' + @key + '%', @Data))) - 1
)
)
END
SELECT
FruitID, Name, Description,
dbo.fnGetKey([Data], 'key 2') as [key 2],
dbo.fnGetKey([Data], 'key 4') as [key 4]
FROM
[Fruit];
FruitID | Name | Description | key 2 | key 4 ------: | :----- | :---------- | :------ | :------ 1 | Banana | Delicious | value 2 | value 4 2 | Pear | Rotton | value 2 | value 4 3 | Kiwi | Okay | value 2 | value 4
db<>fiddle here
Original answer
The only solution I can figure out is by splitting key/values and then pivot it to obtain the desired result.
Unfortunately there are some inconveniences:
- STRING_SPLIT doesn't works with
text
columns. Hence you must cast it tovarchar
before you are able to manipulate it. - STRING_SPLIT requires a
nchar(1)
ornvarchar(1)
, ergo you should replaceCHAR(3)+CHAR(10)
by a single character. - Aggregate function on PIVOT works better with numeric values, then you should cast
Value
to some numeric data type. - PIVOT needs a well-known number of columns, in my example I have used a few of them but you should write the whole sequence unless you'd rather deal with dynamic queries.
This is what I've got using your sample data:
WITH KP AS
(
SELECT FruitID, Name, Description, value as KPair
FROM Fruit
CROSS APPLY STRING_SPLIT(REPLACE(CAST(Data AS varchar(max)), CHAR(13)+CHAR(10), ','), ',') /* STRING_SPLIT only allows nchar(1), varchar(1) */
)
, KP1 AS
(
SELECT
FruitID,
SUBSTRING(KPair, 5, CHARINDEX('=', KPair) - 5) AS [Key],
SUBSTRING(KPair, CHARINDEX('=', KPair) + 7, LEN(KPair) - CHARINDEX('=', KPair) - 6) AS [Value]
FROM
KP
)
SELECT [FruitID], [1],[2],[3],[4],[5]
FROM KP1
PIVOT (MAX([Value]) FOR [Key] IN ([1],[2],[3],[4],[5])) AS PVT;
First CTE split every Key X=Value Y
. The second one cut this value to obtain each [Key] and [Value]. And the final PIVOT compose the final result in columns.
FruitID | 1 | 2 | 3 | 4 | 5 ------: | :- | :- | :- | :- | :- 1 | 1 | 2 | 3 | 4 | 5 2 | 1 | 2 | 3 | 4 | 5 3 | 1 | 2 | 3 | 4 | 5
db<>fiddle here
NOTE: I'm not sure if I should maintain [Key 1] & [Value 1] or it should be converted as a column named [Key] & [Value].
A different approach
When I work with 3rd party databases I usually add a new database, on the same server/instance if possible, and then I use it for my own purposes, just to avoid conflicts with the DB owners.
In this case you could add a new table and periodically throw a process to update it with the new values.
You could use a table with all columns:
CREATE TABLE [FruitKeys]
(
[FruitID] int NOT NULL PRIMARY KEY,
[V1] int NULL,
[V2] int NULL,
[V3] int NULL,
[V4] int NULL,
[V5] int NULL
);
or a table with Key/Value pairs and use a pivot to obtain the final result:
CREATE TABLE [FruitKeys]
(
[FruitID] int NOT NULL,
[Key] int NOT NULL,
[Value] int NOT NULL,
CONSTRAINT [PK_FruitKeys] PRIMARY KEY ([FruitID], [Key])
);
OTHER TIPS
It seems to me that the source data is not that far from JSON
format.
You could convert it pretty directly then use OPENJSON
to produce a relational output:
SELECT
F.FruitID,
F.[Name],
OJ.[key 1],
OJ.[key 2],
OJ.[key 3],
OJ.[key 4],
OJ.[key 5],
F.[Description]
FROM dbo.Fruit AS F
CROSS APPLY OPENJSON
(
-- Convert source data to JSON format
'{' +
CHAR(34) +
REPLACE
(
REPLACE
(
CONVERT(varchar(max), F.Data),
'=', CHAR(34) + ':' + CHAR(34)
),
CHAR(13) + CHAR(10),
CHAR(34) + ',' + CHAR(34)
) +
CHAR(34) +
'}'
)
WITH
(
[key 1] varchar(100),
[key 2] varchar(100),
[key 3] varchar(100),
[key 4] varchar(100),
[key 5] varchar(100)
) AS OJ;
Output:
FruitID | Name | key 1 | key 2 | key 3 | key 4 | key 5 | Description ------: | :----- | :------ | :------ | :------ | :------ | :------ | :---------- 1 | Banana | value 1 | value 2 | value 3 | value 4 | value 5 | Delicious 2 | Pear | value 1 | value 2 | value 3 | value 4 | value 5 | Rotton 3 | Kiwi | value 1 | value 2 | value 3 | value 4 | value 5 | Okay
McNets provided a reasonable approach, but the splitting for every pair, while obviously necessary, is a pretty time consuming process. With 500+ key/value pairs for every record in the table, I'm not sure it's going to work for my purposes. It's probably a decent approach if there are fewer pairs and a small number of rows in the affected table.
Because I'm working with hundreds of key/value pairs and also thousands of records in the table itself, I'm thinking about implementing a user defined function (below) to use as-needed in reports and queries where a specific key/value pair is needed (and known).
CREATE FUNCTION udfsv_GetFruitDataValue(
@FruitID int,
@DataId varchar(100)
)
RETURNS varchar(100)
AS BEGIN
DECLARE @DataVal varchar(100)
set @DataVal = (
select
replace(replace(split1, @DataId + '=', ''), char(13), '') as DataValue
from Fruit
left outer join (
select
FruitID,
value as split1
from Fruit
cross apply string_split(cast([data] as varchar(max)), char(10))
) line1 on line1.FruitID = Fruit.FruitID
where Fruit.FruitID = @FruitID
and split1 like @DataId + '=%'
)
RETURN @DataVal
END
With this I'd be able to perform queries to include specified key/values, just not all key/values.
SELECT
FruitID,
Name,
Description,
udfsv_GetFruitDataValue(FruitID, 'Key 1') as [Key 1],
udfsv_GetFruitDataValue(FruitID, 'Key 4') as [Key 4]
FROM
Fruit
WHERE FruitID = 123