Question

I am working on the database dump of this exact Stack Exchange site, and I have run into an issue that I have not been able to solve.

In the XML file Posts.xml, the contents look like this:

[Screenshot: a single row from Posts.xml, including its "Tags" attribute]

There are of course multiple rows, but that's what one looks like. The dump already includes a Tags.xml file, which makes it even more obvious that the "Tags" attribute in that picture is supposed to be its own table (many-to-many).

So right now I am trying to figure out how to extract the tags. Here's what I tried:

CREATE TABLE #TestingIdea (
Id int PRIMARY KEY IDENTITY (1,1),
PostId int NULL,
Tag nvarchar (MAX) NULL
)
GO

↑ The table I created to test my code. I have already filled it with the tags and PostIds.

SELECT  T1.PostId,
        S.SplitTag
FROM (
    SELECT  T.PostId, 
            cast('<X>'+ REPLACE(T.Tag,'>','</X><X>') + '</X>' as XML) AS NewTag
    FROM #TestingIdea AS T
    ) AS T1
CROSS APPLY (
    SELECT tData.value('.','nvarchar(30)') SplitTag
    FROM T1.NewTag.nodes('X') AS T(tData)
    ) AS S
GO

Yet this code returns this error:

XML parsing: line 1, character 37, illegal qualified name character

I googled this error (including on this site), but the causes other people ran into, such as extra " marks or different character sets, don't apply to my data, so I am kind of stuck. Maybe I missed something extremely obvious in the earlier answers I found T_T. In any case, I appreciate any help and advice on how to tackle this. It's the last table I have yet to normalize.

Small sample of the data from the XML file: https://pastebin.com/AW0Z8Be2

For anyone interested, the program I use to view XML files (it makes them much easier to read, as in the picture above) is called FOXE XML Reader (Free XML Editor - First Object).


Solution

Does something like this give you the result set you are after?

Table & Data

CREATE TABLE #TestingIdea (
Id int PRIMARY KEY IDENTITY (1,1),
PostId int NULL,
Tag nvarchar (MAX) NULL
)

INSERT INTO #TestingIdea(PostId,Tag)
VALUES(1,'<mysql><innodb><myisam>')

GO

Query

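-- STRING_SPLIT requires SQL Server 2016+ (database compatibility level 130 or higher)
SELECT PostId,
       RIGHT(value, LEN(value) - 1) AS SplitTag   -- strip the leading '<' from each piece
FROM #TestingIdea
CROSS APPLY STRING_SPLIT(Tag, '>')
WHERE value <> ''                                 -- skip the empty piece after the final '>'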

Result

PostId  SplitTag
1   mysql
1   innodb
1   myisam
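
As for why the XML attempt in the question fails: each tag is stored as <name>, so replacing only the '>' leaves every leading '<' in place. The string handed to the cast then starts with <X><mysql</X>, which is not well-formed XML, and that is most likely what the "illegal qualified name character" error is complaining about. If STRING_SPLIT is not available on your version, a minimal sketch of a fix is to strip the '<' characters before wrapping the pieces. This assumes tag names never contain XML-special characters such as & or <, and nvarchar(100) is just an arbitrary generous width, not something taken from the dump.

```sql
-- Sketch only: uses the same #TestingIdea table and sample row as above.
SELECT  T1.PostId,
        S.SplitTag
FROM (
    SELECT  T.PostId,
            -- '<mysql><innodb>' becomes '<X>mysql</X><X>innodb</X><X></X>'
            CAST('<X>' + REPLACE(REPLACE(T.Tag, '<', ''), '>', '</X><X>') + '</X>' AS XML) AS NewTag
    FROM #TestingIdea AS T
    ) AS T1
CROSS APPLY (
    SELECT tData.value('.', 'nvarchar(100)') AS SplitTag
    FROM T1.NewTag.nodes('X') AS N(tData)
    ) AS S
WHERE S.SplitTag <> ''   -- drop the empty node produced by the final '>'
GO
```

For the sample row above this should give the same three tags as the STRING_SPLIT query.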
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange