xquery return Boolean if node exists at a certain location or not

https://stackoverflow.com/questions/13036830

13-07-2021
|

Domanda

I have XML similar to the following in a SQL 2008 database, stored in an XML field. I would like to return a true or false indication if a node exists in a specific section of the XML.

<root>
    <node attribute1='value1' attribute2='value2'>
        <sub1 name='ID' value="1" />
        <sub2 name='project' value="abc" />
        <sub3 name='Lead' value="John" />
    </node>
    <entry attribute1='value1' attribute2='value2'>
        <message>start</message>
    </entry>
    <entry attribute1='value1' attribute2='value2'>
        <attribute name='project' value='done'>
    </entry>    
    <node attribute1='value1'>
        <sub1 name='ID' value="2" />
        <sub2 name='project' value="abc" />
        <sub3 name='Lead' value="John" />
    </node>
    <entry attribute1='value1' attribute2='value2'>
        <message>start</message>
    </entry>
    <node attribute1='value1'>
        <sub1 name='ID' value="3" />
        <sub2 name='project' value="abc" />
        <sub3 name='Lead' value="John" />
    </node>
    <entry attribute1='value1' attribute2='value2'>
        <message>start</message>
    </entry>
    <node attribute1='value1'>
        <sub1 name='ID' value="4" />
        <sub2 name='project' value="abc" />
        <sub3 name='Lead' value="John" />
    </node> 
    <entry attribute1='value1' attribute2='value2'>
        <message>start</message>
    </entry>
    <entry attribute1='value1' attribute2='value2'>
        <attribute name='project' value='done'>
    </entry>
</root>

As you'll notice, the <attribute> node may or may not occur after a node with 'ID'. In this example, you can see it in the first and fourth "sections" for lack of a better term.

With the following table structure:

ID (PK)
EventID (FK)
RawXML (XML)
Created (datetime)

Here is an extract of the SQL/xQuery that I have so far:

WITH XMLNAMESPACES(
  'http://www.w3.org/2001/XMLSchema-instance' as xsi,
),
t1 as(
    SELECT distinct
      x.EventId
    , c.value ('(//node/sub[@name=''ID'']/@value)[1]', 'nvarchar(max)') as ID   
    , c.value ('(//node/sub[@name=''ID''][1][descendant::attribute/@name=''project''])[1]', 'nvarchar(max)' ) as Exists     
    FROM
      Table1 x
    CROSS APPLY
      RawXML.nodes('./.') as t(c)
     )
select distinct
  t1.ID
, t1.Exists
from t1

I will be running the script 4 or more times (incrementing all of the singleton values before each run)

For the XML given, I need to end up with the following results after running the query 4 times: (the values of the IDs will not be know so I can't use them in the query)

    ID    Exists
   ----   -------
    1      true
    2      false
    3      false
    4      true

With the SQL given, I didn't get any errors but it's taking forever (well over 45 minutes) and I haven't even let it finish yet. It really shouldn't take this long to parse the XML.

UPDATE: I limited my query to make sure it was only parsing one row (one XML file) and it finished in 57 seconds. However, I got a result of '0' for ID 1 and ID 2 when I should have had a '1' for ID 1.

And I'm sure most of you are aware that following-sibling, etc isn't supported by SQL Server so unfortunately that's not an option.

Just for reference, I've used this successfully to find the two instances of 'Project' but it ignores where in the xml they occur.:

c.value ('(//node[descendant::attribute/@name=''Project''])[1]', 'nvarchar(max)' ) as TrueFalse

So basically, I need to know if the node with name='Project' exists after a node with name='ID' BUT before the next instance of a node with name='ID'

Soluzione

You have some errors in your XML and judging by the query you use I also changed the sub nodes.

You can enumerate your ID and project nodes using row_number() and then check if the "next row" is a project node or a an ID row using regular SQL instead of XQuery.

-- Temp table to hold the extracted values from the XML
create table #C
(
  rn int primary key,
  ID int
);

-- Get the enumerated rows with ID.
-- project nodes will have NULL in ID
insert into #C
  select row_number() over(order by T.N) as rn,
         T.N.value('sub[@name = "ID"][1]/@value', 'int') as ID
  from table1
    cross apply RawXML.nodes('/root/*[sub/@name = "ID" or attribute/@name = "project"]') as T(N)

-- Get the ID's and check if the next row is a project node
select C1.ID,
       case when exists (
                          select *
                          from #C as C2
                          where C1.rn + 1 = C2.rn and 
                          C2.ID is null
                        ) 
         then 1
         else 0
       end as [Exists]
from #C as C1
where C1.ID is not null;

drop table #C;

SQL Fiddle

You can do it without a temp table using a CTE instead but I suspect that the temp table version will be faster.

with C as
(
  select row_number() over(order by T.N) as rn,
          T.N.value('sub[@name = "ID"][1]/@value', 'int') as ID
  from table1
    cross apply RawXML.nodes('/root/*[sub/@name = "ID" or attribute/@name = "project"]') as T(N)
)
select C1.ID,
        case when exists (
                          select * 
                          from C as C2 
                          where C1.rn + 1 = C2.rn and 
                                C2.ID is null
                        ) 
          then 1
          else 0
        end as [Exists]
from C as C1
where C1.ID is not null;

SQL Fiddle

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow