XQuery - Why is there a difference in the results?

https://stackoverflow.com/questions/19043615

xquery
basex

29-06-2022

Question

<Docs>
  <Doc>
    <Title>Electromagnetic Fields</Title>
    <Info>
      <Vol name="Physics"/>
      <Year>2006</Year>
    </Info>
    <SD>
      <Info>
        <Para>blah blah blah.<P>blh blah blah.</P></Para>
      </Info>
    </SD>
    <LD>
      <Info>
        <Para>blah blah blah.<P>blah blah blah.</P></Para>
        <Para>blah blah blah.<P>blah blah blah.</P></Para>
        <Para>blah blah blah.<P>emf waves blah.</P></Para>
        <Para>blah blah blah.<B>emf waves</B> blah.</Para>
        <Para>blah blah blah.<P>emf waves blah.</P></Para>
        <Para>blah waves blah.<B>emf</B> waves blah.</Para>
        <Para>emf blah blah.<I>waves blah.</I></Para>
        <Para>blah blah blah.<B>emf waves</B> blah.</Para>
        <Para>blah blah blah.<P><I>emf</I> waves blah.</P></Para>
      </Info>
    </LD>
  </Doc>
</Docs>

Query 1 -

for $x in ft:search("Article", ("emf","waves"), map{'mode':='all words'})/ancestor::*:Doc
  return $x/Title

This returns 62 hits.

Query 2 -

for $x in ft:search("Article", ("emf","waves"), map{'mode':='all words'})
  return $x/ancestor::*:Doc/Title

This returns 159 hits.

Query 3 -

for $x in doc("Article")/Doc[Info[Vol/@name="Physics" and Year ge "2006" and Year le "2010"]]
[SD/Info/Para/text() contains text {"emf","waves"} all words or
SD/Info/Para/P/text() contains text {"emf","waves"} all words or
LD/Info/Para/text() contains text {"emf","waves"} all words or
SD/Info/Para/P/text() contains text {"emf","waves"} all words or
SD/Info/Para/P/B/text() contains text {"emf","waves"} all words or
SD/Info/Para/P/I/text() contains text {"emf","waves"} all words or
SD/Info/Para/P/U/text() contains text {"emf","waves"} all words]
    return $x/Title

This returns 224 hits. In the third query, I list every node path that actually occurs in the data; I, B and U mark italic, bold and underlined text.

Why do the counts differ?

Solution

Queries 1 and 2 look almost identical, but in Q1 the ancestor::*:Doc step is part of the path expression that the loop iterates over, so the loop runs over Doc elements. If several matching nodes sit below a single Doc, that Doc is counted just once in Q1, whereas Q2 iterates over the matching nodes and counts each one individually. This is because the node sequence resulting from a path expression is, by definition, duplicate-free.
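
The de-duplication can be reproduced without a full-text index. The following is a minimal sketch, assuming a small inline document and using plain text nodes as a stand-in for the nodes that ft:search would return; the variable names $docs and $hits are made up for illustration.

```xquery
(: Hypothetical sample: one Doc with two matching paragraphs. :)
let $docs :=
  <Docs>
    <Doc>
      <Title>Electromagnetic Fields</Title>
      <Para>emf waves one</Para>
      <Para>emf waves two</Para>
    </Doc>
  </Docs>

(: Stand-in for the ft:search result: two text nodes below the same Doc. :)
let $hits := $docs//Para/text()

return (
  (: Q1 shape: a single path expression, so the shared Doc ancestor
     appears only once in the result. :)
  count($hits/ancestor::*:Doc),

  (: Q2 shape: the path is evaluated once per hit and the per-hit
     results are concatenated, so the same Doc is reached twice. :)
  count(for $x in $hits return $x/ancestor::*:Doc)
)
```

The two counts are 1 and 2: one Doc, but two hits inside it.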

Q3 is different: Q1 and Q2 depend on the properties of the full-text index, whereas Q3 does not. If, for example, the index is case-sensitive, you will get fewer results from it than from a contains text predicate.
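
To illustrate how a single match option changes the outcome, here is a small sketch that applies the XQuery Full Text case option to a made-up string. It assumes a processor with full-text support such as BaseX, and it does not query an index at all; it only shows the same words matching or not depending on case sensitivity.

```xquery
(: Hypothetical sample text with mixed case. :)
let $text := "EMF Waves blah."

return (
  (: Default matching is case-insensitive, so both words are found. :)
  $text contains text {"emf", "waves"} all words,

  (: With case sensitivity switched on, "emf" no longer matches "EMF". :)
  $text contains text {"emf", "waves"} all words using case sensitive
)
```

The first expression is true and the second is false. An index built with stricter options than the defaults would, in the same way, return fewer nodes than a plain contains text predicate.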

So, from the quoted counts, I'd assume that the full-text index finds 159 matching nodes in 62 documents, and that it was created with more restrictive options than a plain contains text uses.

Other tips

Your first query searches for Doc elements which have a certain property, and returns one result for each such Doc element.

Your second query searches for nodes of any kind which have a (related) property, and returns one result for each such node.

Your third query searches for text nodes which have another (related) property.

Whenever there are Doc elements containing more than one node matching the full-text search criterion, the first and second queries will return different numbers of hits. The same holds for the third query vis-à-vis the others.
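
One way the "related but different property" can show up is sketched below. The sample paragraph is made up, and the sketch assumes a processor with full-text support such as BaseX. Testing the individual text() children of an element is not the same as testing the element itself when the words are split across inline markup.

```xquery
(: Hypothetical paragraph: "emf" sits inside <B>, "waves" outside it. :)
let $para := <Para>blah blah (<B>emf</B>) waves blah.</Para>

return (
  (: Each direct text node is tested on its own; neither "blah blah ("
     nor ") waves blah." contains both words. :)
  $para/text() contains text {"emf", "waves"} all words,

  (: The string value of the whole element contains both words. :)
  $para contains text {"emf", "waves"} all words
)
```

The first test is false and the second is true, so a query that enumerates text() paths can select a different set of documents than one that searches whole elements or an index.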

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow