XQuery Full Text: searching mixed content

https://stackoverflow.com//questions/22062779

23-12-2019

Question

The following is my XML structure. (This is only a very small part of the whole document, with limited data. I have a 6 GB XML database with a proper full-text index.)

<Docs>
 <Doc>
<Chap>No - 1</Chap>
<Desc>
  <Notes>
    <Para t="sn">departmental report</Para>
  </Notes>
  <Notes>
    <Para t="sn">The equiry commission is good.</Para>
  </Notes>
  <Notes>
    <Para t="sn">departmental process</Para>
    <Para t="ln">The enquiry report for the bomb blast is yet to come.<bL/>
      <bL/>The department working on this is quite lazy.</Para>
  </Notes>
</Desc>
</Doc>
<Doc>
<Chap>No - 2</Chap>
<Desc>
  <Notes>
    <Para t="sn">Enquiry Processes Report</Para>
    <Para t="ln">The enquiry process is very simple.<bL/>
      <bL/>With proper guidance anybody can handle the commission easily.<bL/>
      <bL/>
    </Para>
  </Notes>
  <Notes>
    <Para t="sn">Enquiry - Departmental</Para>
  </Notes>
</Desc>
 </Doc>
 <Doc>
<Chap>No - 3</Chap>
<Desc>
  <Notes>
    <Para t="sn">Physics Department</Para>
  </Notes>
  <Notes>
    <Para t="sn">Working process of physics department is quite lengthy</Para>
    <Para t="ln">Even after proper enquiry, I was told nothing.<bL/>
      <bL/>This was like a bomb blast.</Para>
  </Notes>
  <Notes>
    <Para t="sn">Departmental enquiry.</Para>
    <Para t="ln">There should be a departmental enquiry for this wrong process.</Para>
  </Notes>
</Desc>
</Doc>
</Docs>

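Now I need all Chap nodes that contain all of the words "departmental", "enquiry" and "report".

So far I have not been able to get this with any of the combinations I tried. One of my attempts is: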

for $x in ft:search("Docs", ("departmental enquiry report"), map{'mode':='all words'})/ancestor::*:Para
 return $x/ancestor::Chap

Can anybody guide me on this?

Solution

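The full-text index of BaseX references all terms at the text node level. This means that all of the words must occur in the same text node for an index lookup to match them together.

If you want to take advantage of the full-text index and still find all words occurring anywhere below a specific element, you can try the following query: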

let $words := ("departmental enquiry report")
for $doc in db:open("Docs")//Doc[.//text() contains text { $words } any word]
where $doc[string-join(.//text(), ' ') contains text { $words } all words]
return $doc/Chap

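The first contains text expression is rewritten to an index request. It returns all text nodes that contain any of the searched words. The contains text expression in the where clause then filters out every node that does not contain all of the query terms. With string-join(.//text(), ' '), all text nodes below the Doc element are concatenated, and the search is performed on the joined string.

The following equivalent formulation of the query should yield the same result: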
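
The two-step filtering can be tried without a database. The following is a minimal sketch on inline data: the chapter names A and B and the shortened paragraphs are made up for illustration, and without a database no index is involved, so only the filtering logic is shown.

```xquery
(: Hypothetical inline data; in the real setup this comes from the database. :)
let $words := "departmental enquiry report"
let $docs :=
  <Docs>
    <Doc>
      <Chap>A</Chap>
      <Para>departmental report</Para>
      <Para>The enquiry report is yet to come.</Para>
    </Doc>
    <Doc>
      <Chap>B</Chap>
      <Para>Enquiry - Departmental</Para>
    </Doc>
  </Docs>
(: Step 1: keep every Doc in which some text node contains any of the words. :)
for $doc in $docs/Doc[.//text() contains text { $words } any word]
(: Step 2: require all words in the concatenated text of the Doc. :)
where string-join($doc//text(), ' ') contains text { $words } all words
return $doc/Chap
```

Both Doc elements pass the first step, but only the first one contains all three words once its text nodes are joined, so only its Chap element should be returned.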

let $words := ("departmental enquiry report")
for $x in ft:search("Docs", $words, map { 'mode': 'any word' })/ancestor::*:Doc
where ft:contains(string-join($x//text(), ' '), $words, map { 'mode': 'all words' })
return $x/Chap

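Other tips

`ft:search`, and why it does not solve the problem

If you look at the BaseX XQuery Full Text documentation, you will see that the second argument of ft:search needs to be a sequence of words: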

ft:search($db as xs:string, $terms as item()*, $options as item()) as text()*

So your query should look something like this:

for $x in ft:search("Docs", ("departmental", "enquiry", "report"), map { 'mode': 'all words' })/ancestor::*:Para
return $x/ancestor::Chap

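Still, this will not solve your problem, because this function

"[Re]turns all text nodes from the full-text index of the database $db that contain the specified $terms."

This means that all of these words must occur in a single text node, but in your sample input they are spread over several text nodes (all over the <Doc/> nodes).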
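
To see how the words are spread, the following sketch lists each text node of a trimmed copy of the first Doc together with the search words found in it. The inline copy and the output format are illustrative assumptions, not part of the original data set.

```xquery
let $words := ("departmental", "enquiry", "report")
(: Trimmed copy of the first Doc from the question, written inline. :)
let $doc :=
  <Doc>
    <Chap>No - 1</Chap>
    <Notes><Para t="sn">departmental report</Para></Notes>
    <Notes>
      <Para t="sn">departmental process</Para>
      <Para t="ln">The enquiry report for the bomb blast is yet to come.<bL/><bL/>The department working on this is quite lazy.</Para>
    </Notes>
  </Doc>
for $text in $doc//text()
(: Collect the search words that occur in this single text node. :)
let $found :=
  for $w in $words
  where $text contains text { $w }
  return $w
return string-join($found, ", ") || " <- " || $text
```

The bL elements split the long paragraph into separate text nodes, so mixed content makes it even less likely that a single text node holds every word.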

Using standard XQuery Full Text

I had to guess from your input and your search words that you actually want to find the <Doc/> nodes that contain all three of these words.

for $document in doc("Docs")/Docs/Doc
where $document contains text { 'departmental', 'enquiry', 'report' } all words
return $document/Chap

This fetches all documents, applies the full-text search to each of them, and finally returns the chapter node of each matching document.

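Remember to adapt the query to your own database rather than the sample document, and to create a full-text index (if you have not done so yet), which will improve performance.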
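
If the database has no full-text index yet, one way to build it from XQuery in BaseX is db:optimize. This is a minimal sketch that assumes the database is named "Docs", as in the queries above. On a database of several gigabytes a full optimization can take a while.

```xquery
(: Assumption: the database is called "Docs". :)
(: The second argument requests a full optimization (all index structures are rebuilt); :)
(: the 'ftindex' option asks BaseX to create the full-text index as well. :)
db:optimize("Docs", true(), map { 'ftindex': true() })
```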

License: CC-BY-SA with attribution
