Counting occurrences of certain words case-insensitively in an XML element

https://stackoverflow.com/questions/11823033

24-06-2021

Question

The XML file has the following structure:

<Datas>
  <Data>
    <Name>Information</Name>
    <Desc>Today is Monday, the starting day of the week.</Desc>
  </Data>
  <Data>
    <Name>Stackoverflow.com</Name>
    <Desc>Yesterday 1200 questions were posted. <b>TODAY</b>, till now 1300 questions are posted. So, today will be an important day for all the senior members.</Desc>
  </Data>
</Datas>

In the above XML, I want to count the occurrences of the word "today". The word can appear in any casing, such as Today, today, TODAY, or toDay. The last one is not correct, but if a user types it that way, it shouldn't be missed.

I am using this query:

count(/Datas/Data[contains(translate(Desc,'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'), 'TODAY')])

which returns 2, but there are 3 occurrences in all. How can I count all of them?

Solution

If you're using BaseX (which you are, as I remember), you can use the non-standard ft:count function, which makes life a lot easier.

ft:count(//*[text() contains text "today"])

An additional benefit is that this query can use the full-text index, which will be much faster than tokenizing the document for each query. Remember to create the full-text index with case sensitivity disabled.

Other tips

This one counts 3:

count(/Datas/Data//text()/tokenize(upper-case(.), "[\P{L}]")[. = "TODAY"])

It uses fn:upper-case for case normalization and fn:tokenize to isolate the words. The original query returns 2 because it counts the Data elements whose Desc contains the word, not the individual occurrences. Note that words here must be separated by non-letters, which behaves differently from the original query using fn:contains. That might be what you want, though.
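
To make the difference between counting elements, counting tokens, and counting substrings concrete, here is a rough sketch in Python rather than XPath. It is only an illustration under some assumptions: the function names are made up for this example, the standard-library ElementTree parser stands in for the XML database, and the regular expression [\W\d_] is an approximation of the "[\P{L}]" pattern, not an exact equivalent.

```python
import re
import xml.etree.ElementTree as ET

# Sample document from the question.
XML = """<Datas>
  <Data>
    <Name>Information</Name>
    <Desc>Today is Monday, the starting day of the week.</Desc>
  </Data>
  <Data>
    <Name>Stackoverflow.com</Name>
    <Desc>Yesterday 1200 questions were posted. <b>TODAY</b>, till now 1300 questions are posted. So, today will be an important day for all the senior members.</Desc>
  </Data>
</Datas>"""


def count_matching_elements(xml_text, word):
    """Count Data elements whose Desc contains the word (what the original query does)."""
    root = ET.fromstring(xml_text)
    target = word.upper()
    total = 0
    for data in root.iter("Data"):
        desc = data.find("Desc")
        # The string value of Desc includes the text of nested elements such as <b>.
        if desc is not None and target in "".join(desc.itertext()).upper():
            total += 1
    return total


def count_tokens(xml_text, word):
    """Count whole-word matches in every text node below Data."""
    root = ET.fromstring(xml_text)
    target = word.upper()
    total = 0
    for data in root.iter("Data"):
        for text in data.itertext():
            # Split on anything that is not a letter (approximation of \P{L}).
            tokens = re.split(r"[\W\d_]", text.upper())
            total += tokens.count(target)
    return total


def count_substrings(xml_text, word):
    """Count every case-insensitive substring match, even inside longer words."""
    root = ET.fromstring(xml_text)
    target = word.upper()
    return sum(
        text.upper().count(target)
        for data in root.iter("Data")
        for text in data.itertext()
    )


print(count_matching_elements(XML, "today"))
print(count_tokens(XML, "today"))
print(count_substrings(XML, "today"))
```

On the sample document the element count differs from the two occurrence counts. The token and substring counts only diverge on input where the word is embedded in a longer word, for example "todays": the token approach skips it, while the substring approach counts it.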

License: CC-BY-SA with attribution

Not affiliated with StackOverflow