Counting occurrences of certain words case-insensitively in an XML element
Question
The following is the structure of the XML file:
<Datas>
<Data>
<Name>Information</Name>
<Desc>Today is Monday, the starting day of the week.</Desc>
</Data>
<Data>
<Name>Stackoverflow.com</Name>
<Desc>Yesterday 1200 questions were posted. <b>TODAY</b>, till now 1300 questions are posted. So, today will be an important day for all the senior members.</Desc>
</Data>
</Datas>
In the above XML, I want to count the occurrences of the word today. The word can appear in any casing, such as Today, today, TODAY, or toDay. The last one is not correct, but if a user types it that way, it shouldn't be missed.
I am using this query:
count(/Datas/Data[contains(translate(Desc,'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'), 'TODAY')])
which returns 2, but there are 3 occurrences in all. How can I include all of them?
Solution
If you're using BaseX (which you are, as I remember), you can use the non-standard ft:count function, which makes life a lot easier.
ft:count(//*[text() contains text "today"])
An additional benefit is that this query will be able to use the full-text index, which will be much faster than tokenizing the document for each query. Remember to set up the full-text index without case sensitivity.
Other tips
This one counts 3:
count(/Datas/Data//text()/tokenize(upper-case(.), "[\P{L}]")[. = "TODAY"])
It uses fn:upper-case for case normalization and fn:tokenize to isolate the words. Note that words here must be separated by non-letters, which behaves differently from the original query using fn:contains. That might be what you want, though.
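To see why the two approaches give different numbers, here is a rough sketch of the same logic outside XPath, written in Python with only the standard library. It is an illustration, not a replacement for the queries above: the function names are made up for this example, and the `[^A-Za-z]` split is an ASCII-only stand-in for `\P{L}`. The query in the question counts Data elements whose Desc contains the word, so the second Data contributes only once even though it holds two occurrences; the tokenizing query counts individual words.

```python
import re
import xml.etree.ElementTree as ET

XML = """<Datas>
<Data>
<Name>Information</Name>
<Desc>Today is Monday, the starting day of the week.</Desc>
</Data>
<Data>
<Name>Stackoverflow.com</Name>
<Desc>Yesterday 1200 questions were posted. <b>TODAY</b>, till now 1300 questions are posted. So, today will be an important day for all the senior members.</Desc>
</Data>
</Datas>"""

root = ET.fromstring(XML)


def count_matching_elements(root, word):
    """Hypothetical helper: one hit per <Data> whose <Desc> contains the word
    as a substring (what the query in the question does)."""
    target = word.upper()
    return sum(
        1
        for data in root.findall("Data")
        if target in "".join(data.find("Desc").itertext()).upper()
    )


def count_word_tokens(root, word):
    """Hypothetical helper: split every text node under each <Data> on
    non-letters and count exact, case-insensitive matches."""
    target = word.upper()
    total = 0
    for data in root.findall("Data"):
        for text in data.itertext():
            # ASCII-only approximation of the \P{L} separator class
            tokens = re.split(r"[^A-Za-z]", text.upper())
            total += sum(1 for token in tokens if token == target)
    return total


print(count_matching_elements(root, "today"))  # 2
print(count_word_tokens(root, "today"))        # 3
```

Because the second function compares whole tokens, a word such as "todays" would not be counted, whereas a substring test would match it.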