Question

I want to search for keywords in this XML file. freshvideos.xml contains "video" elements. For example, if I search for "gear slow" or "new England gear", the search should return the "id" of the matching "video" element.

Below is a sample of my xml file.

<freshvideos>
    <video>
        <id>
            <![CDATA[ 4f1a6a21e779d227eaff33de8f571f95 ]]>
        </id>
        <title>
            <![CDATA[ New England Snowstorm - \"Low Gear\" ]]>
        </title>
        <ensub>
            <![CDATA[ I put it in low gear and take it slow. ]]>
        </ensub>
        <cnsub>
            <![CDATA[ 我挂了抵挡,慢慢开。 ]]>
        </cnsub>

        <filesrc>
            <![CDATA[ videos/New England Snowstorm Low Gear.mp4 ]]>
        </filesrc>
    </video>
</freshvideos>

I first change all the keywords into lower case, and I also change all xml elements into lower case, to make the search case insensitive.

Currently I'm doing this:

$dom = new DOMDocument;
$dom->load("freshvideos.xml");
$xml = $dom->saveXML($dom);
$xml = strtolower($xml);
$lowerCaseDom = new DOMDocument;
$lowerCaseDom->loadXML($xml);

The problem is that this produces the following warnings:

Warning: DOMDocument::loadXML() [domdocument.loadxml]: StartTag: invalid element name in Entity
Warning: DOMDocument::loadXML() [domdocument.loadxml]: Sequence ']]>' not allowed in content in Entity

I also thought of using this delimiter:

$xml = strtolower($xml);
$xml2 =<<<XML
echo strtolower($xml);
XML;
$lowerCaseDom->loadXML($xml2);

It turned out that the resulting string has quotation marks at the beginning, below the "<<" line.

So, how can I make this search case-insensitive?

Thanks in advance!

Solution

When you run your document through strtolower, this is what ends up happening (remember, you're still passing around a string at this point, not a DOMDocument):

<freshvideos>
    <video>
        <id>
            <![cdata[ 4f1a6a21e779d227eaff33de8f571f95 ]]>
        </id>
        <title>
            <![cdata[ new england snowstorm - \"low gear\" ]]>
        </title>
        <ensub>
            <![cdata[ i put it in low gear and take it slow. ]]>
        </ensub>
        <cnsub>
            <![cdata[ 我挂了抵挡,慢慢开。 ]]>
        </cnsub>

        <filesrc>
            <![cdata[ videos/new england snowstorm low gear.mp4 ]]>
        </filesrc>
    </video>
</freshvideos>

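Once it has been lowercased like that, the opening CDATA marker is no longer valid, so you get warnings when the string is parsed as XML. XML is case-sensitive: a CDATA section must start with exactly <![CDATA[ and end with ]]>, and nothing else is accepted. The parser therefore treats <![cdata[ as a malformed start tag ("invalid element name") and then rejects the stray ]]> it finds in ordinary content.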
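One way around this is to leave the markup untouched and lowercase only the text you compare against. The following is a minimal sketch, not part of the original answer: searchVideos is a hypothetical helper name, and it assumes PHP 7 or later with the mbstring extension, and that a video matches when every keyword appears somewhere in its text (including the id and filesrc values).

```php
<?php
// Hypothetical helper: returns the ids of <video> elements whose text
// contains every keyword of $query, ignoring case.
function searchVideos(DOMDocument $dom, string $query): array
{
    // Lowercase the query and split it into keywords on whitespace.
    $keywords = preg_split('/\s+/u', mb_strtolower(trim($query), 'UTF-8'), -1, PREG_SPLIT_NO_EMPTY);
    $ids = [];
    if (!$keywords) {
        return $ids; // empty query (or split failure): nothing to match
    }
    foreach ($dom->getElementsByTagName('video') as $video) {
        // textContent includes CDATA text; only this copy is lowercased,
        // so the XML markup itself is never modified.
        $haystack = mb_strtolower($video->textContent, 'UTF-8');
        foreach ($keywords as $keyword) {
            if (mb_strpos($haystack, $keyword, 0, 'UTF-8') === false) {
                continue 2; // a keyword is missing, skip this video
            }
        }
        $idNode = $video->getElementsByTagName('id')->item(0);
        if ($idNode !== null) {
            $ids[] = trim($idNode->textContent);
        }
    }
    return $ids;
}

// Shortened copy of the sample from the question, for demonstration only.
$xml = <<<'XML'
<freshvideos>
    <video>
        <id><![CDATA[ 4f1a6a21e779d227eaff33de8f571f95 ]]></id>
        <title><![CDATA[ New England Snowstorm - "Low Gear" ]]></title>
        <ensub><![CDATA[ I put it in low gear and take it slow. ]]></ensub>
    </video>
</freshvideos>
XML;

$dom = new DOMDocument;
$dom->loadXML($xml);
print_r(searchVideos($dom, 'gear slow'));
```

With the real file you would call $dom->load("freshvideos.xml") instead of loadXML. Using mb_strtolower rather than strtolower is a deliberate choice here so that multibyte text such as the cnsub subtitles is not mangled.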

Licensed under: CC-BY-SA with attribution