Use QueryPath to get the contents of arbitrary HTML elements

https://stackoverflow.com/questions/5414269

29-10-2019

Question

I'm using the PHP QueryPath library to extract data from a collection of old HTML files, and for the most part I have been using the CSS selectors available through the find() function. However, not all of the elements containing the data I need have a unique CSS identifier, so I've been using an ugly combination of regular expressions and QueryPath to extract it.

<ul class="list"><li>Data1</li><li>Data2</li></ul>

How would I, for example, cleanly extract "Data2" from this list element? Is there a QueryPath function that will let me specify, for example, the second child of a parent element as the element to retrieve?

Solution

To get the nth matched object you can use QueryPath::get(n-1); get() counts from zero, so the first match is get(0).
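QueryPath is built on PHP's DOM extension, so the same zero-based lookup can be sketched with the built-in DOMDocument and DOMXPath classes. This is a plain-DOM illustration, not QueryPath's API: the helper nthMatch() is hypothetical, and the XPath expression //li stands in for a CSS selector. It assumes PHP 7.1+ with ext-dom enabled.

```php
<?php
// Hypothetical helper: return the text of the nth (1-based) node matching
// an XPath expression, mirroring the get(n-1) idea with plain DOM.
function nthMatch(string $html, string $xpath, int $n): ?string
{
    $doc = new DOMDocument();
    // Keep libxml from printing warnings about loosely formed legacy HTML.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $nodes = (new DOMXPath($doc))->query($xpath);
    // DOMNodeList::item() is zero-based, so the nth match lives at n - 1.
    $node = $nodes->item($n - 1);
    return $node === null ? null : $node->textContent;
}

$html = '<ul class="list"><li>Data1</li><li>Data2</li></ul>';
echo nthMatch($html, '//li', 2), "\n"; // Data2
```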

Other tips

There are actually several ways to do this. The easiest is to use the CSS 3 pseudo-class :nth-of-type(). This selects the second li that is a direct child of the ul:

qp($html, 'ul>li:nth-of-type(2)');
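For comparison, here is a minimal sketch of the same selection using PHP's built-in DOMXPath instead of QueryPath (a swapped-in technique for illustration, not part of the original answer). XPath positions count from 1, so //ul/li[2] picks the second li child of each ul, which is the element ul>li:nth-of-type(2) targets. It assumes ext-dom is enabled.

```php
<?php
// Load the question's fragment into PHP's built-in DOM.
$html = '<ul class="list"><li>Data1</li><li>Data2</li></ul>';
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// li[2] is evaluated per ul: the second li among that ul's li children.
$xpath = new DOMXPath($doc);
$second = $xpath->query('//ul/li[2]');

echo $second->length, "\n";               // 1
echo $second->item(0)->textContent, "\n"; // Data2
```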

:nth-of-type and other CSS 3 selectors take what are called "an+b" rules: a says how many items make up a group, and b says which item from each group you want. For example, tr:nth-of-type(4n+2) breaks the table rows into groups of 4 and returns the second row in each group (rows 2, 6, 10, and so on). :even and :odd are just shorthand for 2n and 2n+1.
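The an+b arithmetic can be checked by hand: a 1-based position i matches when i = a*n + b for some whole number n >= 0. The sketch below encodes that rule in a hypothetical helper, matchesAnPlusB(); it models the CSS definition, not QueryPath's internal matcher, and assumes PHP 7.4+ for the arrow function.

```php
<?php
// Does the 1-based position $index satisfy the CSS "an+b" rule for some n >= 0?
function matchesAnPlusB(int $index, int $a, int $b): bool
{
    if ($a === 0) {
        // No repetition: only position b itself matches.
        return $index === $b;
    }
    $diff = $index - $b;
    // index = a*n + b requires (index - b) to be a non-negative whole multiple of a.
    return $diff % $a === 0 && intdiv($diff, $a) >= 0;
}

// Positions among 12 rows that tr:nth-of-type(4n+2) would select.
$rows = array_values(array_filter(
    range(1, 12),
    fn (int $i): bool => matchesAnPlusB($i, 4, 2)
));
echo implode(', ', $rows), "\n"; // 2, 6, 10
```

With a = 2 and b = 0 the helper selects positions 2, 4, 6, ..., and with a = 2 and b = 1 it selects 1, 3, 5, ..., which is the 2n / 2n+1 shorthand described above.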

Other CSS that might be worth looking into:

':nth'
':first-of-type', ':first'
':last-of-type', ':last'
':even', ':odd'
':not()', ':has()', and ':contains()'

You can also get all of the li elements and then reduce the set to just the second one. eq() counts from zero, so the second element is at index 1:

qp($html, 'li')->eq(1);

Or, as a previous poster pointed out, you can get the actual DOMNode object for the second one using get(), which also counts from zero:

qp($html, 'li')->get(1);

If you have really sophisticated needs, you can run the matched list through a custom function; QueryPath's filterCallback() keeps only the elements for which your callback returns true.
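The callback-filter idea can be sketched without QueryPath by collecting DOM nodes into an array and passing them through array_filter(). This is a plain-DOM stand-in, not QueryPath's filtering API; the predicate $keep and the extra "Other" item are hypothetical, chosen to echo the question's regular-expression use. It assumes PHP 7.4+ with ext-dom.

```php
<?php
$html = '<ul class="list"><li>Data1</li><li>Data2</li><li>Other</li></ul>';
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Hypothetical predicate: keep items whose whole text is "Data" followed by digits.
$keep = fn (DOMNode $node): bool => preg_match('/^Data\d+$/', $node->textContent) === 1;

// Copy the live DOMNodeList into a plain array so array_filter can process it.
$nodes = [];
foreach ($doc->getElementsByTagName('li') as $li) {
    $nodes[] = $li;
}

$filtered = array_values(array_filter($nodes, $keep));
$texts = array_map(fn (DOMNode $n): string => $n->textContent, $filtered);
echo implode(', ', $texts), "\n"; // Data1, Data2
```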

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow