Following this page-scraping tutorial, the author gets a collection of all images on the page as follows:

css :: ArrowXml a => String -> a XmlTree XmlTree
css tag = multi (hasName tag)

images tree = tree >>> css "img" >>> getAttrValue "src"

How can I get only, say, the 2nd image on the page? I couldn't find any function like getElementAt :: Int -> blah in the XmlArrow docs.

Thanks!


Solution

Functions for manipulating lists of elements can be found in the ArrowList type-class.

In this particular case, you can use the >>. operator to transform the result list using ordinary list functions.

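nthImage n tree = images tree >>. (take 1 . drop n)

Note that n is zero-based here, so nthImage 1 tree yields the second image. If the page has fewer than n + 1 images, the arrow produces no result.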
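
To see the pieces working together, here is a minimal, self-contained sketch. It assumes the hxt package is installed. The sampleHtml string and the use of xread with runLA to parse an in-memory fragment are illustrative choices, not part of the original tutorial, which presumably reads a real page instead.

```haskell
import Text.XML.HXT.Core

-- Select every element with the given tag name, at any depth.
css :: ArrowXml a => String -> a XmlTree XmlTree
css tag = multi (hasName tag)

-- Collect the src attribute of every img element.
images :: ArrowXml a => a b XmlTree -> a b String
images tree = tree >>> css "img" >>> getAttrValue "src"

-- Keep only the result at zero-based index n, if there is one.
-- (>>.) applies an ordinary list function to the arrow's result list.
nthImage :: ArrowXml a => Int -> a b XmlTree -> a b String
nthImage n tree = images tree >>. (take 1 . drop n)

-- Hypothetical sample input, used only for this demonstration.
sampleHtml :: String
sampleHtml =
  "<div><img src='a.png'/><p><img src='b.png'/></p><img src='c.png'/></div>"

main :: IO ()
main = do
  -- xread parses the string as XML content; runLA runs the pure list arrow.
  print (runLA (images xread) sampleHtml)      -- all image sources
  print (runLA (nthImage 1 xread) sampleHtml)  -- only the second image
  print (runLA (nthImage 5 xread) sampleHtml)  -- out of range: no results
```

Because the result is still a list, an out-of-range index gives an empty list rather than an error, which is why take 1 . drop n is a safer choice here than indexing with (!!).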
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow