Question

Here is an example where I use html/text directly inside a selector vector.

(require '[net.cgrand.enlive-html :as html])

(defn fetch-url [url]
  (html/html-resource (java.net.URL. url)))

(defn parse-test []
  (html/select 
   (fetch-url "https://news.ycombinator.com/") 
   [:td.title :a html/text]))

Calling (parse-test) returns a sequence of Hacker News headlines:

("In emergency cases a passenger was selected and thrown out of the plane. [2004]" 
 "“Nobody expects privacy online”: Wrong." 
 "The SCUMM Diary: Stories behind one of the greatest game engines ever made" ...)

Cool!

Would it be possible to end the selector vector with a custom function that gives me back the list of article URLs instead?

Something like: [:td.title :a #(str "https://news.ycombinator.com/" (:href (:attrs %)))]

EDIT:

Here is a way to achieve this. We could write our own select function:

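(defn select+ [coll selector+]
  (map
    (peek selector+)          ; last element: the function applied to each node
    (html/select
      coll                    ; use the nodes passed in instead of fetching again
      (pop selector+))))      ; everything before it: an ordinary Enlive selector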

(defn href [node]
  (:href (:attrs node)))

(defn parse-test []
  (select+ 
   (fetch-url "https://news.ycombinator.com/") 
   [:td.title :a href]))

(parse-test)
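
For reference, select+ depends on how peek and pop behave on a vector: peek returns the last element and pop returns the vector without it. A minimal sketch, using `inc` purely as a placeholder for a node function such as href:

```clojure
;; A selector+ vector: an ordinary selector followed by a function.
;; `inc` is only a placeholder for the real node function.
(def selector+ [:td.title :a inc])

(def node-fn  (peek selector+))  ; the trailing function
(def selector (pop selector+))   ; the selector without the function

(println selector)                  ; prints [:td.title :a]
(println (map node-fn [1 2 3]))     ; prints (2 3 4)
```

Note that this only works when selector+ is a vector; on a list, peek returns the first element instead.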

Solution

As you suggest in your comment, I think it's clearest to keep the selection and the transformation of nodes separate.

Enlive itself provides both selectors and transformers: selectors find nodes, and transformers, um, transform them. If your intended output were HTML, you could probably combine a selector with a transformer to get the result you want.
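
As a rough sketch of the selector-plus-transformer route, assuming Enlive is on the classpath: the markup below is made up for illustration (it is not the real Hacker News page), and setting a target attribute is just an arbitrary example of a transformation.

```clojure
(require '[net.cgrand.enlive-html :as html])

;; Hypothetical markup standing in for a fetched page.
(def nodes
  (html/html-snippet
    "<div class=\"title\"><a href=\"item?id=1\">Example</a></div>"))

;; The selector finds the links; the transformer rewrites each match.
(def transformed
  (html/at nodes
    [:div.title :a] (html/set-attr :target "_blank")))

;; Emit the transformed nodes back out as an HTML string.
(println (apply str (html/emit* transformed)))
```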

However, since you are just looking for data (a sequence of maps, perhaps?), you can skip the transformation step and use a sequence comprehension instead, like this:

(defn parse-test []
  (for [s (html/select
            (fetch-url "https://news.ycombinator.com/")
            [:td.title :a])]
    {:title (first (:content s))
     :link  (:href (:attrs s))}))

(take 2 (parse-test))
;; => ({:title " \tStartup - Bill Watterson, a cartoonist's advice ",
        :link "http://www.zenpencils.com/comic/128-bill-watterson-a-cartoonists-advice"} 
       {:title "Drug Agents Use Vast Phone Trove Eclipsing N.S.A.’s",
        :link "http://www.nytimes.com/2013/09/02/us/drug-agents-use-vast-phone-trove-eclipsing-nsas.html?hp&_r=0&pagewanted=all"})
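
Because the selected nodes are plain maps with :tag, :attrs and :content keys, the same comprehension can be tried without any network access. The sketch below uses hand-written sample nodes and a hypothetical helper, absolute-link (not part of Enlive), which resolves relative hrefs against the site root with java.net.URI, roughly what the question was trying to do with str:

```clojure
;; Hand-written stand-ins for the nodes html/select would return.
(def sample-nodes
  [{:tag :a :attrs {:href "item?id=1"} :content ["Ask HN: an example"]}
   {:tag :a :attrs {:href "http://example.com/post"} :content ["An external post"]}])

(def ^java.net.URI base (java.net.URI. "https://news.ycombinator.com/"))

(defn absolute-link
  "Hypothetical helper: resolves href against base.
  Hrefs that are already absolute come back unchanged."
  [^String href]
  (str (.resolve base href)))

(defn nodes->maps [nodes]
  (for [n nodes]
    {:title (first (:content n))
     :link  (absolute-link (:href (:attrs n)))}))

(println (map :link (nodes->maps sample-nodes)))
;; prints (https://news.ycombinator.com/item?id=1 http://example.com/post)
```

Resolving against a base URI, rather than concatenating strings, leaves links to external articles untouched.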