Question

I followed the enlive-tutorial and must say I'm impressed by the power of Enlive for parsing the web. I have now moved on to scrape3.clj, available here: https://github.com/swannodette/enlive-tutorial/blob/master/src/tutorial/scrape3.clj

Swannodette has done a great job designing this example, but I feel it could be made a bit DRYer.

My question: how would you rewrite this extract function to make it DRYer?

(defn extract [node]
  (let [headline (first (html/select [node] *headline-selector*))
        byline   (first (html/select [node] *byline-selector*))
        summary  (first (html/select [node] *summary-selector*))
        result   (map html/text [headline byline summary])]
    (zipmap [:headline :byline :summary] (map #(re-gsub #"\n" "" %) result)))) 

If you have other ideas on other elements of the program, feel free to share them!

EDIT: I played around and came up with:

    (defn extract [node]
      (let [s [*headline-selector* *byline-selector* *summary-selector*] 
            selected (map #(html/text (first (html/select [node] %))) s)
            cleaned  (map #(re-gsub #"\n" "" %) selected)]
        (zipmap [:headline :byline :summary] cleaned)))

Solution 2

To make the result of the function more visible, I would use a map literal, as shown below:

(defn extract [node]
  (let [sel #(html/text (first (html/select [node] %)))
        rem #(re-gsub #"\n" "" %)
        get-text #(-> % sel rem)]
    {:headline (get-text *headline-selector*)
     :byline (get-text *byline-selector*)
     :summary (get-text *summary-selector*)}))
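
Going one step further in the same direction, the keys and selectors can live in a single map, so adding a field means adding one entry. This is only a sketch under a few assumptions: Enlive is on the classpath, clojure.string/replace stands in for re-gsub, and the selectors map, the get-text helper, and the sample HTML are hypothetical stand-ins rather than the tutorial's own definitions.

```clojure
(ns example.extract
  (:require [net.cgrand.enlive-html :as html]
            [clojure.string :as str]))

;; Hypothetical selectors; the tutorial's *headline-selector* etc. would go here.
(def selectors
  {:headline [:h2 :a]
   :byline   [:.byline]
   :summary  [:.summary]})

(defn get-text
  "Text of the first match for selector under node, with newlines removed."
  [node selector]
  (-> (html/select [node] selector)
      first
      html/text
      (str/replace "\n" "")))

(defn extract
  "Builds a map with the same keys as selectors, each mapped to its cleaned text."
  [node]
  (into {}
        (for [[k sel] selectors]
          [k (get-text node sel)])))

;; Usage on a made-up story node:
(def story
  (first (html/html-snippet
          (str "<div class=\"story\">"
               "<h2><a href=\"#\">Some\nheadline</a></h2>"
               "<p class=\"byline\">By A. Writer</p>"
               "<p class=\"summary\">A short summary.</p>"
               "</div>"))))

(extract story)
```

The trade-off is that the shape of the result is no longer spelled out in the function body, which is what the map literal above makes explicit.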

OTHER TIPS

The repeated (first (html/select [node] ...)) call can be hoisted into a local function:

(defn extract [node]
  (let [selector (fn [sel] (first (html/select [node] sel)))
        headline (selector *headline-selector*)
        byline   (selector *byline-selector*)
        summary  (selector *summary-selector*)
        result   (map html/text [headline byline summary])]
    (zipmap [:headline :byline :summary] (map #(re-gsub #"\n" "" %) result))))

Then the intermediate names can be removed, though they help make the point of the code clear, so it's a matter of personal taste:

(defn extract [node]
  (let [selector (fn [sel] (first (html/select [node] sel)))
        result   (map html/text 
                   (map selector [*headline-selector* 
                                  *byline-selector* 
                                  *summary-selector*]))]
    (zipmap [:headline :byline :summary] (map #(re-gsub #"\n" "" %) result)))) 
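
A side note on re-gsub: as far as I know it comes from the old clojure.contrib.str-utils library, which is not available on newer Clojure versions. Below is a minimal sketch of an equivalent helper built on clojure.string/replace (note that the string comes first there). The name strip-newlines is hypothetical, not part of the tutorial.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: removes every newline from s.
;; Intended to have the same effect as (re-gsub #"\n" "" s) in the snippets above.
(defn strip-newlines [s]
  (str/replace s #"\n" ""))

(strip-newlines "Breaking\nnews")
;; => "Breakingnews"
```

With this in place, (map strip-newlines result) could replace the anonymous function passed to map.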
Licensed under: CC-BY-SA with attribution