Question

I am working through the first edition of this book and while I enjoy it, some of the examples given seem out-dated. I would give up and find another book to learn from, but I am really interested in what the author is talking about and want to make the examples work for myself, so I am trying to update them as I go along.

The following code is a map/reduce approach to analyzing text that depends on clojure.contrib. I have tried changing the .split function to re-seq with #"\w+", used line-seq instead of read-lines, and changed the .toLowerCase to string/lower-case. I tried to follow my problems to the source code and read the docs thoroughly to learn that the read-lines function closes after you consume the entire sequence and that line-seq returns a lazy sequence of strings, implementing java.io.BufferedReader. The most helpful thing for my problem was post about how to read files after clojure 1.3. Even still, I can't get it to work.

So here's my question: What dependencies and/or functions do I need to change in the following code to make it contemporary, reliable, idiomatic Clojure?

First namespace:

(ns chapter-data.word-count-1
  (:use clojure.contrib.io
        clojure.contrib.seq-utils))

(defn parse-line [line]
  (let [tokens (.split (.toLowerCase line) " ")]
    (map #(vector % 1) tokens)))

(defn combine [mapped]
  (->> (apply concat mapped)
       (group-by first)
       (map (fn [[k v]]
              {k (map second v)}))
       (apply merge-with conj)))

(defn map-reduce [mapper reducer args-seq]
  (->> (map mapper args-seq)
       (combine)
       (reducer)))

(defn sum [[k v]]
  {k (apply + v)})

(defn reduce-parsed-lines [collected-values]
  (apply merge (map sum collected-values)))

(defn word-frequency [filename]
  (map-reduce parse-line reduce-parsed-lines (read-lines filename)))

Second namespace:

(ns chapter-data.average-line-length
  (:use rabbit-x.data-anal
        clojure.contrib.io))

(def IGNORE "_")

(defn parse-line [line]
  (let [tokens (.split (.toLowerCase line) " ")]
    [[IGNORE (count tokens)]]))

(defn average [numbers]
  (/ (apply + numbers)
     (count numbers)))

(defn reducer [combined]
  (average (val (first combined))))

(defn average-line-length [filename]
  (map-reduce parse-line reducer (read-lines filename)))

But when I compile and run it in light table I get a bevy of errors:

1) In the word-count-1 namespace I get this when I try to reload the ns function after editing:

java.lang.IllegalStateException: spit already refers to: #'clojure.contrib.io/spit in namespace: chapter-data.word-count-1

2) In the average-line-length namespace I get similar name collision errors under the same circumstances:

clojure.lang.Compiler$CompilerException: java.lang.IllegalStateException: parse-line already refers to: #'chapter-data.word-count-1/parse-line in namespace: chapter-data.average-line-length, compiling:(/Users/.../average-line-length.clj:7:1)

3) Oddly, when I quit and restart light table, copy and paste the code directly into the files (replacing what's there) and call instances of their top level functions the word-count-1 namespace runs fine, giving me the number of occurrences of certain words in the test.txt file but the average-line-length name-space gives me this:

"Warning: *default-encoding* not declared dynamic and thus is not dynamically rebindable, but its name suggests otherwise. Please either indicate ^:dynamic *default-encoding* or change the name. (clojure/contrib/io.clj:73)...

4) At this point when I call the word-frequency functions of the first namespace it returns nil instead of the number of word occurrences and when I call the average-line-length function of the second namespace it returns

java.lang.NullPointerException: null
            core.clj:1502 clojure.core/val
Was it helpful?

Solution

As far as I can tell, clojure.contrib.io and clojure.contrib.seq-utils are no longer updated, and in fact they may be conflicting with clojure.core functions like spit. I would recommend taking out those dependencies and seeing if you can do this using only core functions. spit should just work -- the error that you're getting is caused by useing clojure.contrib.io, which contains its own spit function, which looks to be roughly equivalent; perhaps the current version in clojure.core is a "new and improved" version of clojure.contrib.io/spit.

Your problem with the parse-line function looks to be caused by the fact that you've defined two functions with the same name, in two different namespaces. The namespaces don't depend on one another, but you can still run into a conflict if you load both namespaces in a REPL. If you only need to use one at a time, try using one of them, and then when you want to use the other one, make sure you do a (remove-ns name-of-first-ns) first to free up the vars so there is no conflict. Alternatively, you could make parse-line a private function in each namespace, by changing (defn parse-line ... to (defn- parse-line ....

EDIT: If you still need any functions that were in clojure.contrib.io or clojure.contrib.seq-utils that aren't available in core or elsewhere, you can always copy the source over into your namespace. See clojure.contrib.io and clojure.contrib.seq-utils on github.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top