I am working through the first edition of this book and while I enjoy it, some of the examples given seem out-dated. I would give up and find another book to learn from, but I am really interested in what the author is talking about and want to make the examples work for myself, so I am trying to update them as I go along.
The following code is a map/reduce approach to analyzing text that depends on clojure.contrib. I have tried changing the .split function to re-seq with #"\w+", used line-seq instead of read-lines, and changed the .toLowerCase to string/lower-case. I tried to follow my problems to the source code and read the docs thoroughly to learn that the read-lines function closes after you consume the entire sequence and that line-seq returns a lazy sequence of strings, implementing java.io.BufferedReader. The most helpful thing for my problem was post about how to read files after clojure 1.3. Even still, I can't get it to work.
So here's my question: What dependencies and/or functions do I need to change in the following code to make it contemporary, reliable, idiomatic Clojure?
First namespace:
(ns chapter-data.word-count-1
(:use clojure.contrib.io
clojure.contrib.seq-utils))
(defn parse-line [line]
(let [tokens (.split (.toLowerCase line) " ")]
(map #(vector % 1) tokens)))
(defn combine [mapped]
(->> (apply concat mapped)
(group-by first)
(map (fn [[k v]]
{k (map second v)}))
(apply merge-with conj)))
(defn map-reduce [mapper reducer args-seq]
(->> (map mapper args-seq)
(combine)
(reducer)))
(defn sum [[k v]]
{k (apply + v)})
(defn reduce-parsed-lines [collected-values]
(apply merge (map sum collected-values)))
(defn word-frequency [filename]
(map-reduce parse-line reduce-parsed-lines (read-lines filename)))
Second namespace:
(ns chapter-data.average-line-length
(:use rabbit-x.data-anal
clojure.contrib.io))
(def IGNORE "_")
(defn parse-line [line]
(let [tokens (.split (.toLowerCase line) " ")]
[[IGNORE (count tokens)]]))
(defn average [numbers]
(/ (apply + numbers)
(count numbers)))
(defn reducer [combined]
(average (val (first combined))))
(defn average-line-length [filename]
(map-reduce parse-line reducer (read-lines filename)))
But when I compile and run it in light table I get a bevy of errors:
1) In the word-count-1 namespace I get this when I try to reload the ns function after editing:
java.lang.IllegalStateException: spit already refers to: #'clojure.contrib.io/spit in namespace: chapter-data.word-count-1
2) In the average-line-length namespace I get similar name collision errors under the same circumstances:
clojure.lang.Compiler$CompilerException: java.lang.IllegalStateException: parse-line already refers to: #'chapter-data.word-count-1/parse-line in namespace: chapter-data.average-line-length, compiling:(/Users/.../average-line-length.clj:7:1)
3) Oddly, when I quit and restart light table, copy and paste the code directly into the files (replacing what's there) and call instances of their top level functions the word-count-1 namespace runs fine, giving me the number of occurrences of certain words in the test.txt file but the average-line-length name-space gives me this:
"Warning: *default-encoding* not declared dynamic and thus is not dynamically rebindable, but its name suggests otherwise. Please either indicate ^:dynamic *default-encoding* or change the name. (clojure/contrib/io.clj:73)...
4) At this point when I call the word-frequency
functions of the first namespace it returns nil
instead of the number of word occurrences and when I call the average-line-length
function of the second namespace it returns
java.lang.NullPointerException: null
core.clj:1502 clojure.core/val