Question

What's the best way to get a sequence of columns (as vectors or whatever) from an Incanter data set?

I thought of:

(to-vect (trans (to-matrix my-dataset)))

But ideally, I'd like a lazy sequence. Is there a better way?

Solution

Use the $ function from incanter.core.

=> (use 'incanter.core)
=> (def data (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]))
=> ($ :a data)  ;; :a column
=> ($ 0 :all data) ;; first row

=> (type ($ :a data))
clojure.lang.LazySeq
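
To get every column rather than a single one, the same selector can be mapped over the column names. This is only a sketch: it assumes that col-names from incanter.core returns the dataset's column keys and that $ with a single column key returns that column's values, as in the transcript above. The name cols is made up for the example.

```clojure
(use 'incanter.core)

(def data (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]))

;; One entry per column name. map is lazy, so no column is
;; selected until the resulting sequence is first consumed.
(def cols
  (map (fn [c] ($ c data)) (col-names data)))
```

Whether each individual column is itself lazy depends on what $ returns in your Incanter version, so check (type (first cols)) if that matters.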

OTHER TIPS

Looking at the source code for to-vect, it uses map to build up the result, which already provides some degree of laziness. Unfortunately, it looks like the whole data set is first converted with toArray, which probably throws away the benefit of map's laziness.

If you want more, you will probably have to dig into the details of the Java object that actually holds the matrix version of the data set and write your own version of to-vect.
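
As a rough illustration of what a hand-written, lazier alternative could look like, here is a minimal sketch in plain Clojure. It assumes the rows are already available as a sequence of equal-length vectors. lazy-columns is a hypothetical name, not part of Incanter, and the sketch does not touch the underlying Java matrix object.

```clojure
(defn lazy-columns
  "Returns a lazy seq of columns, each a lazy seq of values.
  Assumes rows is a sequence of equal-length vectors."
  [rows]
  (let [n (count (first rows))]          ; number of columns, 0 if rows is empty
    (for [i (range n)]                   ; for is lazy: one entry per column index
      (map (fn [row] (nth row i)) rows)))) ; map is lazy: values are read on demand

(lazy-columns [[1 2] [3 4]])
;; => ((1 3) (2 4))
```

Only the first row is inspected up front (to count the columns); everything else is computed as the result is consumed.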

You could use the internal structure of the dataset.

user=> (use 'incanter.core)
nil
user=> (def d (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]))
#'user/d
user=> (:column-names d)
[:a :b]
user=> (:rows d)
[{:a 1, :b 2} {:a 3, :b 4}]
user=> (defn columns-of
         [dataset]
         (for [column (:column-names dataset)]
           (map #(get % column) (:rows dataset))))
#'user/columns-of
user=> (columns-of d)
((1 3) (2 4))

I'm not sure to what extent the internal structure is public API, though. You should probably check that with the Incanter developers.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow