Question

Here are some values. Each is a sequence of ascending (or otherwise grouped) values.

(def input-vals [[[1 :a] [1 :b] [2 :c] [3 :d] [3 :e]]
           [[1 :f] [2 :g] [2 :h] [2 :i] [3 :j] [3 :k]]
           [[1 :l] [3 :m]]])

I can partition them each by value.

=> (map (partial partition-by first) input-vals)
   ((([1 :a] [1 :b]) ([2 :c]) ([3 :d] [3 :e])) (([1 :f]) ([2 :g] [2 :h] [2 :i]) ([3 :j] [3 :k])) (([1 :l]) ([3 :m])))

But that gets me 3 sequences of partitions. I want one single sequence of partitioned groups.

What I want to do is return a single lazy sequence of (potentially) lazy sequences that are the respective partitions joined. e.g. I want to produce this:

((([1 :a] [1 :b] [1 :f] [1 :l]) ([2 :c] [2 :g] [2 :h] [2 :i]) ([3 :d] [3 :e] [3 :j] [3 :k] [3 :m])))

Note that not all values appear in all sequences (there is no 2 in the third vector).

This is of course a simplification of my problem. The real data is a set of lazy streams coming from very large files, so nothing can be realised. But I think the solution for the above question is the solution for my problem.

Feel free to edit the title, I wasn't quite sure how to express it.


Solution

Try this horror:

(defn partition-many-by [f comp-f s]
  (let [sorted-s (sort-by first comp-f s)
        first-list (first (drop-while (complement seq) sorted-s))
        match-val (f (first first-list))
        remains (filter #(not (empty? %)) 
                        (map #(drop-while (fn [ss] (= match-val (f ss))) %) 
                             sorted-s))]
    (when match-val
      (cons
        (apply concat
          (map #(take-while (fn [ss] (= match-val (f ss))) %)
               sorted-s))
        (lazy-seq (partition-many-by f comp-f remains))))))

It could possibly be improved to remove the double value check (take-while and drop-while).

Example usage:

(partition-many-by identity compare [[1 1 1 1 2 2 3 3 3 3] [1 1 2 2 2 2 3] [3]])

=> ((1 1 1 1 1 1) (2 2 2 2 2 2) (3 3 3 3 3 3))
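
The tidy-up hinted at above (the paired take-while and drop-while) could be sketched with split-with, which returns the matching prefix and the remainder together. This is only a sketch: the name partition-many-by-split is hypothetical, and it assumes every input is already grouped by key and that comp-f is a valid comparator for sort. Note that clojure.core/split-with is itself built from take-while and drop-while, so the predicate is now written once but may still be evaluated twice per element.

```clojure
;; Hypothetical variant of partition-many-by (the name is an assumption).
;; Assumes each input seq is already grouped/sorted by (f element).
(defn partition-many-by-split
  [f comp-f colls]
  (lazy-seq
    ;; stop when every input is exhausted
    (when-let [non-empty (seq (filter seq colls))]
      (let [;; smallest leading key across the non-empty inputs
            match-val (first (sort comp-f (map (comp f first) non-empty)))
            ;; one [matching-prefix remainder] pair per input
            splits    (map #(split-with (fn [x] (= match-val (f x))) %)
                           non-empty)]
        (cons (mapcat first splits)
              (partition-many-by-split f comp-f (map second splits)))))))

(partition-many-by-split identity compare
                         [[1 1 1 1 2 2 3 3 3 3] [1 1 2 2 2 2 3] [3]])
;=> ((1 1 1 1 1 1) (2 2 2 2 2 2) (3 3 3 3 3 3))
```

Because termination is decided by whether any input still has elements, rather than by the truthiness of the key, this variant also keeps going when a key happens to be nil or false.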

Other tips

Let's make this interesting and use sequences of infinite length for our input:

(def twos (iterate #(+ 2 %) 0))
(def threes (iterate #(+ 3 %) 0))
(def fives (iterate #(+ 5 %) 0))

We'll need to lazily merge them. Let's ask for a comparator so we can apply to other data types as well.

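(defn lazy-merge-by
  ;; compfn is a boolean predicate such as < or <=, not a three-way comparator.
  ;; When it returns false, the head of ys is emitted first.
  ([compfn xs ys]
   (lazy-seq
     (cond
       (empty? xs) ys
       (empty? ys) xs
       :else (if (compfn (first xs) (first ys))
               (cons (first xs) (lazy-merge-by compfn (rest xs) ys))
               (cons (first ys) (lazy-merge-by compfn xs (rest ys)))))))
  ;; With more than two inputs, merge the first two and fold in the rest.
  ([compfn xs ys & more]
   (apply lazy-merge-by compfn (lazy-merge-by compfn xs ys) more)))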

Test

(take 15 (lazy-merge-by < twos threes fives))
;=> (0 0 0 2 3 4 5 6 6 8 9 10 10 12 12)

We can (lazily) partition by value if desired

(take 10 (partition-by identity (lazy-merge-by < twos threes fives)))
;=> ((0 0 0) (2) (3) (4) (5) (6 6) (8) (9) (10 10) (12 12))

Now, back to the sample input

(partition-by first (apply lazy-merge-by #(<= (first %) (first %2)) input-vals))
;=> (([1 :a] [1 :b] [1 :f] [1 :l]) ([2 :c] [2 :g] [2 :h] [2 :i]) ([3 :d] [3 :e] [3 :j] [3 :k] [3 :m]))

This is the desired result, minus one extraneous set of outer parentheses.
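
The merge-then-partition idea can be packaged as a single function that takes a key function instead of a hand-written predicate. The following is a rough sketch, not the code from this answer: merge-by-key and partition-merged-by are hypothetical names, and it assumes that keys can be ordered with compare and that every input is already sorted by key.

```clojure
;; Sample data from the question.
(def input-vals [[[1 :a] [1 :b] [2 :c] [3 :d] [3 :e]]
                 [[1 :f] [2 :g] [2 :h] [2 :i] [3 :j] [3 :k]]
                 [[1 :l] [3 :m]]])

;; Hypothetical helper: lazily merge two key-sorted seqs.
;; On equal keys the element from xs is emitted first.
(defn merge-by-key
  [keyfn xs ys]
  (lazy-seq
    (cond
      (empty? xs) ys
      (empty? ys) xs
      (pos? (compare (keyfn (first xs)) (keyfn (first ys))))
      (cons (first ys) (merge-by-key keyfn xs (rest ys)))
      :else
      (cons (first xs) (merge-by-key keyfn (rest xs) ys)))))

;; Hypothetical wrapper: merge any number of inputs, then group by key.
(defn partition-merged-by
  [keyfn colls]
  (partition-by keyfn (reduce (partial merge-by-key keyfn) () colls)))

(partition-merged-by first input-vals)
;=> (([1 :a] [1 :b] [1 :f] [1 :l]) ([2 :c] [2 :g] [2 :h] [2 :i]) ([3 :d] [3 :e] [3 :j] [3 :k] [3 :m]))
```

The outer sequence stays lazy, so this also works on infinite inputs as long as only a finite number of groups is taken. One caveat: partition-by counts each group before moving on to the next, so every individual group has to fit in memory.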

I'm not sure whether I'm following, but you can flatten the result sequence, something like:

(flatten (partition-by identity (first input-vals)))

clojure.core/flatten
([x])
Takes any nested combination of sequential things (lists, vectors,
etc.) and returns their contents as a single, flat sequence.
(flatten nil) returns an empty sequence.

You can use the realized? function to test whether a lazy sequence has been realized yet.
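
As a small illustration of what realized? reports, here is a minimal sketch (the names s, before and after are only for this example). It relies on map returning a lazy sequence; realized? is defined for pending values such as lazy sequences, delays and promises, and throws on a plain vector or list.

```clojure
(def s (map inc [1 2 3]))    ; map returns an unrealized lazy seq

(def before (realized? s))   ; false: nothing has been computed yet

(first s)                    ; asking for an element forces the seq

(def after (realized? s))    ; true: s has now been realized
```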

user> (def desired-result '((([1 :a] [1 :b] [1 :f] [1 :l])
                             ([2 :c] [2 :g] [2 :h] [2 :i])
                             ([3 :d] [3 :e] [3 :j] [3 :k] [3 :m]))))
#'user/desired-result

user> (def input-vals [[[1 :a] [1 :b] [2 :c] [3 :d] [3 :e]]
                       [[1 :f] [2 :g] [2 :h] [2 :i] [3 :j] [3 :k]]
                       [[1 :l] [3 :m]]])
#'user/input-vals

user> (= desired-result (vector (vals (group-by first (apply concat input-vals)))))
true

I changed the input-vals slightly to correct what I assume was a typographical error. If it was not an error, I can update my code to accommodate the less regular structure.

Using the ->> (thread last) macro, we can have the equivalent code in a more readable form:

user> (= desired-result
         (->> input-vals
           (apply concat)
           (group-by first)
           vals
           vector))
true

(partition-by first (sort-by first (mapcat identity input-vals)))
Licensed under: CC-BY-SA with attribution