Question

I am implementing an ordered set in clojure, where I retrieve elements based on their rank. This means that I can retrieve the 4th element (according to the set's ordering), the 3rd, or the 7th, all in logarithmic time.

In order to get my new data structure integrated with clojure's common methods (or "abstractions") such as conj, get, nth, etc., Which is the better way to do it:

  1. Actually implement conj, for example, in my datatype's protocol, or
  2. Implement Rich Hickey's clojure.lang.IPersistentSet or some interface like it.

The first seems easier, but also easier to mess up the semantics of the function. The second seems like I am implementing an interface that was never meant to be part of the public API, and the actual methods that are associated with that interface (protocol) are confusingly different. For example, it seems that in order to implement conj with my set, I must implement a cons method of clojure.lang.IPersistentSet, which has a different name. There seems to have little documentation on how this all works, which poses a large challenge in implementing this ranked set.

Which one should I choose? Should I implement my own or the methods of a clojure.lang interface? If I should do the latter, where is some good documentation that can guide me through the prosses?

EDIT: I want to make it clear that I am trying to make a set from which you can retrieve any element (or "remove" it) in logarithmic time by specifying the element's rank (e.g., "give me the 5th element, mr. set."). To my knowledge, no such set yet exists in clojure.

Was it helpful?

Solution

Firstly, I have just released a library called avl.clj which implements persistent sorted maps and sets with support for the standard Clojure API (they are drop-in replacements for the built-in sorted collections), as well as transients and logarithmic time rank queries (via clojure.core/nth)1. Both Clojure and ClojureScript are supported; performance on the Clojure side is mostly on a par with the built-in variants in my preliminary benchmarking. Follow the link above if you'd like to give it a try. Any experience reports would be greatly appreciated!

As for the actual question: I'm afraid there isn't much in the way of documentation on Clojure's internal interfaces, but still, implementing them is the only way of making one's custom data structures fit in with the built-ins. core.rrb-vector (which I have written and now maintain) takes this approach, as do other Contrib libraries implementing various data structures. This is also what I've done with avl.clj, as well as sorted.clj (which is basically the ClojureScript port of the red-black-tree-based sorted collections backported to Clojure). All of these libraries, as well as Clojure's own gvec.clj file which implements the primitive-storing vectors produced by clojure.core/vector-of, can serve as examples of what's involved. (Though I have to say it's easy to miss a method here and there...)

The situation is much simpler in ClojureScript, where all the core protocols are defined at the top of core.cljs, so you can just look at the list and pick the ones relevant to your data structure. Hopefully the same will be true on the Clojure side one day.


1 Removal by rank is (disj my-set (nth my-set 123)) for now. I might provide a direct implementation later on if it turns out to make enough of a difference performance-wise. (I'll definitely write one to check if it does.)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top