Question

Suppose I have one million article entities in my backend with an inst attribute called date, or one million player entities with an int attribute called points. What's a good way to select the 10 latest articles or top-scoring players?

Do I need to fetch the whole millions to the peer and then sort and drop from them?

Was it helpful?

Solution 2

Yes, you would need to fetch all the data, since there's no index that would help you out here.

I would have created my own "index" and normalized this data. You can have a separate set of N entities where you keep as many as you'd like. You could start with 10, or consider storing 100 to trade some (possibly negligible) speed for more flexibility. This index can be stored on a separate "singleton" entity that you add as part of your schema.

 ;; The attribute that stores the index
 {:db/id #db/id[:db.part/db]
  :db/ident :indexed-articles
  :db/valueType :db.type/ref
  :db/cardinality :db.cardinality/many
  :db.install/_attribute :db.part/db}

 ;; The named index entity.
 {:db/id #db/id[:db.part/db]
  :db/ident :articles-index}

You can have a database function that does this. Every time you insert a new entity that you want to "index", call this function.

[[:db/add tempid :article/title "Foo]
 [:db/add tempid :article/date ....]
 [:index-article tempid 10]]

The implementation of index-article could look like this:

 {:db/id #db/id[:db.part/user]
  :db/ident :index-article
  :db/fn #db/fn {:lang "clojure"
                 :params [db article-id idx-size]
                 :code (concat
                        (map
                         (fn [article]
                           [:db/retract
                            (d/entid db :articles-index)
                            :indexed-articles
                            (:db/id article)])
                         (->> (datomic.api/entity db :articles-index)
                              (sort-by (fn [] ... implement me ... ))
                              (drop (dec idx-size))))
                        [[:db/add (d/entid db :articles-index) :indexed-articles article-id]])}}

Disclaimer: I haven't actually tested this function, so it probably contains errors :) The general idea is that we remove any "overflow" entities from the set, and add the new one. When idx-size is 10, we want to ensure that only 9 items are in the set, and we add our new item to it.

Now you have an entity you can lookup from index, :articles-index, and the 10 most recent articles can be looked up from the index (all refs are indexed), without causing a full database read.

;; "indexed" set of articles.
(d/entity db :articles-index)

OTHER TIPS

Until getting hold of the reverse index becomes a Datomic feature, you could manually define one.

e.g. for a :db.type/instant, create an additional attribute of type :db.type/long which you would fill with

(- (Long/MAX_VALUE) (.getTime date))

and the latest 10 articles could be fetched with

(take 10 (d/index-range db reverse-attr nil nil))

I've been looking into this and think I have a slightly more elegant answer.

Declare your attribute as indexed with :db/index true

{:db/id #db/id[:db.part/db -1]
 :db/ident :ocelot/number
 :db/valueType :db.type/long
 :db/cardinality :db.cardinality/one
 :db/doc "An ocelot number"
 :db/index true
 :db.install/_attribute :db.part/db}

This ensures that the attribute is included in the AVET index.

Then the following gives you access to the "top ten", albeit using the low-level datoms call.

(take-last 10 (d/datoms (db conn) :avet :ocelot/number))

Obviously if you need to do any further filtering ("who are the top ten scorers in this club ?") then this approach won't work, but at that point you have a much smaller amount of data in your hand and shouldn't need to worry about the indexing.

I did look extensively at the aggregation functions available from Datalog and am having trouble getting my head around them - and am uncertain that e.g. max would use this index rather than a full scan of the data. Similarly the (index-range ...) function almost certainly does use this index but requires you to know the start and/or end values.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top