Question

I have a bunch of records containing business names and I wish to do a query to find all the duplicates. How can this be done?

{:business/name "<>"}

Solution

If you're trying to enforce uniqueness on the attribute value, look at the :db/unique schema attribute instead.
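As a rough illustration, a schema definition with a uniqueness constraint might look like the sketch below. The value type, cardinality and the choice of :db.unique/value (rather than :db.unique/identity) are assumptions, since the question does not show the actual schema. Also note that, as far as I know, Datomic will not let you add :db/unique to an attribute that already holds duplicate values, so the duplicates need to be found and cleaned up first.

```clojure
;; Hypothetical schema for the attribute from the question.
;; Everything except :db/ident and the presence of :db/unique is assumed.
(def business-name-schema
  [{:db/ident       :business/name
    :db/valueType   :db.type/string       ; assumed: names are strings
    :db/cardinality :db.cardinality/one   ; assumed: one name per entity
    :db/unique      :db.unique/value}])   ; reject a value already held by another entity
```

This vector would be passed to d/transact like any other schema transaction.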

To find the duplicated values and how often they repeat, use:

(->> (d/datoms db :aevt :business/name)
   (map :v)
   (frequencies)
   (filter #(> (second %) 1)))

This uses the datomic.api/datoms API to read the raw AEVT index, stream the :business/name values, count how often each value occurs, and keep only the values that occur more than once. You can achieve the same result with a Datalog query and an aggregation function:

(->> (d/q '[:find (frequencies ?v)
            :with ?e
            :in $ ?a
            :where [?e ?a ?v]]
          db :business/name)
     (ffirst)
     (filter #(> (second %) 1)))
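Both versions above come down to counting values and keeping the counts greater than one. If you want to try that logic at a plain Clojure REPL without a database, here is a minimal sketch. The sample-datoms vector and the duplicated-values function are made up for illustration: plain maps with :e and :v keys stand in for real datoms, which support the same keyword access.

```clojure
;; Stand-in data: maps with :e (entity id) and :v (value) keys.
;; Entity ids and business names are invented for this example.
(def sample-datoms
  [{:e 1 :v "Acme"}
   {:e 2 :v "Globex"}
   {:e 3 :v "Acme"}
   {:e 4 :v "Initech"}
   {:e 5 :v "Globex"}
   {:e 6 :v "Acme"}])

(defn duplicated-values
  "Returns a map of value -> occurrence count, for values seen more than once."
  [datoms]
  (->> datoms
       (map :v)                          ; keep only the values
       frequencies                       ; value -> count
       (filter (fn [[_ n]] (> n 1)))     ; drop values that occur once
       (into {})))

(duplicated-values sample-datoms)
;; => {"Acme" 3, "Globex" 2}
```

With a real database you would pass (d/datoms db :aevt :business/name) instead of sample-datoms.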

To find the entities with duplicated attribute values, use:

(->> (d/datoms db :aevt :business/name)
   (group-by :v)
   (filter #(> (count (second %)) 1))
   (mapcat second)
   (map :e))

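This also uses the d/datoms API: it groups the datoms by value, keeps the groups that contain more than one datom, and returns the entity id of each datom in those groups. For a full code sample, including Datalog implementations, see https://gist.github.com/a2ndrade/5641681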
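If you need to know which entities share which name, rather than a flat list of entity ids, a small variation keeps the grouping. The sketch below again runs on plain maps standing in for datoms; sample-datoms and entities-by-duplicated-value are hypothetical names, not part of the Datomic API.

```clojure
;; Stand-in data: maps with :e (entity id) and :v (value) keys.
;; Entity ids and business names are invented for this example.
(def sample-datoms
  [{:e 1 :v "Acme"}
   {:e 2 :v "Globex"}
   {:e 3 :v "Acme"}
   {:e 4 :v "Initech"}
   {:e 5 :v "Globex"}
   {:e 6 :v "Acme"}])

(defn entities-by-duplicated-value
  "Returns a map of value -> vector of entity ids, for values held by more than one datom."
  [datoms]
  (->> datoms
       (group-by :v)                                 ; value -> [datom ...]
       (filter (fn [[_ ds]] (> (count ds) 1)))       ; keep duplicated values only
       (map (fn [[v ds]] [v (mapv :e ds)]))          ; datoms -> entity ids
       (into {})))

(entities-by-duplicated-value sample-datoms)
;; => {"Acme" [1 3 6], "Globex" [2 5]}
```

Keeping the value alongside the entity ids makes it easier to decide which entity to keep and which to merge or retract.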

Licensed under: CC-BY-SA with attribution