I would do this by using a protocol:
(defprotocol Sample
  (sample [m]))
Then extend the protocol to whatever structures you want to sample from in the following way:
- Maps, records (and any other associative data types): return the same type with the same keys and the result of calling sample on each value
- Sets: select an element from the set at random
- Numerical types (java.lang.Number): return the value unchanged
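- Function types (clojure.lang.Fn is safer than IFn here, because maps, sets, vectors and keywords also implement IFn): call the function with 0 arguments
- Anything else (java.lang.Object): return the value unchanged (or throw an error, if you prefer)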
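The list above could be implemented roughly as follows. This is a minimal sketch rather than a definitive implementation. It assumes that vectors should be sampled element by element (they are associative by index), and it dispatches on clojure.lang.Fn instead of IFn, because collections and keywords also implement IFn and would make the dispatch ambiguous.

```clojure
;; Protocol repeated here so the block loads on its own.
(defprotocol Sample
  (sample [m]))

(extend-protocol Sample
  ;; Maps and records: keep the type and keys, sample each value.
  clojure.lang.IPersistentMap
  (sample [m]
    (reduce-kv (fn [acc k v] (assoc acc k (sample v))) m m))

  ;; Vectors (assumption: treated as associative by index).
  clojure.lang.IPersistentVector
  (sample [v] (mapv sample v))

  ;; Sets: pick one element at random (throws on an empty set).
  clojure.lang.IPersistentSet
  (sample [s] (rand-nth (vec s)))

  ;; Numbers are returned unchanged.
  java.lang.Number
  (sample [n] n)

  ;; Plain functions: call with no arguments.
  clojure.lang.Fn
  (sample [f] (f))

  ;; Fallback: anything else is returned unchanged.
  java.lang.Object
  (sample [x] x))
```

Note that java.lang.Object does not cover nil, so (sample nil) throws unless the protocol is also extended to nil.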
Now you can do stuff like:
(sample [#{1 2} (partial rand-int 10) {:a 1 :b #{5 6}}])
=> [2 7 {:a 1 :b 6}]
Advantages of this approach:
- You can define an immutable "schema" for producing samples
- After the sample is created, it is a pure immutable Clojure data structure (this is good since you don't want the result to change each time you read it!)
- You can easily extend it to new types of random sampling in the future by extending the protocol further, or by creating new sampling functions
- It's easy to compose with higher-order functions. For example, you can write
(take 1000 (repeatedly #(sample my-schema)))
to get 1000 samples.
If you want to get more elaborate, you could also pass a seed as an additional optional argument to the sample function. Done correctly (every random choice must draw from a generator derived from that seed, never from the global one), this makes samples reproducible, which is very useful for testing, and it makes (sample x seed) work as a pure function.
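One possible shape for the seeded variant is sketched below. The names SeededSample, sample-with and sample-seeded are hypothetical and not part of the answer above. The sketch assumes a java.util.Random is created from the seed and threaded through every call, and that generator functions take that Random as their single argument instead of calling rand-int.

```clojure
;; Hypothetical seeded protocol: every method receives the generator.
(defprotocol SeededSample
  (sample-with [m rng]))

(extend-protocol SeededSample
  clojure.lang.IPersistentMap
  (sample-with [m rng]
    (reduce-kv (fn [acc k v] (assoc acc k (sample-with v rng))) m m))

  ;; mapv is eager, so the generator is consumed in a fixed order.
  clojure.lang.IPersistentVector
  (sample-with [v rng] (mapv #(sample-with % rng) v))

  clojure.lang.IPersistentSet
  (sample-with [s ^java.util.Random rng]
    (let [items (vec s)]
      (nth items (.nextInt rng (int (count items))))))

  java.lang.Number
  (sample-with [n _] n)

  ;; Assumption: generator functions accept the Random as their argument.
  clojure.lang.Fn
  (sample-with [f rng] (f rng))

  java.lang.Object
  (sample-with [x _] x))

(defn sample-seeded
  "Samples schema using a fresh generator built from seed."
  [schema seed]
  (sample-with schema (java.util.Random. (long seed))))

;; Usage: the same schema value and seed give the same sample.
(def my-schema
  [#{1 2} (fn [^java.util.Random r] (.nextInt r 10)) {:a 1 :b #{5 6}}])

(= (sample-seeded my-schema 42) (sample-seeded my-schema 42))
```

Reproducibility here relies on the schema being traversed in the same order each time, which holds when the same schema value is reused.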