Question

I would like to run a Splunk query over a long period of time (e.g., months or years), but the volume of data involved means I can only search over hours or days at a time.

However, for the question I want to answer in Splunk, a uniform or statistically unbiased sample of the data would suffice. In other words, I would rather the query return N events spread out over the past month than any N consecutive events.

One approach I considered was to search only events with date_minute=0, which quickly filters the data down to roughly 1/60th of the events. This helps, but it is not very flexible.
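As a concrete sketch of that filter (the index name, time range, and event cap here are placeholders, not from an actual search):

* index=my_index date_minute=0 earliest=-30d@d | head 10000 | ...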

Is there a better way to sample events efficiently in Splunk?


Solution 2

I found a related discussion on sampling on the Splunk Answers page below.

http://answers.splunk.com/answers/3743/is-it-possible-to-get-a-sample-set-of-search-results-rather-than-the-full-search-results

An alternative to filtering by date_minute or date_second is to filter events in a where clause using the _serial property or the random() function. For example,

* | where (_serial % 60) = 0 | ...

or

* | where (random() % 60) = 0 | ...

However, in both cases the search appears to do a full scan of the data. That may still be acceptable if you need the flexibility and the result feeds into a more expensive query. Otherwise, the date_second approach is significantly faster because events are apparently indexed by that field. For example, the two queries above ran in 3m 20s on a subset of data, whereas the query below ran in 11s on the same data.

* date_second=0 | ...
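If the goal is an estimated count, the 1-in-60 sample can be scaled back up after aggregation. A sketch (the index name is a placeholder, and this assumes events are roughly uniformly distributed across seconds):

* index=my_index date_second=0 | stats count | eval estimated_total=count*60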

OTHER TIPS

If you are trying to run a search and are not satisfied with its performance, I would suggest using report acceleration or data model acceleration. Alternatively, you can create your own tsidx files (the same kind of files report and data model acceleration create automatically) with tscollect, and then run tstats over them.
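A minimal sketch of the tscollect/tstats workflow described above (the index name and the namespace name sample_ns are hypothetical, and grouping by date_hour assumes that field is present in the collected events):

First, build the tsidx namespace from a sampled search:

* index=my_index date_second=0 | tscollect namespace=sample_ns

Then run fast statistical queries against it:

* | tstats count from sample_ns by date_hour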

Splunk now also supports event sampling natively; see the Splunk documentation on event sampling.

Licensed under: CC-BY-SA with attribution