Question

I believe this must somehow be caused by the way I set up statsd/graphite, but I cannot figure out what exactly:

Often, when new measures are created by sending timer values for different metrics in quick succession, statsd seems to send graphite only some of the variations it is supposed to create for each timer. For example, if I run the following and let it create a few metrics:

while true; do echo "test$RANDOM:38|ms" | nc -w 1 -u localhost 8125; done

I end up with something like the following: data files were created for only some of the variations, while others are missing and do not show up even after some time.

$ cd /opt/graphite/storage/whisper/stats/timers
$ ls test9304/
count.wsp  lower.wsp  mean_90.wsp  std.wsp  sum.wsp  upper.wsp
$ ls test31877/
count_ps.wsp  count.wsp  lower.wsp  mean_90.wsp  mean.wsp  std.wsp  sum_90.wsp  sum.wsp  upper_90.wsp  upper.wsp

The missing ones seem to appear when the measure is sent again some time later, but which ones get created when is non-deterministic.

So is there a reason for this? Is there some internal optimization or caching which only flushes things after a longer period than the advertised 10 seconds?

Solution

The answer about carbon configuration pointed in the right direction, but it did not explain why the metrics never showed up, not even in the Graphite browser, which should take the cached data into account.

The actual cause was different. After looking through the configuration files, I found the following setting, which is more likely related to my problem:

> # Softly limits the number of whisper files that get created each minute.
> # Setting this value low (like at 50) is a good way to ensure your graphite
> # system will not be adversely impacted when a bunch of new metrics are
> # sent to it. The trade off is that it will take much longer for those metrics'
> # database files to all get created and thus longer until the data becomes usable.
> # Setting this value high (like "inf" for infinity) will cause graphite to create
> # the files quickly but at the risk of slowing I/O down considerably for a while.
> MAX_CREATES_PER_MINUTE = 50

This explains why sending lots of new measures to statsd created only some of the files, and in seemingly random order: once the per-minute limit was reached, the remaining files were not created until a later minute, when the measure arrived again.
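To see which timers are still affected, something like the following might help. It is only a rough sketch: `count_wsp` and `list_incomplete` are helper names I made up, and the expected count of 10 files per timer is simply taken from the complete `test31877` listing in the question, so it may differ with other statsd settings.

```shell
#!/bin/sh
# Hypothetical helper: count the .wsp files directly inside one timer directory.
count_wsp() {
    wsp_count=0
    for f in "$1"/*.wsp; do
        # An unmatched glob stays literal, so check that the file really exists.
        if [ -e "$f" ]; then
            wsp_count=$((wsp_count + 1))
        fi
    done
    echo "$wsp_count"
}

# Hypothetical helper: print "<timer> <file count>" for every timer directory
# under $1 that holds fewer than $2 whisper files.
list_incomplete() {
    timers_dir=$1
    expected=$2
    for dir in "$timers_dir"/*/; do
        [ -d "$dir" ] || continue
        found=$(count_wsp "$dir")
        if [ "$found" -lt "$expected" ]; then
            echo "$(basename "$dir") $found"
        fi
    done
}
```

With the paths from the question, this would be called as `list_incomplete /opt/graphite/storage/whisper/stats/timers 10`. Running it a few minutes apart should show the list shrinking if the create limit is really the cause.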

OTHER TIPS

Yes, carbon-cache (as the name suggests) does this kind of caching: it writes multiple metrics together so as not to exhaust the I/O bandwidth, and the graphite webapp uses both the files on disk and the data in RAM. You can also configure this behavior of carbon-cache, e.g. how many data points (across all metrics) it collects before flushing.
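To check what your installation currently uses, a quick grep over carbon.conf may be enough. This is a sketch under assumptions: the default path `/opt/graphite/conf/carbon.conf` is my guess based on the `/opt/graphite` prefix in the question, `show_cache_settings` is a made-up helper name, and the setting names are the ones I remember from the stock carbon.conf example, so verify them against your own file.

```shell
#!/bin/sh
# Hypothetical helper: print the active (uncommented) cache and rate-limit
# settings from a carbon.conf file. The default path is an assumption.
show_cache_settings() {
    conf=${1:-/opt/graphite/conf/carbon.conf}
    # Match only lines that start with the setting name, so commented-out
    # entries beginning with "#" are skipped.
    grep -E '^[[:space:]]*(MAX_CACHE_SIZE|MAX_UPDATES_PER_SECOND|MAX_CREATES_PER_MINUTE)[[:space:]]*=' "$conf"
}
```

Calling `show_cache_settings` without arguments reads the assumed default path; pass another path as the first argument if carbon is installed elsewhere.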

Licensed under: CC-BY-SA with attribution