Question

We have been using Graphite for a while and recently switched the source of some metrics from statsd to yammer/codahale-metrics. Because our metrics are generally sent from several different servers, we set up Graphite's own aggregator (carbon-aggregator) to combine them for us.

The problem is that the stats for individual servers show up and behave just fine, but the aggregated stats are only ever correct for roughly the last hour: older aggregated values somehow get modified after some time. Here's what it looks like: [graph: aggregated series vs. sumSeries of the per-server metrics] The green line is simply a sumSeries over the metrics that should have been aggregated; the blue line is what the aggregator produced. Note how the two lines only agree over the past hour.

Of course we have checked our storage, aggregation and retention rules, but they are all very basic, should apply to all metrics equally, and shouldn't even come into play after just one hour:

storage-schemas.conf

[stats]
priority = 110
pattern = .*
# store 60s for ~30d (43000 points), then 15 minutes for 262974 points (~7.5 years)
retentions = 60:43000,900:262974
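To double-check the claim that no rollup should happen within an hour, here is a small Python sketch that converts a `retentions` string into the time span each Whisper archive covers. It only handles the plain `seconds_per_point:points` form used above (Whisper also accepts forms such as `1m:30d`, which this sketch does not parse), and `archive_spans` is a hypothetical helper, not part of Graphite.

```python
from __future__ import annotations

SECONDS_PER_DAY = 86400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def archive_spans(retentions: str) -> list[tuple[int, int, int]]:
    """Return (seconds_per_point, points, covered_seconds) for each archive.

    Assumes the plain integer "step:points" syntax, comma-separated.
    """
    spans = []
    for archive in retentions.split(","):
        step, points = (int(part) for part in archive.split(":"))
        spans.append((step, points, step * points))
    return spans


if __name__ == "__main__":
    for step, points, covered in archive_spans("60:43000,900:262974"):
        print(f"{step}s x {points} points = "
              f"{covered / SECONDS_PER_DAY:.1f} days "
              f"({covered / SECONDS_PER_YEAR:.2f} years)")
```

For the schema above, the first archive keeps 60-second points for about 29.9 days, so the 15-minute rollup cannot explain a discrepancy that appears after one hour. The second archive covers about 7.5 years; 10 years at 900-second resolution would need 350400 points.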

storage-aggregation.conf

[kv]
pattern = \.kv\.
xFilesFactor = 0.2
aggregationMethod = average 

[counts]
pattern = \.counts\.
xFilesFactor = 0
aggregationMethod = sum

[timers]
pattern = \.timers\.
xFilesFactor = 0.2
aggregationMethod = average


[min]
pattern = \.min$
xFilesFactor = 0.1
aggregationMethod = min

[max]
pattern = \.max$
xFilesFactor = 0.1
aggregationMethod = max

[sum]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

[default_average]
pattern = .*
xFilesFactor = 0.5
aggregationMethod = average
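The storage-aggregation rules only matter when Whisper rolls points into a coarser archive. As far as we know, carbon picks the first section, in file order, whose `pattern` regex is found anywhere in the metric name, and stores the chosen method in the Whisper file when it is created. The Python sketch below models that selection plus the `xFilesFactor` check under those assumptions; `match_rule`, `rollup` and `AGGREGATORS` are hypothetical names, not carbon's API.

```python
from __future__ import annotations

import re

# Sections from storage-aggregation.conf above, in file order:
# (section, pattern, xFilesFactor, aggregationMethod)
RULES = [
    ("kv", r"\.kv\.", 0.2, "average"),
    ("counts", r"\.counts\.", 0.0, "sum"),
    ("timers", r"\.timers\.", 0.2, "average"),
    ("min", r"\.min$", 0.1, "min"),
    ("max", r"\.max$", 0.1, "max"),
    ("sum", r"\.count$", 0.0, "sum"),
    ("default_average", r".*", 0.5, "average"),
]

AGGREGATORS = {
    "average": lambda xs: sum(xs) / len(xs),
    "sum": sum,
    "min": min,
    "max": max,
}


def match_rule(metric: str) -> tuple[str, float, str]:
    """Return (section, xFilesFactor, method) of the first rule whose
    pattern is found anywhere in the metric name (re.search semantics)."""
    for section, pattern, xff, method in RULES:
        if re.search(pattern, metric):
            return section, xff, method
    raise LookupError(f"no rule matches {metric!r}")  # unreachable with the .* default


def rollup(values: list[float | None], xff: float, method: str) -> float | None:
    """Aggregate one coarse slot from its fine-grained points, but only if
    some points are known and the known fraction reaches xFilesFactor."""
    known = [v for v in values if v is not None]
    if not known or len(known) / len(values) < xff:
        return None
    return AGGREGATORS[method](known)


if __name__ == "__main__":
    for name in ("web1.requests.count", "web1.timers.login.max", "web1.cpu.load"):
        print(name, "->", match_rule(name))
```

One consequence under these assumptions: a metric such as `web1.timers.login.max` is matched by `[timers]` before `[max]`, so it is rolled up with `average`, not `max`.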

The configuration of the aggregator itself is probably our blind spot, since we couldn't find any detailed documentation for it and mostly left the defaults as they were.

carbon.conf

[aggregator]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2023

PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2024

DESTINATIONS = 127.0.0.1:2004

REPLICATION_FACTOR = 1

MAX_QUEUE_SIZE = 10000

USE_FLOW_CONTROL = True

MAX_DATAPOINTS_PER_MESSAGE = 500

MAX_AGGREGATION_INTERVALS = 5
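Since `MAX_AGGREGATION_INTERVALS` is one of the settings left at its default, here is a deliberately simplified, hypothetical Python model of an interval-buffering aggregator: it sums datapoints per interval and keeps only that many intervals in memory. It is not carbon's code; `SumBuffer`, `flush` and the 60-second frequency are assumptions for illustration, and the real aggregator's behavior is defined by its own source and by `aggregation-rules.conf`.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical, simplified model; NOT carbon's implementation.
MAX_AGGREGATION_INTERVALS = 5  # value from carbon.conf above


class SumBuffer:
    def __init__(self, frequency: int) -> None:
        self.frequency = frequency  # aggregation interval in seconds (assumed)
        self.intervals: dict[int, list[float]] = defaultdict(list)

    def add(self, timestamp: int, value: float) -> None:
        # Align the datapoint to the start of its interval.
        interval = timestamp - timestamp % self.frequency
        self.intervals[interval].append(value)

    def flush(self, now: int) -> list[tuple[int, float]]:
        """Emit (interval, sum) for every buffered interval, then forget
        intervals that started before now - MAX_AGGREGATION_INTERVALS * frequency."""
        emitted = [(ts, sum(vals)) for ts, vals in sorted(self.intervals.items())]
        threshold = now - MAX_AGGREGATION_INTERVALS * self.frequency
        for ts in [ts for ts in self.intervals if ts < threshold]:
            del self.intervals[ts]
        return emitted


if __name__ == "__main__":
    buf = SumBuffer(frequency=60)
    buf.add(1000, 1.0)
    buf.add(1010, 2.0)
    print(buf.flush(now=1020))  # both points land in the interval starting at 960
    print(buf.flush(now=1300))  # same sum emitted; interval 960 is then dropped
    buf.add(1005, 4.0)          # late point for the dropped interval
    print(buf.flush(now=1310))  # re-emits 960 with only the late point
```

In this toy model, a datapoint that arrives after its interval has been discarded starts a fresh buffer, so the next flush re-emits a sum covering only the late point. Whether carbon 0.9.12 behaves like this is not established here; the bug report linked in the answer is the authoritative description.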

Solution

It looks like you ran into an issue that exists in the latest release of Graphite (0.9.12) and was reported on the project's bug tracker at https://github.com/graphite-project/carbon/issues/109.

The bug report also mentions a potential fix for the issue.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow