Вопрос

I am trying to model statistics to submit to statsd/graphite. However what I am monitoring is "session" centric. For example, I have a game that is played in real time. There are multiple instances of a game active on the servers. Each game has multiple (and variable number of) participants. Each instance of a game has a unique ID as does each player. I want to track (and graph) each player's stats but then roll the metric up for the whole instance and then for all the instances of a game. For example there may be two instances of a game active at a given time. Lets say each has two players in the game

GameTitle.RealTime.VoiceErrors.game_instance_a.player_id_1 10
GameTitle.RealTime.VoiceErrors.game_instance_a.player_id_2 20
GameTitle.RealTime.VoiceErrors.game_instance_b.player_id_3 50
GameTitle.RealTime.VoiceErrors.game_instance_b.player_id_4 70

where game_instances and player_ids are 128 bit numbers

And I want to be able to see that the value of all voice errors for game_instance_a is 30 while all voice errors across the system is 150

Given this I have three questions

  1. What guidance would you have on naming the metrics.
  2. Is it kosher to have metrics that have "dynamic" identifiers as part of the name
  3. What are they scale limits on this. If I had a 100K game instances with say as many as 1000 players in a game, is this going to kill statsd/graphite?

Thanks!

Это было полезно?

Решение

What guidance would you give on naming the metrics?

Graphite recommends that "Volatile path components should be kept as deep into the hierarchy as possible". This essentially means that if you can push the parts of the metrics that are frequently unique to the end of the "bucket" without impacting your grouping queries you should try to do so.

Here is a great post on using Graphite that includes naming recommendations. And here is another one with additional info from Jason Dixon (an excellent source for Graphite stuff in general).

Is it kosher to have metrics that have "dynamic" identifiers as part of the name?

I usually try to avoid identifiers in the metric names unless they are very low in number (<100). Because Graphite will store a .wsp file for every metric name you'll have a difficult time re-sizing or adjusting the storage settings should you decide to change your configuration. Additionally, the Graphite UI will have a "folder" for every metric name so you can easily make the UI unusable.

In your case, I'd probably graph the total number of game instances, the total number of players, and the number of errors (by type), etc. Additionally, I might try to track players per instance (generally) and maybe errors per instance (again without knowing the actual instance. e.g. GameTitle.RealTime.PerInstance.VoiceErrors) if I had that capability (i.e. state stored per instance in my application).

Logstash, Elastic Search, Kibana

I'd suggest logging this error information with instance and player ids and using logstash to send your logs to elastic search and kibana. Then I'd watch Graphite for real time error and health anomaly detection and use Kibana (and Elastic Search underneath) to dig deeper.

What are the scale limits on this. If I had a 100K game instances with say as many as 1000 players in a game, is this going to kill statsd/graphite?

Statsd should have no problem with this, as it just acts as a -mostly- dumb aggregator. While it does maintain some state internally I don't anticipate a problem.

I don't think you'll have problems with the internal Graphite Whisper Storage itself, as it is just using files and folders. But, as I mentioned above, the Graphite Web UI will be unusable and I think you'll also run the risk of other manageability issues.

Summary

Keep the volatile (dynamic) metric buckets at the end of the name and avoid going above a couple hundred of these.

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top