StatsD

StatsD is a simple network daemon that continuously receives metrics pushed over UDP and periodically sends aggregate metrics to upstream services like Graphite and Librato Metrics. Because it uses UDP, clients (for example, web applications) can ship metrics to it very fast with little to no overhead. This means that a user can capture multiple metrics for every request to a web application, even at a rate of thousands of requests per second. Request-level metrics are aggregated over a flush interval (default 10 seconds) and pushed to an upstream metrics service.

Librato maintains a StatsD backend module that will push aggregate StatsD metrics to the Librato Metrics service. This document explains the concepts behind StatsD and how those map to Librato Metrics. Afterwards the reader should have a comfortable understanding of how to use StatsD to instrument their code and understand how those metrics are presented in Librato Metrics.

StatsD Concepts

Buckets

In StatsD, every metric, whether it’s a counter, a timing, or a gauge, is placed in a bucket. A bucket is simply a name for a metric that acts as the key used to store it. When using the statsd Librato backend, each bucket name is used as the metric name when that bucket is submitted to Librato Metrics. Users should therefore ensure that bucket names follow the naming conventions for Librato Metrics metric names, and should not reuse one name for different metric types, for example a counter and a gauge.

Counting

One of the more basic use cases for StatsD is counting the number of times an event occurs. You might count every time a user signs in, every time an application controller is invoked, or the number of elements received in POST requests. Using a statsd client, you simply invoke the increment operation each time you want to count something. You can specify an integer increment value, or let it default to one, and the client will send that increment to the statsd daemon over UDP. Multiple entities (processes or servers) can increment a single counter and the statsd daemon will update the total count of the counter bucket. On each flush interval statsd will push the current value of each counter to the upstream metrics service.
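As a rough illustration of the increment operation described above, the following Python sketch builds a counter datagram in the StatsD line format (bucket:value|c) and sends it over UDP. The function names format_counter and increment are hypothetical, and the host and port defaults assume a daemon listening locally on 8125, the port StatsD uses by default. A real client library would typically also handle sampling and connection reuse.

```python
import socket


def format_counter(bucket: str, delta: int = 1) -> bytes:
    """Build a StatsD counter datagram, e.g. b'user.signin:1|c'."""
    return f"{bucket}:{delta}|c".encode("ascii")


def increment(bucket: str, delta: int = 1,
              host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one increment to the daemon over UDP (fire and forget)."""
    # host and port are assumptions; adjust them to where your daemon listens.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_counter(bucket, delta), (host, port))
```

Calling increment("user.signin") would count one sign-in, while increment("posts.elements", 5) would add five to that bucket in a single datagram.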

When used with Librato’s backend, StatsD will push counter metrics to Librato as gauges. The gauge value will represent the aggregate of all increment and decrement operations since the last flush interval. For example, if you perform ten increment(1) operations within a flush interval, the gauge value will be ‘10’ for that interval of time. When the measurements are rolled up to the higher level resolutions, the rollups will still be an aggregate of all operations during that period of time.
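To make the aggregation concrete, here is a minimal Python sketch of how increments received within one flush interval add up to a single gauge value. It is a simplified model, not the daemon's actual code: the function name aggregate_counters is hypothetical, and sample rates and non-integer values are ignored.

```python
from collections import defaultdict


def aggregate_counters(datagrams):
    """Sum all counter updates received during one flush interval."""
    totals = defaultdict(int)
    for line in datagrams:
        bucket, rest = line.split(":", 1)
        parts = rest.split("|")
        value, kind = parts[0], parts[1]
        if kind == "c":  # only counters are aggregated in this sketch
            totals[bucket] += int(value)
    return dict(totals)
```

Ten "user.signin:1|c" datagrams in one interval yield {"user.signin": 10}, which matches the gauge value of 10 in the example above.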

Timing

Oftentimes you’ll want to sample a particular metric many times and track the average of those samples over time. For example, you might track average response time to your web application by sampling how long each request takes. StatsD supports this type of sampling with the timing interface. While it is commonly used for tracking values that are actual times (e.g., nginx response time), it can also be used to sample any type of value, e.g., the byte size of every nginx response. Statsd clients send each sampled value to the statsd daemon.

Internally, statsd keeps each timing sample point until the next flush interval. At the flush interval, statsd will record the min, max, sum, and count for each timing bucket. Using the Librato backend, these values are composed into the complex gauge type and sent to Librato Metrics as a gauge.
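The flush-time summary described above can be sketched in a few lines of Python. The function name summarize_timings and the dictionary keys are illustrative assumptions, not the backend's actual field names.

```python
def summarize_timings(samples):
    """Reduce the samples kept for one timing bucket to min, max, sum, and count."""
    if not samples:
        return None  # nothing was recorded during this flush interval
    return {
        "min": min(samples),
        "max": max(samples),
        "sum": sum(samples),
        "count": len(samples),
    }
```

With these four values an average can still be derived later as sum divided by count, even after the individual samples have been discarded.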

Statsd can also calculate an arbitrary number of percentiles for timing buckets. By default, the 90th percentile is automatically calculated for each timing bucket, but the user can override that with one or more percentile thresholds. The Librato backend will publish a complex gauge measurement for each defined percentile threshold, suffixing the percentile onto the metric name. For example, for the metric name api-response-times, the 90th percentile will be sent as the additional metric api-response-times.90. The percentile gauge measurement will include the sum of all values below the percentile, the max value at the percentile, and the count of samples below the percentile.
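The following Python sketch shows one plausible way to derive the percentile measurement described above. It assumes the threshold keeps the lowest pct percent of the sorted samples, rounded to the nearest whole sample; the exact rounding used by statsd may differ. The names percentile_summary and percentile_metric_name are hypothetical.

```python
def percentile_summary(samples, pct=90):
    """Summarize the samples at or below the given percentile threshold."""
    ordered = sorted(samples)
    # Number of samples inside the threshold, rounding halves up (an assumption).
    keep = int(pct / 100 * len(ordered) + 0.5)
    if keep == 0:
        return None
    kept = ordered[:keep]
    return {"sum": sum(kept), "max": kept[-1], "count": keep}


def percentile_metric_name(bucket, pct=90):
    """Suffix the percentile onto the metric name, e.g. api-response-times.90."""
    return f"{bucket}.{pct}"
```

For ten samples with values 1 through 10 and the default threshold, the nine lowest samples are kept, so the summary has a sum of 45, a max of 9, and a count of 9.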

Gauges

Statsd also supports the concept of gauges. In Statsd, gauges are single metric readings that are recorded and published to the upstream service. They differ from timings in that only the last gauge reading in a single flush interval is sent to the upstream service. This can be useful if you want to track the current temperature or the current price of a stock, where only the most recent value is required in a given flush interval. However, for anything that is request-driven (that is, anything that requires measuring a sample on each request) the timing interface is preferable.
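The "last reading wins" behavior can be modeled with a small Python sketch. The function name flush_gauges is hypothetical, and the input is assumed to be the readings received during one flush interval, in arrival order.

```python
def flush_gauges(readings):
    """Keep only the most recent reading per gauge bucket."""
    latest = {}
    for bucket, value in readings:
        latest[bucket] = value  # a later reading overwrites the earlier one
    return latest
```

If a temperature gauge reports 20.5 and then 21.0 within the same interval, only 21.0 is published. The 20.5 reading is discarded, which is why timings are the better fit for per-request samples.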

The Librato backend will submit statsd gauge readings as single-value Librato Metrics gauges. At higher rollup levels those single-value readings will be combined into complex gauges that can be used to track averages across the longer rollup periods.

Sets

Sets are a relatively new concept in recent versions of StatsD. Sets track the number of unique elements belonging to a group. This could be used to track the number of unique visitors to your site at any point in time. At each flush interval, the Librato statsd backend will push the number of unique elements in the set as a single gauge value. Be aware that if you are tracking the same set across multiple statsd servers, there is no guarantee of uniqueness across the individual statsd servers.
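As an illustration of set semantics, this Python sketch counts unique elements per bucket for one flush interval on a single server. The function name flush_sets is hypothetical, and the input is assumed to be (bucket, element) pairs.

```python
def flush_sets(events):
    """Return the number of unique elements seen per set bucket."""
    members = {}
    for bucket, element in events:
        members.setdefault(bucket, set()).add(element)
    return {bucket: len(seen) for bucket, seen in members.items()}
```

The multi-server caveat follows directly from this model: if the same visitor is recorded on two statsd servers, each server reports a count of 1, and adding the two values gives 2 even though there was only one unique visitor.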

Installing Statsd + Librato

To install, follow the instructions located in the statsd-librato-backend README.

Global Tags

Our backend plugin offers basic tagging support for the metrics you submit to Librato. You can specify what tags you want to submit to Librato using the tags config in the librato configuration section of the StatsD config file:

{
  "librato" : {
    "tags": { "os" : "ubuntu", "host" : "production-web-server-1", ... }
  }
}

Stat Level Tags

We also support tags at the stat level should you need more detailed tagging. The naming syntax for submitting tags for each stat is:

my.metric#service=web,app_version=1.12:42|g

Starting with a #, you pass in a comma-separated list of tags and we parse out the tags and values. The above example would submit a gauge metric to Librato with a name of my.metric, a value of 42 and the tags service=web and app_version=1.12.
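The parsing described above can be approximated with the following Python sketch. It is not the backend's actual parser: the function name parse_stat is hypothetical, and the sketch assumes that tag values contain no colons or commas and that the value is numeric.

```python
def parse_stat(line):
    """Split 'name#k=v,k2=v2:value|type' into name, value, type, and tags."""
    key, rest = line.split(":", 1)
    parts = rest.split("|")
    value, kind = float(parts[0]), parts[1]
    name, _, tag_text = key.partition("#")
    tags = dict(pair.split("=", 1) for pair in tag_text.split(",") if "=" in pair)
    return name, value, kind, tags
```

Applied to the example line, parse_stat("my.metric#service=web,app_version=1.12:42|g") returns the name my.metric, the value 42.0, the type g, and the tags service=web and app_version=1.12. A line with no # section simply yields an empty tag dictionary.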

Please note that in order to use tags, the statsd config option keyNameSanitize must be set to false to properly parse tags out of your stat name.

Note

You can mix top-level tags and per-measurement tags, but be aware that per-measurement tags take precedence. In other words, if top-level tags are set and a measurement also has a tag, we will store only the measurement tag and not the top-level tags. The reason we don’t merge tags is to avoid an inadvertent increase in cardinality and thereby an increase in cost.
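The precedence rule can be expressed as a short Python sketch. The function name effective_tags is hypothetical; the logic mirrors the note above, where measurement tags replace the top-level tags rather than being merged with them.

```python
def effective_tags(global_tags, measurement_tags):
    """Return the tags that would be stored for one measurement."""
    if measurement_tags:
        # Per-measurement tags win outright; top-level tags are not merged in.
        return dict(measurement_tags)
    return dict(global_tags)
```

With top-level tags of os=ubuntu and host=production-web-server-1, a measurement tagged service=web is stored with only service=web, while an untagged measurement is stored with both top-level tags.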

Reducing zero-fill measurements

Some Statsd backends require that metrics are reported at fixed frequencies. This means that by default Statsd will push zeroes for metrics that haven’t had an update during a flush interval. Librato doesn’t require that, so to reduce the number of measurements that Statsd will push to your Librato account (and hey, that’s extra cost) you can tell Statsd to not push idle stats. See this section of the README for instructions.

Upgrading from the older Statsd backend

If you are upgrading from a version of statsd-librato-backend earlier than 0.1.0, note that the default representation for counter metrics has changed. Starting with 0.1.0, statsd counters are represented as Librato gauges by default. If you were using the default configuration prior to 0.1.0, you may run into conflicts when you try to push statsd counter metrics to Librato as gauges. To fix this, you have two options:

1. Keep the prior behavior of sending statsd counters as Librato counters. Just set the countersAsGauges configuration variable to false in your statsd config.
2. After upgrading to 0.1.0, remove all counter metrics that were published by statsd. You can use the API pattern DELETE route to mass delete metrics. To delete only counter metrics, add the parameter metric_type=counter.