Overview

There are currently three stats system on "trunk" in Apache Traffic Server. I'd like to eliminate one of them, and deprecate another, leaving a single stats system used by all components. As it turns out, this "migration" had already started a long time ago, and I'd say 95% of the code is already on the standard stats and config system; librecords. I'm not arguing that librecords doesn't need some serious rework and improvements, but a step in the right direction is to convert all stats system into one, using librecords only. This document is a proposal for this cleanup.

The problem: plugin stats ("InkAPI")

The main problematic stats area is in the plugin APIs, which currently has one old (and rather poor) stats system, as well as a new attempt at a rewrite. The first should simply be deprecated for the v2.2 release, which is an easy task (see bug TS-266). The new rewrite of a stats system for InkAPI (lovingly referred to as "V2") is very scalable (10's of thousands of stats can be handled), but suffers from a few problems currently (these could obviously be fixed if time is spent on it):

  • It does not integrate with the existing librecords stats at all. This makes XML synthesized stats less than functional for example, i.e. you can't combine stats from the two systems using the already existing features of Apache TS.
  • It runs its own "control port" (TCP), and does not use the CLI as provided by librecords. This makes for a poor management experience (basically, two systems to manage the stats and configs).
  • It does not support persistent stats, so every restart will reset these V2 stats. This might not seem like a serious problem, but many counters really benefit from continuously increasing, and it can make stats collection much easier (i.e. you don't have to deal with the stats going from a large value to zero). It's also the way many of our current metrics work, where you can specify if a specific statistics should be persistent or not, preserving this functionality going forward is important IMO.
  • It does not support any "synchronization" handler other than the "SUM". librecords allows for specifying different synchronization handlers, where when the stats thread collects stats from the various threads, it can apply different logic on how the aggregation should be handled, e.g. SUM, AVG etc.' This is again a feature the ATS core uses heavily, and that we should continue to support.
  • V2 does not support the same "data types" for stats that librecord does. I'm a little torn on this issue, I think it might be acceptable to limit to only "integer" stats, if people have strong opinions on this, I'll certainly not oppose removing this feature from the new InkAPI proposal now, and from ATS core later.

I believe that creating new APIs for InkAPI that exposes the librecords APIs is the way to go, and it allows us to focus on one area further down the road to improve the stats and configuration system (which at least ought to be separated I think, if we're going to move along with the "virtual host" style configs). Making a new set of stable APIs is important (so we don't have to keep deprecating APIs constantly), if the underlying implementation changes the APIs should hopefully not.

My proposal will keep the "V2" code as is, but disabled by default. It can be enabled via a configure option for anyone already using it, and it also allows us to reuse the code and ideas that "V2" implements in a future rewrite of the librecords stats system.

Feature comparison

The following table summarizes the features (or lack thereof) for the various InkAPI "stats" features. Please see the section below on how librecords can be improved to implement some of the features the "V2" implementation adds. I firmly believe that adding these features to librecords will improve the ATS stats system in general, and also make the InkAPI stats APIs based on librecords very flexible. 

Proposal 1: Deprecate old InkAPI stats

This would simply mark all the old stats APIs as deprecated for the v2.2 release. These APIs include:

  inkapi INKStat INKStatCreate(const char *the_name, INKStatTypes the_type);
  inkapi INKReturnCode INKStatIntAddTo(INKStat the_stat, INK64 amount);
  inkapi INKReturnCode INKStatFloatAddTo(INKStat the_stat, float amount);
  inkapi INKReturnCode INKStatDecrement(INKStat the_stat);
  inkapi INKReturnCode INKStatIncrement(INKStat the_stat);
  inkapi INKReturnCode INKStatIntGet(INKStat the_stat, INK64 * value);
  inkapi INKReturnCode INKStatFloatGet(INKStat the_stat, float *value);
  inkapi INKReturnCode INKStatIntSet(INKStat the_stat, INK64 value);
  inkapi INKReturnCode INKStatFloatSet(INKStat the_stat, float value);
  inkapi INKCoupledStat INKStatCoupledGlobalCategoryCreate(const char *the_name);
  inkapi INKCoupledStat INKStatCoupledLocalCopyCreate(const char *the_name, INKCoupledStat global_copy);
  inkapi INKReturnCode INKStatCoupledLocalCopyDestroy(INKCoupledStat local_copy);
  inkapi INKStat INKStatCoupledGlobalAdd(INKCoupledStat global_copy, const char *the_name, INKStatTypes the_type);
  inkapi INKStat INKStatCoupledLocalAdd(INKCoupledStat local_copy, const char *the_name, INKStatTypes the_type);
  inkapi INKReturnCode INKStatsCoupledUpdate(INKCoupledStat local_copy);

Proposal 2: Make "V2" stats API experimental

Instead of just removing the V2 stats, I'd like to propose we leave the code as is, but make it compile time enabled (disabled by default). We'll also move the V2 APIs over to an "experimental" include file, to not expose them in ts/ts.h. These APIs include:

  inkapi INKReturnCode     INKStatCreateV2(const char *name, uint32_t *stat_num);
  inkapi INKReturnCode     INKStatIncrementV2(uint32_t stat_num, INK64 inc_by);
  inkapi INKReturnCode     INKStatIncrementByNameV2(const char *stat_name, INK64 inc_by);
  inkapi INKReturnCode     INKStatDecrementV2(uint32_t stat_num, INK64 dec_by);
  inkapi INKReturnCode     INKStatDecrementByNameV2(const char *stat_name, INK64 dec_by);
  inkapi INKReturnCode     INKStatGetCurrentV2(uint32_t stat_num, INK64 *stat_val);
  inkapi INKReturnCode     INKStatGetCurrentByNameV2(const char *stat_name, INK64 *stat_val);
  inkapi INKReturnCode     INKStatGetV2(uint32_t stat_num, INK64 *stat_val);
  inkapi INKReturnCode     INKStatGetByNameV2(const char *stat_name, INK64 *stat_val);

Going forward, if the "V2" implementation matures, and can replace librecords (both internal and for APIs), it would be good to rename / replace the APIs above with the API names from Proposal #3 (below). This avoids having to migrate to yet another stats API, so while reviewing this, make sure the proposed APIs below will suffice.

Proposal 3: Make new librecords based stats APIs

Finally, I'd like to propose adding a new set of APIs to ts/ts.h, exposing the librecords APIs to plugin writers. My proposed APIs are as follows:

  typedef enum
    {
      TS_ERROR = -1,
      TS_SUCCESS = 0
    } TSReturnCode;

  typedef enum
    {
      TS_STAT_TYPE_INT = 1,
      // The following are currently not supported.
      TS_STAT_TYPE_FLOAT,
      TS_STAT_TYPE_STRING,
      TS_STAT_TYPE_COUNTER,
    } TSStatDataType;

  typedef enum
    {
      TS_STAT_PERSISTENT = 1,
      TS_STAT_NON_PERSISTENT
    } TSStatPersistence;

  typedef enum
    {
      TS_STAT_SYNC_SUM = 0,
      TS_STAT_SYNC_COUNT,
      TS_STAT_SYNC_AVG,
      TS_STAT_SYNC_TIMEAVG,
    } TSStatSync;

  inkapi int TSStatCreate(const char *the_name, TSStatDataType the_type, TSStatPersistence persist, TSStatSync sync);

  inkapi TSReturnCode TSStatIntIncrement(int the_stat, INK64 amount);
  inkapi TSReturnCode TSStatIntDecrement(int the_stat, INK64 amount);
  // Currently not supported.
  // inkapi TSReturnCode TSStatFloatIncrement(int the_stat, float amount);
  // inkapi TSReturnCode TSStatFloatDecrement(int the_stat, float amount);

  inkapi TSReturnCode TSStatIntGet(int the_stat, INK64* value);
  inkapi TSReturnCode TSStatIntSet(int the_stat, INK64 value);
  // Currently not supported
  // inkapi TSReturnCode TSStatFloatGet(int the_stat, float* value);
  // inkapi TSReturnCode TSStatFloatSet(int the_stat, float value);

  inkapi int TSStatFindName(const char* name);

I'm using the "TS" prefix here for the APIs, to distinguish them from the old APIs. This also avoids a "rename" later on, but I'd be open to changing this to INK* if that is preferable. In addition, my proposal adds a new records.config setting:

 CONFIG proxy.config.stat_api.max_stats_allowed INT 512

This proposal exposes a few features of the internal librecords, but not everything. I believe the above would allow easy integration with existing stats, and also allow for plugin developers to use most features that the internal ATS core can use. This has the advantage of making it possible to move code out from the core and into a plugin, without losing  functionality or changing current behaviour. Implementing these in a future rewrite of the stats system would not be difficult, and if we decide to totally ditch these stats features in v3.0 or v4.0, we'd want new APIs anyways I think.

Performance with librecords

I've benchmarked the above APIs, incrementing 2,000 stats out of a pool of either 10,000 or 50,000 total stats, for every request (in a plugin). For the first test (10k total stats), the librecords based APIs are slightly faster than the V2 stats. With 50,000 stats, the V2 stats are about 10% faster. I attribute this difference to the more advanced synchronization handlers that librecords provides, but I haven't honestly spent a lot of time trying to optimize anything. 10% is not a huge difference, and 50,000 stats is an incredible amount of stats to carry. For a slightly more reasonable setup (10,000 stats, which is still huge), the performance is very compteitive.

Both the API proposed above and the V2 APIs beat the old (hopefully soon deprecated) APIs in both scalability and performance. The old APIs are limited to 253 stats as well.

Problems with librecords

I have an implementation for the APIs above that does work. There are four issues with my patch, that would need to be addressed long term (but I think it's OK for a v2.2 release). Most of these are very easy (a few lines of code).

  1. The librecords stats system has an internal (static) limitation of 3000 stats. This is easy to bump up to say 8,000 (which my patch does), but it does waste a little memory. There are some multi-dimensional arrays that allocates unnecessary "slots" I think, which we could optimize. These static arrays needs to be modified to be dynamically allocated (and preferably "optimized" to waste less memory), based on the proxy.config.stat_api.max_stats_allowed setting above. This is merely an issue of changing a static allocation (stats[8000]) to dynamic allocation at startup (malloc(8000 * sizeof(stats-bucket)).
  2. My patch disables the new stats for per hook / plugin counters (they are tied to enabling the V2 stats system). My suggestion would be to rewrite these areas to use the new stats APIs above, once it has been landed. Or even better, use the internal librecords APIs instead, at least where possible (we generally shouldn't use the public InkAPI interfaces in "core" code).
  3. I've noticed a noticeable (~30s) time to create 50,000 stats using the new APIs. it's fine for 10,000 stats (which takes less than 0.5 seconds), so it gets progressively worse as more stats are added. This should be examined for sure.
  4. The CLI needs to support listing all stats, preferably with substrings and regexes filtering.
  • No labels