The primary LDMS store plugins support:
Librabbitmq 0.8 based, no changes planned. Tracking of librabbitmq updates expected.
- Remove hard-coded limit on number of instances.
- Extended flush options to manage latency and debugging issues.
Ideas under discussion:
- Timestamping of set arrival at the store.
- Duplicate set instance detection (the same set instance arriving by distinct aggregation paths is silently stored twice).
- Conflicting schema detection (set instances with a schema of the same name but different content, resulting in storage loss or silent error).
The schema conflict detection can be reduced to making a metadata checksum at set creation and performing consistency checks in any storage plugin (such as csv) where a full set consistency constraint exists. Storage policies which cherry-pick named metrics must either search instances by metric name every time or also enforce full set consistency.
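The checksum idea above can be sketched as follows. This is only an illustration in Python, not LDMS code: the function `schema_checksum`, the class `SchemaRegistry`, and the representation of a schema as an ordered list of (metric name, type name) pairs are all assumptions made for the example.

```python
import hashlib


def schema_checksum(metrics):
    """Digest over the ordered (name, type) pairs that define a schema's content.

    `metrics` is a hypothetical stand-in for set metadata: a list of
    (metric_name, type_name) string pairs in schema order.
    """
    h = hashlib.sha256()
    for name, mtype in metrics:
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        for field in (name, mtype):
            data = field.encode("utf-8")
            h.update(len(data).to_bytes(4, "big"))
            h.update(data)
    return h.hexdigest()


class SchemaRegistry:
    """Remembers the first checksum seen for each schema name (hypothetical helper)."""

    def __init__(self):
        self._seen = {}

    def check(self, schema_name, checksum):
        """Return True if consistent with earlier instances, False on conflict."""
        first = self._seen.setdefault(schema_name, checksum)
        return first == checksum
```

A store with a full set consistency constraint would call `check` once per arriving set instance and log or reject the instance when it returns False, rather than comparing metric lists on every update.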
The duplicate detection can be handled by individual stores keeping a hash of the set instance names and the last timestamp (or N timestamps in a network of aggregators with potentially out-of-order delivery) of data stored for that set instance. Any set with a timestamp detected as already stored is a duplicate. As LDMS is also in the process of adding multiple set instance transport and collection, putting this logic in individual stores is redundant and error-prone.
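A minimal sketch of the per-store approach described above, in Python for illustration only. The class name `DuplicateFilter`, the `history` parameter (the N of the text), and the use of (seconds, microseconds) tuples as timestamps are assumptions, not part of any LDMS API.

```python
from collections import deque


class DuplicateFilter:
    """Tracks the last N stored timestamps per set instance name (hypothetical helper)."""

    def __init__(self, history=4):
        self._history = history
        self._recent = {}  # instance name -> deque of recently stored timestamps

    def is_duplicate(self, instance_name, timestamp):
        """Return True if this timestamp was already stored for this instance.

        New timestamps are remembered; the oldest one is dropped once more
        than `history` distinct timestamps have been seen for the instance.
        """
        recent = self._recent.setdefault(instance_name, deque(maxlen=self._history))
        if timestamp in recent:
            return True
        recent.append(timestamp)
        return False
```

The window size bounds memory per instance, at the cost of missing a duplicate that arrives more than N samples late. Every store would need its own copy of this state, which is the redundancy the paragraph above warns about.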
The store arrival timestamping has been requested by users concerned with data validation for collective (multinode) statistics. This may be better addressed by changes elsewhere in LDMS. For example, the LDMS core might provide a service that collects the current RTC from all nodes in the aggregation over as short a time as possible and publishes it as a data set (the sampler API does not support this collection activity). A custom store could generate a warning if any host clock is off by more than the length of time the scan took (or a small multiple thereof).
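The warning rule in the last sentence could look roughly like the following Python sketch. It assumes the published data set can be read as a mapping from host name to RTC reading in seconds, and it assumes the median reading is used as the reference clock; the text above does not specify a reference, and the function name `skewed_hosts` is hypothetical.

```python
import statistics


def skewed_hosts(readings, scan_duration, multiple=1.0):
    """Return hosts whose clock differs from the reference by more than the limit.

    readings      -- dict of host name -> RTC reading in seconds (assumed layout)
    scan_duration -- how long collecting all readings took, in seconds
    multiple      -- tolerance as a multiple of the scan duration
    """
    if not readings:
        return []
    # Assumption: the median reading stands in for "correct" time.
    reference = statistics.median(readings.values())
    limit = multiple * scan_duration
    return sorted(h for h, t in readings.items() if abs(t - reference) > limit)
```

For example, with readings of 1000.00, 1000.02, 1000.01 and 1003.50 seconds collected in a 0.05 second scan, only the last host exceeds the limit and would be reported.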
SOS is in rapid development, and the corresponding store is tracking it.
Production use of the flatfile store has led to a number of requested changes (below). These changes are sufficiently complicated that an alternately named store (store_var) is in development. The flatfile store will remain unchanged, so that existing production script use can continue per site until admins have time to switch.
- Flush controls to manage latency.
- Output only on change of metric.
  - Optionally with heartbeat metric output on specified long interval.
- Output only of specific metrics.
- Excluding output of specific metrics.
  - Including producername, job id, and component id, for single-job and single-node use cases.
- Output of rate, delta, or integral delta values.
- Periodic output at frequency lower than arrival, optionally with selectable statistics on suppressed data.
  - Statistics: min, max, avg, miss-count, nonzero-count, min-nonzero, sum, time-weighted sum, dt.
- Metric name aliasing.
- Timestamps rounded to the nearest second (when requested by the user, who is also using long intervals).
- Check and log message (once) if a rail limit is observed.
- Rename file after close/rollover following a template string.
- Generation of splunk input schema.
- Handling of array metrics.
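To make the requested statistics on suppressed data more concrete, here is a rough Python sketch of reducing one output window of samples for a single metric. The list above only names the statistics; the definitions used here (for instance, time-weighted sum as each value multiplied by the time since the previous sample, and miss-count as expected minus received samples) are assumptions for illustration, as are the function name `summarize` and its parameters.

```python
def summarize(samples, expected_interval=None):
    """Reduce one output window of (timestamp, value) samples to summary statistics.

    samples           -- list of (timestamp, value) pairs in time order
    expected_interval -- nominal sampling interval; enables miss_count if given
    """
    if not samples:
        raise ValueError("window contains no samples")
    values = [v for _, v in samples]
    nonzero = [v for v in values if v != 0]
    dt = samples[-1][0] - samples[0][0]  # time spanned by the window
    # Assumed definition: weight each value by the time since the previous sample.
    tw_sum = sum(v * (t - t_prev)
                 for (t_prev, _), (t, v) in zip(samples, samples[1:]))
    stats = {
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
        "nonzero_count": len(nonzero),
        "min_nonzero": min(nonzero) if nonzero else None,
        "time_weighted_sum": tw_sum,
        "dt": dt,
    }
    if expected_interval:
        # Assumed definition: samples expected over dt minus samples received.
        expected = round(dt / expected_interval) + 1
        stats["miss_count"] = max(0, expected - len(samples))
    return stats
```

With samples `[(0, 2), (10, 0), (20, 4), (40, 6)]` and an expected interval of 10, the window spans 40 time units, one sample (at time 30) is counted as missed, and the smallest nonzero value is 2.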