A number of samplers from v2 and v3 need updating, or some other change of status, in LDMS v4.
Here we catalog what is happening to each sampler and any key rationale.
General Sampler Problems Suspected or Observed
- Multiples of the same plugin: is this supposed to work? If so, is it handled consistently?
- Multiples of the same plugin: is failure handled correctly where it is supposed to be?
- Multiple sets in the same plugin: is this supposed to work? The same questions as above apply.
Individual Sampler Status
- everything under cray_system_sampler
- everything under aries_mmr
- Job_info
- Job_info_slurm
- dvs_sampler
- Llnl/edac
- msr_interlagos
- meminfo
- array_example
- kgnilnd
- vmstat
- Procinterrupts – NOTE: this may need updating to use arrays
- procnetdev
- procnfs
- Procdiskstats – [runs, may need validation]
- Procstat
- Lnet_stats
- Lustre (all and within Cray)
- Sampler_atasmart – [runs, may need validation]
- All_example
- clock
- fptrans
- synthetic
- Sysclassib – [validated on shaun; seeing no warnings about use of a static var in an inline function with gcc 4.8.5; syntax fix in MR826]
- Timer_base [NT]
- Hfclock [NT]
- Cray_power_sampler [NT]
- procsensors – [Ben to investigate, longer term, configurability of data sources; existing code is specific to a dead machine]
BAA: I propose we update procsensors to take a configuration file defining what it should collect. The unattractive alternative is to put much of the lm_sensors logic into our code base. Each line of the configuration file will be a list of key/value pairs parsable with our existing libraries. Values will include items such as the input filename, the label filename, and a scaling factor if conversion is wanted. Whether metrics are handled as arrays will depend on how the key/value pairs are interpreted. Each metric will be read from a file and individually timestamped, because sensor read times vary by an unknown and potentially wide margin. We may need to multithread the collection, and may need to run it less frequently than other samplers.
[AG sez: getting some version of sensors working sounds good!]
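To make the procsensors proposal concrete, here is a minimal Python sketch of one possible configuration-line format and a per-metric timestamped read. Everything in it is an assumption for illustration: the key names (`name`, `input`, `label`, `scale`), the whitespace-separated `key=value` layout, and the helper names `SensorSpec`, `parse_line`, and `sample` are hypothetical, and it does not use the existing LDMS parsing libraries. A real sampler would be written in C against those libraries.

```python
import time
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SensorSpec:
    """One configuration line: where to read a metric and how to scale it."""
    name: str                          # metric name (hypothetical key "name")
    input_path: str                    # file holding the raw value (key "input")
    label_path: Optional[str] = None   # optional label file (key "label")
    scale: float = 1.0                 # multiplier applied to the raw value


def parse_line(line: str) -> SensorSpec:
    """Parse whitespace-separated key=value pairs into a SensorSpec."""
    pairs = dict(token.split("=", 1) for token in line.split())
    return SensorSpec(
        name=pairs["name"],
        input_path=pairs["input"],
        label_path=pairs.get("label"),
        scale=float(pairs.get("scale", "1.0")),
    )


def sample(spec: SensorSpec) -> Tuple[str, float, float]:
    """Read one sensor file and return (name, scaled value, timestamp)."""
    with open(spec.input_path) as f:
        raw = float(f.read().strip())
    # The timestamp is taken per metric, right after that metric's read,
    # since read times may differ widely between sensors.
    return spec.name, raw * spec.scale, time.time()
```

A configuration line in this assumed format might look like `name=cpu_temp input=/path/to/temp1_input scale=0.001`, where the path is a placeholder. Multithreading and array handling are left out of the sketch.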
- Rapl
- Perfevent – has warning about ioctl
- Generic_sampler [NT] — NT says perhaps we should retire this
- Ldms_jobid – [BA maintaining]
- Papi – because I think this is getting replaced
- Switchx [NT]
- Test_sampler [NN]
- Tsampler [NT]
- Variable [BA update base]
Retired
- Hadoop – [BA archived MR821]
- Knc_sampler – [BA archived MR821]
- Power_sampler – [BA archived MR821]