LDMS Metric Definitions


Cray LDMS metrics

Cray XC-specific samplers are used on large machines such as Trinity and Blue Waters. We hope eventually to provide a tabulation of the Cray metrics here, with references to the relevant Cray documentation. In the meantime, Cray LDMS users will find that many metrics have the same names and definitions as the Linux LDMS metrics.

Linux LDMS metrics

The production commodity clusters running LDMS generally collect generic Linux metric sets and file-system-specific metric sets. Most of the metrics tabulated here are documented in the Linux man pages, the Linux kernel source, or the Lustre source and documentation.

Corrections and additions: If you find something that needs adding or revising in the following table, please mail the suggested text to ovis-help @ sandia.gov, Subject: ldms metrics catalog. This table is machine generated.


The main definitions table uses wildcards and constant macros so that a single entry can describe a metric across multiple devices. The values used in these expansions are defined in other tables that follow and are generally cluster-specific. An explanation of the main table headings follows the tables.
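As a rough illustration of how wildcard and constant-macro entries might be expanded, the following Python sketch turns one wildcard row into per-device rows and evaluates a macro expression such as Cnode*IB40G. It is not the generator used for this table. The device lists, the constant values, the inclusive reading of RANGE;low;high, and the rule of appending the device name after the # are all assumptions made for the example; the real values are cluster-specific.

```python
import re

# Hypothetical expansion tables; real values are cluster-specific
# and are not taken from this page.
DEVICE_LISTS = {
    "IpoibLIST": ["ib0", "ib1"],
    "ETHLIST": ["eth0"],
}
CONSTANTS = {"IB40G": 5_000_000_000, "Cnode": 4, "Maxnhyper": 3}


def wildcard_devices(wildrange, constants=CONSTANTS, device_lists=DEVICE_LISTS):
    """Return the device names that a wildrange entry stands for."""
    if wildrange.startswith("RANGE;"):
        # Assumed form: RANGE;low;high, with both bounds inclusive.
        _, low, high = wildrange.split(";")
        first = int(constants.get(low, low))
        last = int(constants.get(high, high))
        return [str(i) for i in range(first, last + 1)]
    return list(device_lists[wildrange])


def expand_metric(name, description, wildrange):
    """Expand one wildcard row into (metric name, description) pairs."""
    base = name.split("#", 1)[0]
    rows = []
    for dev in wildcard_devices(wildrange):
        text = description.replace("$device", dev).replace("$metric", base)
        # Assumption: the expanded name is the base, '#', then the device.
        rows.append((f"{base}#{dev}", text))
    return rows


def evaluate_macro(expr, constants=CONSTANTS):
    """Evaluate a product/quotient of constants and numbers, left to right."""
    tokens = re.split(r"([*/])", expr.replace(" ", ""))

    def value(token):
        return constants[token] if token in constants else float(token)

    result = value(tokens[0])
    for op, token in zip(tokens[1::2], tokens[2::2]):
        result = result * value(token) if op == "*" else result / value(token)
    return result


if __name__ == "__main__":
    for row in expand_metric("rx_bytes#ib", "Bytes received on $device", "IpoibLIST"):
        print(row)
    print(evaluate_macro("Cnode*IB40G"))
```

The macro evaluator handles only the multiplication and division forms that appear in the table (for example HWMEM/HPSZ or JIFSEC*Cnode*Ncore); entries such as none would need separate handling.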

Metrics table
Sampler | Name | description | wildcard | Mode | Units | Dimension | Use Rate | Only Delta/Job Useful | Local rate UB | UB Local | max by dt | itype | alarm | ratealarm | collmax | plotmax | typical | notes | kind | wildrange
sampler column_html_name description wildcard mode units dimension userate usedelta l_rate_ub ublocal maxbydt itype alarm ratealarm collmax plotmax typical notes kind wildrange
none column_default (none yet) FALSE integral none none TRUE TRUE none none 0 rolls -1 -1 0 0 0 uint64
General #Time wall clock time FALSE integral second time TRUE TRUE 1 none 0 infinite -1 -1 0 1 0 double
General Time wall clock time FALSE integral second time TRUE TRUE 1 none 0 infinite -1 -1 0 1 0 double
General component_id data source numeric id FALSE label none none FALSE FALSE none none 0 infinite -1 -1 0 0 0 uint64
General jobid resource manager job id FALSE label none none FALSE FALSE none none 0 infinite -1 -1 0 0 0 uint64
General uid UNIX numeric user id FALSE label none none FALSE FALSE none none 0 infinite -1 -1 0 0 0 uint64
General username user login name FALSE label none none FALSE FALSE none none 0 infinite -1 -1 0 0 0 fixedstring
General job_id resource manager job id FALSE label none none FALSE FALSE none none 0 infinite -1 -1 0 0 0 uint64
General app_id application id (cray) FALSE label none none FALSE FALSE none none 0 infinite -1 -1 0 0 0 uint64
Lustre2_client lstats.inode_permission# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 60000 200 uint64 LUSTRELIST
Lustre2_client lstats.removexattr# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 10 1 uint64 LUSTRELIST
Lustre2_client lstats.listxattr# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 100 1 uint64 LUSTRELIST
Lustre2_client lstats.getxattr# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 25000 11000 uint64 LUSTRELIST
Lustre2_client lstats.setxattr# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 10 1 uint64 LUSTRELIST
Lustre2_client lstats.alloc_inode# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 200 1 uint64 LUSTRELIST
Lustre2_client lstats.statfs# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 100 1 uint64 LUSTRELIST
Lustre2_client lstats.rename# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 100 1 uint64 LUSTRELIST
Lustre2_client lstats.mknod# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 100 1 uint64 LUSTRELIST
Lustre2_client lstats.rmdir# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 100 1 uint64 LUSTRELIST
Lustre2_client lstats.mkdir# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 100 1 uint64 LUSTRELIST
Lustre2_client lstats.symlink# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 100 1 uint64 LUSTRELIST
Lustre2_client lstats.unlink# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 100 1 uint64 LUSTRELIST
Lustre2_client lstats.link# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 100 1 uint64 LUSTRELIST
Lustre2_client lstats.create# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 10 1 uint64 LUSTRELIST
Lustre2_client lstats.getattr# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 500 20 uint64 LUSTRELIST
Lustre2_client lstats.flock# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 100 1 uint64 LUSTRELIST
Lustre2_client lstats.truncate# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 10 1 uint64 LUSTRELIST
Lustre2_client lstats.setattr# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 100 1 uint64 LUSTRELIST
Lustre2_client lstats.readdir# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 100 1 uint64 LUSTRELIST
Lustre2_client lstats.fsync# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 10 1 uint64 LUSTRELIST
Lustre2_client lstats.seek# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 1000 20 uint64 LUSTRELIST
Lustre2_client lstats.mmap# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 1000 1 uint64 LUSTRELIST
Lustre2_client lstats.close# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 500 10 uint64 LUSTRELIST
Lustre2_client lstats.open# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 500 10 uint64 LUSTRELIST
Lustre2_client lstats.ioctl# calls to function for $device TRUE integral calls call TRUE TRUE none none 0 infinite -1 -1 0 500 10 uint64 LUSTRELIST
Lustre2_client lstats.dirty_pages_hits# dirty_pages_hits for $device TRUE integral events event TRUE TRUE none 0 rolls -1 -1 0 0 0 uint64 LUSTRELIST
Lustre2_client lstats.dirty_pages_misses# dirty_pages_misses for $device TRUE integral events event TRUE TRUE none 0 rolls -1 -1 0 0 0 uint64 LUSTRELIST
Lustre2_client lstats.brw_write# bytes written in bandwidth test mode on $device TRUE integral pages data TRUE TRUE IB40G IB40G 0 rolls -1 -1 LWRITEBW 4000000000 5000000 units may be bytes; need validation uint64 LUSTRELIST
Lustre2_client lstats.brw_read# bytes read in bandwidth test mode on $device TRUE integral pages data TRUE TRUE IB40G IB40G 0 rolls -1 -1 LWRITEBW 4000000000 5000000 units may be bytes; need validation uint64 LUSTRELIST
Lustre2_client lstats.osc_write# TRUE integral bytes data TRUE TRUE IB40G IB40G 0 rolls -1 -1 0 4000000000 5000000 uint64 LUSTRELIST
Lustre2_client lstats.osc_read# TRUE integral bytes data TRUE TRUE IB40G IB40G 0 rolls -1 -1 0 4000000000 5000000 uint64 LUSTRELIST
Lustre2_client lstats.write_bytes# bytes written (which may be to network or local cache) on $device TRUE integral bytes data TRUE TRUE IB40G IB40G 0 rolls -1 -1 LWRITEBW 20000000 5000000 uint64 LUSTRELIST
Lustre2_client lstats.read_bytes# bytes read (which may be from cache or network) on $device TRUE integral bytes data TRUE TRUE IB40G IB40G 0 rolls -1 -1 LWRITEBW 20000000 5000000 uint64 LUSTRELIST
Meminfo DirectMap1G kB mapped in 1G chunks FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM HWMEM uint64
Meminfo DirectMap2M kB mapped in 2M chunks FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM HWMEM uint64
Meminfo DirectMap4k kB mapped in 4k chunks FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM HWMEM uint64
Meminfo Hugepagesize The size for each hugepages unit in kilobytes FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM 2048 2048 uint64
Meminfo HugePages_Surp FALSE capacity pages data FALSE FALSE 0 HWMEM/HPSZ 0 infinite -1 -1 Cnode*HWMEM/HPSZ HWMEM/4 HWMEM/4 uint64
Meminfo HugePages_Rsvd FALSE capacity pages data FALSE FALSE 0 HWMEM/HPSZ 0 infinite -1 -1 Cnode*HWMEM/HPSZ HWMEM/4 HWMEM/4 uint64
Meminfo HugePages_Free The total number of hugepages available for the system. FALSE capacity pages data FALSE FALSE 0 HWMEM/HPSZ 0 infinite -1 -1 Cnode*HWMEM/HPSZ HWMEM/4 HWMEM/4 uint64
Meminfo HugePages_Total The total number of hugepages for the system. The number is derived by dividing the megabytes set aside for hugepages, as specified in /proc/sys/vm/hugetlb_pool, by Hugepagesize. FALSE capacity pages data FALSE FALSE 0 HWMEM/HPSZ 0 infinite -1 -1 Cnode*HWMEM/HPSZ HWMEM/4 HWMEM/4 uint64
Meminfo AnonHugePages FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM HWMEM uint64
Meminfo HardwareCorrupted FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM HWMEM uint64
Meminfo VmallocChunk The largest contiguous block of memory, in kilobytes, of available virtual address space. FALSE capacity kB data FALSE FALSE 0 none 0 infinite -1 -1 0 33285996544 uint64
Meminfo VmallocUsed The total amount of memory, in kilobytes, of used virtual address space. FALSE capacity kB data FALSE FALSE 0 none 0 infinite -1 -1 0 33285996544 uint64
Meminfo VmallocTotal The total amount of memory, in kilobytes, of total allocated virtual address space. FALSE capacity kB data FALSE FALSE 0 none 0 infinite -1 -1 0 33285996544 uint64
Meminfo Committed_AS The total amount of memory, in kilobytes, estimated to complete the workload. This value represents the worst case scenario value, and also includes swap memory. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo CommitLimit FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo WritebackTmp FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo Bounce FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo NFS_Unstable FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo PageTables FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo KernelStack FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo SUnreclaim FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo SReclaimable FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo Slab The total amount of memory, in kilobytes, used by the kernel to cache data structures for its own use. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo Shmem FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo Mapped The total amount of memory, in kilobytes, which have been used to map devices, files, or libraries using the mmap command. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo AnonPages Non-file-backed pages mapped into user-space page tables. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo Writeback The total amount of memory actively being written back to the disk. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo Dirty The total amount of memory waiting to be written back to the disk. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo SwapFree The total amount of free swap, in kilobytes. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo SwapTotal The total amount of swap available, in kilobytes. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo Mlocked Locked memory FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo Unevictable Pages from shm_lock, vm lock, RAMFS, etc that cannot be evicted (reclaimed). FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo Inactive(file) File buffer/page cache memory available for reclaim. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo Active(file) Active memory mapped to file or device. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo Inactive(anon) Inactive memory not mapped to file or device and available for reclaim. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo Active(anon) Active memory not mapped to file or device. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo Inactive The total amount of buffer or page cache memory, in kilobytes, that are free and available. This is memory that has not been recently used and can be reclaimed for other purposes. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo Active The total amount of buffer or page cache memory, in kilobytes, that is in active use. This is memory that has been recently used and is usually not reclaimed for other purposes. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo SwapCached The amount of swap, in kilobytes, used as cache memory. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo Cached The amount of physical RAM, in kilobytes, used as cache memory. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo Buffers The amount of physical RAM, in kilobytes, used for file buffers. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo MemFree The amount of physical RAM, in kilobytes, left unused by the system. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
Meminfo MemTotal Total amount of physical RAM. FALSE capacity kB data FALSE FALSE 0 HWMEM 0 infinite -1 -1 Cnode*HWMEM HWMEM uint64
procnetdev tx_compressed#ib Compressed packets transmitted on $device TRUE integral compressed_packets data TRUE TRUE none 0 rolls -1 -1 0 1000 uint64 IpoibLIST
procnetdev tx_carrier#ib Carrier losses detected in transmit on $device TRUE integral errors event TRUE TRUE none 0 rolls -1 3 0 1000 uint64 IpoibLIST
procnetdev tx_colls#ib Transmit collisions detected on $device TRUE integral errors event TRUE TRUE none 0 rolls -1 3 0 1000 uint64 IpoibLIST
procnetdev tx_fifo#ib Transmit FIFO buffer errors on $device TRUE integral errors event TRUE TRUE none 0 rolls -1 3 0 1000 uint64 IpoibLIST
procnetdev tx_drop#ib Transmit packets dropped by $device TRUE integral packets data TRUE TRUE none 0 rolls -1 10 0 1000 uint64 IpoibLIST
procnetdev tx_errs#ib Total transmit errors detected on $device TRUE integral errors event TRUE TRUE none 0 rolls -1 3 0 1000 uint64 IpoibLIST
procnetdev tx_packets#ib Packets transmitted on $device TRUE integral packets data TRUE TRUE none 0 rolls -1 -1 0 IB40G/64 uint64 IpoibLIST
procnetdev tx_bytes#ib bytes transmitted on $device TRUE integral byte data TRUE TRUE IB40G none 0 rolls -1 -1 Cnode*IB40G IB40G uint64 IpoibLIST
procnetdev rx_multicast#ib multicast frames received by $device TRUE integral multicast_packets data TRUE TRUE none 0 rolls -1 -1 0 IB40G uint64 IpoibLIST
procnetdev rx_compressed#ib Compressed packets received by $device TRUE integral compressed_packets data TRUE TRUE none 0 rolls -1 -1 0 IB40G/64 uint64 IpoibLIST
procnetdev rx_frame#ib The frame error count on $device TRUE integral errors event TRUE TRUE none 0 rolls -1 3 0 1000 uint64 IpoibLIST
procnetdev rx_fifo#ib Receive FIFO buffer errors on $device TRUE integral errors event TRUE TRUE none 0 rolls -1 3 0 1000 uint64 IpoibLIST
procnetdev rx_drop#ib Packets dropped on $device TRUE integral packets data TRUE TRUE none 0 rolls -1 3 0 1000 uint64 IpoibLIST
procnetdev rx_errs#ib Receive errors on $device TRUE integral errors event TRUE TRUE none 0 rolls -1 3 0 1000 uint64 IpoibLIST
procnetdev rx_packets#ib Packets received on $device TRUE integral packets data TRUE TRUE none 0 rolls -1 -1 0 IB40G/64 uint64 IpoibLIST
procnetdev rx_bytes#ib Bytes received on $device TRUE integral bytes data TRUE TRUE IB40G none 0 rolls -1 -1 Cnode*IB40G IB40G uint64 IpoibLIST
procnetdev tx_compressed#eth Compressed packets transmitted on $device TRUE integral compressed_packets data TRUE TRUE none 0 rolls -1 -1 0 1000 uint64 ETHLIST
procnetdev tx_carrier#eth Carrier losses detected in transmit on $device TRUE integral errors event TRUE TRUE none 0 rolls -1 3 0 1000 uint64 ETHLIST
procnetdev tx_colls#eth Transmit collisions detected on $device TRUE integral errors event TRUE TRUE none 0 rolls -1 3 0 1000 uint64 ETHLIST
procnetdev tx_fifo#eth Transmit FIFO buffer errors on $device TRUE integral errors event TRUE TRUE none 0 rolls -1 3 0 1000 uint64 ETHLIST
procnetdev tx_drop#eth Transmit packets dropped by $device TRUE integral packets data TRUE TRUE none 0 rolls -1 10 0 1000 uint64 ETHLIST
procnetdev tx_errs#eth Total transmit errors detected on $device TRUE integral errors event TRUE TRUE none 0 rolls -1 3 0 1000 uint64 ETHLIST
procnetdev tx_packets#eth Packets transmitted on $device TRUE integral packets data TRUE TRUE none 0 rolls -1 -1 0 ETH1G/64 uint64 ETHLIST
procnetdev tx_bytes#eth bytes transmitted on $device TRUE integral byte data TRUE TRUE ETH1G none 0 rolls -1 -1 Cnode*ETH1G ETH1G uint64 ETHLIST
procnetdev rx_multicast#eth multicast frames received by $device TRUE integral multicast_packets data TRUE TRUE none 0 rolls -1 -1 0 ETH1G uint64 ETHLIST
procnetdev rx_compressed#eth Compressed packets received by $device TRUE integral compressed_packets data TRUE TRUE none 0 rolls -1 -1 0 ETH1G/64 uint64 ETHLIST
procnetdev rx_frame#eth The frame error count on $device TRUE integral errors event TRUE TRUE none 0 rolls -1 3 0 1000 uint64 ETHLIST
procnetdev rx_fifo#eth Receive FIFO buffer errors on $device TRUE integral errors event TRUE TRUE none 0 rolls -1 3 0 1000 uint64 ETHLIST
procnetdev rx_drop#eth Packets dropped on $device TRUE integral packets data TRUE TRUE none 0 rolls -1 3 0 1000 uint64 ETHLIST
procnetdev rx_errs#eth Receive errors on $device TRUE integral errors event TRUE TRUE none 0 rolls -1 3 0 1000 uint64 ETHLIST
procnetdev rx_packets#eth Packets received on $device TRUE integral packets data TRUE TRUE none 0 rolls -1 -1 0 ETH1G/64 uint64 ETHLIST
procnetdev rx_bytes#eth Bytes received on $device TRUE integral bytes data TRUE TRUE ETH1G none 0 rolls -1 -1 Cnode*ETH1G ETH1G uint64 ETHLIST
procnfs commit nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 200 uint64
procnfs pathconf nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 200 uint64
procnfs fsinfo nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 200 uint64
procnfs fsstat nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 200 uint64
procnfs readdirplus nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 200 uint64
procnfs readdir nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 200 uint64
procnfs link nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 200 uint64
procnfs rename nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 200 uint64
procnfs rmdir nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 200 uint64
procnfs remove nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 200 uint64
procnfs mknod nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 200 uint64
procnfs symlink nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 200 uint64
procnfs mkdir nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 200 uint64
procnfs create nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 200 uint64
procnfs write nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 1000 uint64
procnfs read nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 1000 uint64
procnfs readlink nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 100 uint64
procnfs access nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 100 uint64
procnfs lookup nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 1000 uint64
procnfs setattr nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 1000 uint64
procnfs getattr nfs client-side calls to $metric FALSE integral calls call TRUE TRUE none 0 infinite -1 -1 0 1000 uint64
procnfs retransmitts nfs client-side calls to $metric FALSE integral event event TRUE TRUE none 0 infinite -1 5 0 1000 misspelled name retained because of a naming bug in the sampler code uint64
procnfs retransmits nfs client-side calls to $metric FALSE integral event event TRUE TRUE none 0 infinite -1 5 0 1000 uint64
procnfs numcalls total calls to all client functions FALSE integral calls call TRUE TRUE none 0 rolls -1 -1 0 1000 uint64
procstat guest_nice# time, in jiffies since boot, that core $device spent in $metric TRUE integral jiffies time TRUE TRUE JIFSEC 1 infinite -1 -1 JIFSEC*Cnode 100 uint64 RANGE;0;Maxnhyper
procstat guest# time, in jiffies since boot, that core $device spent in $metric TRUE integral jiffies time TRUE TRUE JIFSEC 1 infinite -1 -1 JIFSEC*Cnode 100 uint64 RANGE;0;Maxnhyper
procstat steal# time, in jiffies since boot, that core $device spent in $metric TRUE integral jiffies time TRUE TRUE JIFSEC 1 infinite -1 -1 JIFSEC*Cnode 100 uint64 RANGE;0;Maxnhyper
procstat softirq# time, in jiffies since boot, that core $device spent in $metric TRUE integral jiffies time TRUE TRUE JIFSEC 1 infinite -1 -1 JIFSEC*Cnode 100 uint64 RANGE;0;Maxnhyper
procstat irq# time, in jiffies since boot, that core $device spent in $metric TRUE integral jiffies time TRUE TRUE JIFSEC 1 infinite -1 -1 JIFSEC*Cnode 100 uint64 RANGE;0;Maxnhyper
procstat iowait# time, in jiffies since boot, that core $device spent in $metric TRUE integral jiffies time TRUE TRUE JIFSEC 1 infinite -1 -1 JIFSEC*Cnode 100 uint64 RANGE;0;Maxnhyper
procstat idle# time, in jiffies since boot, that core $device spent in $metric TRUE integral jiffies time TRUE TRUE JIFSEC 1 infinite -1 -1 JIFSEC*Cnode 100 uint64 RANGE;0;Maxnhyper
procstat sys# time, in jiffies since boot, that core $device spent in $metric TRUE integral jiffies time TRUE TRUE JIFSEC 1 infinite -1 -1 JIFSEC*Cnode 100 uint64 RANGE;0;Maxnhyper
procstat nice# time, in jiffies since boot, that core $device spent in $metric TRUE integral jiffies time TRUE TRUE JIFSEC 1 infinite -1 -1 JIFSEC*Cnode 100 uint64 RANGE;0;Maxnhyper
procstat user# time, in jiffies since boot, that core $device spent in $metric TRUE integral jiffies time TRUE TRUE JIFSEC 1 infinite -1 -1 JIFSEC*Cnode 100 uint64 RANGE;0;Maxnhyper
procstat guest_nice sum of jiffies across all cores spent in $metric FALSE integral jiffies time TRUE TRUE JIFSEC*Ncore 1 infinite -1 -1 JIFSEC*Cnode*Ncore 100*Ncore uint64
procstat guest sum of jiffies across all cores spent in $metric FALSE integral jiffies time TRUE TRUE JIFSEC*Ncore 1 infinite -1 -1 JIFSEC*Cnode*Ncore 100*Ncore uint64
procstat steal sum of jiffies across all cores spent in $metric FALSE integral jiffies time TRUE TRUE JIFSEC*Ncore 1 infinite -1 -1 JIFSEC*Cnode*Ncore 100*Ncore uint64
procstat softirq sum of jiffies across all cores spent in $metric FALSE integral jiffies time TRUE TRUE JIFSEC*Ncore 1 infinite -1 -1 JIFSEC*Cnode*Ncore 100*Ncore uint64
procstat irq sum of jiffies across all cores spent in $metric FALSE integral jiffies time TRUE TRUE JIFSEC*Ncore 1 infinite -1 -1 JIFSEC*Cnode*Ncore 100*Ncore uint64
procstat iowait sum of jiffies across all cores spent in $metric FALSE integral jiffies time TRUE TRUE JIFSEC*Ncore 1 infinite -1 -1 JIFSEC*Cnode*Ncore 100*Ncore uint64
procstat idle sum of jiffies across all cores spent in $metric FALSE integral jiffies time TRUE TRUE JIFSEC*Ncore 1 infinite -1 -1 JIFSEC*Cnode*Ncore 100*Ncore uint64
procstat sys sum of jiffies across all cores spent in $metric FALSE integral jiffies time TRUE TRUE JIFSEC*Ncore 1 infinite -1 -1 JIFSEC*Cnode*Ncore 100*Ncore uint64
dstat nice FALSE capacity priority priority FALSE FALSE int64
procstat user sum of jiffies across all cores spent in $metric FALSE integral jiffies time TRUE TRUE JIFSEC*Ncore 1 infinite -1 -1 JIFSEC*Cnode*Ncore 100*Ncore uint64
procstat cpu_enabled# 0 if core $device is downed TRUE capacity cores available FALSE FALSE 1 0 1 -1 -1 Cnode 1 uint64 RANGE;0;Maxnhyper
procstat cpu_enabled 0 if core is downed FALSE capacity cores available FALSE FALSE 1 0 1 -1 -1 Cnode 1 uint64
procstat procs_blocked processes waiting FALSE capacity processes process_count FALSE uint32_max 0 infinite -1 -1 0 32 uint32
procstat procs_running processes active FALSE capacity processes process_count FALSE uint32_max 0 infinite -1 -1 Cnode*Ncore 32 uint32
procstat processes processes started since boot FALSE integral processes process_count FALSE uint32_max 0 rolls -1 -1 Cnode*Ncore*1000 1000*Ncore uint32
procstat context_switches cpu context switches since boot (or rollover of counter) FALSE integral event event TRUE TRUE uint32_max 0 rolls -1 -1 Cnode*Ncore*1000 1000*Ncore uint32
procstat hwintr_count number of hardware interrupts since boot FALSE integral event event TRUE TRUE uint32_max 0 rolls -1 -1 0 0 uint32
procstat softirq_count number of softirq since boot FALSE integral event event TRUE TRUE uint32_max 0 rolls -1 -1 0 0 uint32
sysclassib ib.port_multicast_rcv_packets# multicast packets received on $device TRUE integral packets data TRUE TRUE uint32_max 0 saturates -1 -1 0 IB40G/64 uint64 IBLIST
sysclassib ib.port_multicast_xmit_packets# multicast packets sent via $device TRUE integral packets data TRUE TRUE uint32_max 0 saturates -1 -1 0 IB40G/64 uint64 IBLIST
sysclassib ib.port_unicast_rcv_packets# unicast packets received on $device TRUE integral packets data TRUE TRUE uint32_max 0 saturates -1 -1 0 IB40G/64 uint64 IBLIST
sysclassib ib.port_unicast_xmit_packets# unicast packets sent via $device TRUE integral packets data TRUE TRUE uint32_max 0 saturates -1 -1 0 IB40G/64 uint64 IBLIST
sysclassib ib.port_xmit_wait# The number of ticks during which the port selected by PortSelect had data to transmit but no data was sent during the entire tick either because of insufficient credits or because of lack of arbitration. TRUE integral ticks event TRUE TRUE uint32_max 0 saturates -1 -1 0 100000 uint64 IBLIST
sysclassib ib.port_rcv_packets# packets received on $device TRUE integral packets data TRUE TRUE uint32_max 0 saturates -1 -1 0 IB40G/64 uint64 IBLIST
sysclassib ib.port_xmit_packets# packets transmitted on $device TRUE integral packets data TRUE TRUE uint32_max 0 saturates -1 -1 0 IB40G/64 uint64 IBLIST
sysclassib ib.port_rcv_data# bytes received on $device TRUE integral byte data TRUE TRUE IB40G uint32_max 0 saturates -1 -1 IB40G*Cnode IB40G uint64 IBLIST
sysclassib ib.port_xmit_data# bytes transmitted on $device TRUE integral byte data TRUE TRUE IB40G uint32_max 0 saturates -1 -1 IB40G*Cnode IB40G uint64 IBLIST
sysclassib ib.VL15_dropped# Number of incoming VL15 packets dropped due to resource limitations (e.g., lack of buffers) in the port. TRUE integral errors event FALSE TRUE uint16_max 0 saturates 10 3 0 65535 uint64 IBLIST
sysclassib ib.excessive_buffer_overrun_errors# The number of times that OverrunErrors consecutive flow control update periods occurred on $device, each having at least one overrun error. TRUE integral errors event FALSE TRUE uint4_max 0 saturates 10 2 0 15 uint64 IBLIST
sysclassib ib.local_link_integrity_errors# The number of times that the count of local physical errors on $device exceeded the threshold specified by LocalPhyErrors TRUE integral errors event FALSE TRUE uint4_max 0 saturates 10 2 0 15 uint64 IBLIST
sysclassib ib.COUNTER_SELECT2_F# TRUE integral input none FALSE FALSE none 0 -1 -1 0 uint64 IBLIST
sysclassib ib.port_rcv_constraint_errors# Total number of packets received on the switch physical port that are discarded for the following reasons: (a) FilterRawInbound is true and packet is raw (b) PartitionEnforcementInbound is true and packet fails partition key check or IP version check. TRUE integral errors event FALSE TRUE uint8_max 0 saturates 10 2 0 255 uint64 IBLIST
sysclassib ib.port_xmit_constraint_errors# Total number of packets not transmitted from the switch physical port for the following reasons: (a) FilterRawOutbound is true and packet is raw, or (b) PartitionEnforcementOutbound is true and packet fails partition key check or IP version check. TRUE integral errors event FALSE TRUE uint8_max 0 saturates 10 2 0 255 uint64 IBLIST
sysclassib ib.port_xmit_discards# Total number of $device outbound packets discarded by the port because the port is down or congested. Reasons for this include: (a) Output port is not in the active state, or (b) Packet length exceeded NeighborMTU, or (c) Switch Lifetime Limit exceeded, or (d) Switch HOQ Lifetime Limit exceeded. This may also include packets discarded while in VLStalled State. TRUE integral packet congestion loss event FALSE TRUE uint16_max 0 saturates 10 5 0 65535 uint64 IBLIST
sysclassib ib.port_rcv_switch_relay_errors# Total number of packets received on $device that were discarded because they could not be forwarded by the switch relay. Reasons for this include: (a) DLID mapping (see the description of PortDLIDMappingErrors in Table 250 PortRcvErrorDetails on page 1045), or (b) VL mapping, or (c) Looping (output port = input port). TRUE integral errors event FALSE TRUE uint16_max 0 saturates 10 2 0 65535 uint64 IBLIST
sysclassib ib.port_rcv_remote_physical_errors# Total number of packets marked with the EBP delimiter received on $device. TRUE integral errors event FALSE TRUE uint16_max 0 saturates 10 2 0 65535 uint64 IBLIST
sysclassib ib.port_rcv_errors# Total number of packets containing an error that were received on $device. These errors include: (a) Local physical errors (ICRC, VCRC, LPCRC, and all physical errors that cause entry into the BAD PACKET or BAD PACKET DISCARD states of the packet receiver state machine), or (b) Malformed data packet errors (LVer, length, VL), or (c) Malformed link packet errors (operand, length, VL), or (d) Packets discarded due to buffer overrun. TRUE integral errors event FALSE TRUE uint16_max 0 saturates 10 2 0 65535 uint64 IBLIST
sysclassib ib.link_downed# The total number of times the $device Port Training state machine has failed the link error recovery process and downed the link. TRUE integral status available FALSE TRUE uint8_max 0 saturates 10 2 0 255 uint64 IBLIST
sysclassib ib.link_error_recovery# The total number of times the $device Port Training state machine has successfully completed the link error recovery process. TRUE integral errors event TRUE TRUE uint8_max 0 saturates -1 -1 0 255 uint64 IBLIST
sysclassib ib.symbol_error# The total number of minor link errors detected on one or more physical lanes. This includes 8B/10B coding violations and is typically an indication of a bit error on the line. TRUE integral errors event TRUE TRUE uint16_max 0 saturates 10 2 0 65535 uint64 IBLIST
vmstat thp_split number of hugepage splits FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat thp_collapse_alloc_failed Is incremented if khugepaged found a range of pages that should be collapsed into one huge page but failed the allocation. FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat thp_collapse_alloc Is incremented by khugepaged when it has found a range of pages to collapse into one huge page and has successfully allocated a new huge page to store the data. FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat thp_fault_fallback Is incremented if a page fault fails to allocate a huge page and instead falls back to using small pages. FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat thp_fault_alloc Is incremented every time a huge page is successfully allocated to handle a page fault. This applies to both the first time a page is faulted and for copy-on-write faults. FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat unevictable_pgs_mlockfreed FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat unevictable_pgs_stranded FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat unevictable_pgs_cleared FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat unevictable_pgs_munlocked FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat unevictable_pgs_mlocked FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat unevictable_pgs_rescued FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat unevictable_pgs_scanned FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat unevictable_pgs_culled FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat htlb_buddy_alloc_fail FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat htlb_buddy_alloc_success FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat compact_success is incremented if the system compacted memory and freed a huge page for use. FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat compact_fail is incremented if the system tries to compact memory but fails. FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat compact_stall is incremented every time a process stalls to run memory compaction so that a huge page is free for use. FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat compact_pagemigrate_failed is incremented when the underlying mechanism for moving a page failed. FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat compact_pages_moved is incremented each time a page is moved. If this value is increasing rapidly, it implies that the system is copying a lot of data to satisfy the huge page allocation. It is possible that the cost of copying exceeds any savings from reduced TLB misses. FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat compact_blocks_moved is incremented each time memory compaction examines a huge page aligned range of pages. FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgrotated FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat allocstall Number of direct reclaim calls since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pageoutrun FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat kswapd_skip_congestion_wait FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat kswapd_high_wmark_hit_quickly FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat kswapd_low_wmark_hit_quickly FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat kswapd_inodesteal FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat kswapd_steal Number of pages reclaimed by kswapd since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat slabs_scanned Number of scanned slab objects since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pginodesteal Number of normal pages reclaimed via inode frees since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat zone_reclaim_failed FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgscan_direct_movable FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgscan_direct_normal Number of normal pages scanned by direct reclaim since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgscan_direct_dma32 Number of dma32 pages scanned by direct reclaim since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgscan_direct_dma Number of dma pages scanned by direct reclaim since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgscan_kswapd_movable FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgscan_kswapd_normal Number of normal pages scanned by kswapd since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgscan_kswapd_dma32 Number of dma32 pages scanned by kswapd since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgscan_kswapd_dma Number of dma pages scanned by kswapd since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgsteal_movable FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgsteal_normal Number of normal page steals since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgsteal_dma32 Number of dma32 page steals since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgsteal_dma Number of dma page steals since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgrefill_movable FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgrefill_normal Number of normal page refills since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgrefill_dma32 Number of dma32 page refills since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgrefill_dma Number of dma page refills since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgmajfault Number of major page faults since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgfault Number of page faults (minor and major) since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgdeactivate Number of page deactivations since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgactivate Number of page activations since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgfree Number of page frees since boot FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgalloc_movable movable zone pages in use FALSE integral pages data TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgalloc_normal normal zone pages in use FALSE integral pages data TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgalloc_dma32 dma32 zone pages in use FALSE integral pages data TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgalloc_dma dma zone pages in use FALSE integral pages data TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pswpout swap pages out count FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pswpin swap pages in count FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgpgout disk pages out since boot; possibly includes all I/O FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat pgpgin disk pages in since boot; possibly includes all I/O FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat nr_anon_transparent_hugepages FALSE integral event event TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat numa_other pages allocated in RAM attached to this cpu while code using the pages was running elsewhere. FALSE integral pages data TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat numa_local pages allocated in RAM attached to this cpu while code using the pages runs on the same cpu. FALSE integral pages data TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat numa_interleave pages allocated under an interleave strategy FALSE integral pages data TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat numa_foreign small is better; large indicates we borrow RAM some hops away from this core FALSE integral pages data TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat numa_miss small rate is better; large indicates other cores borrow locally attached RAM FALSE integral pages data TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat numa_hit large rate is ok; this core allocated from the nearest RAM hardware FALSE integral pages data TRUE TRUE none 0 rolls -1 -1 0 uint64
vmstat nr_shmem FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_isolated_file FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_isolated_anon FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_writeback_temp small is better FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_vmscan_write FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_bounce FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_unstable unstable page count FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_kernel_stack small is better FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_page_table_pages small is better FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_slab_unreclaimable small is better FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_slab_reclaimable small is better FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_writeback count of pages scheduled out but not done. persistent value should be 0. FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_dirty count of pages waiting to be scheduled to output device. persistent value should be 0 FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_file_pages FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_mapped Number of pages mapped by files FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_anon_pages FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_mlock FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_unevictable FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_active_file FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_inactive_file FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_active_anon FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_inactive_anon FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
vmstat nr_free_pages FALSE capacity pages data FALSE FALSE HWMEM/4 0 rolls -1 -1 Cnode*HWMEM/4 uint64
synthetic saw sawtooth wave series with compid offset FALSE integral none none TRUE TRUE 200 0 rolls -1 -1 1000 1000 20 dummy uint64
synthetic sine sine wave series with compid offset FALSE capacity none none FALSE FALSE 200 0 infinite -1 -1 1000 1000 20 dummy uint64
synthetic square square wave series with compid offset FALSE integral none none FALSE FALSE 200 0 infinite -1 -1 1000 1000 20 dummy uint64
dstat rchar characters read; The number of bytes which this task has caused to be read from storage. This is simply the sum of bytes which this process passed to read(2) and similar system calls. It includes things such as terminal I/O and is unaffected by whether or not actual physical disk I/O was required (the read might have been satisfied from pagecache). FALSE integral char read data TRUE TRUE infinite -1 -1 0 0 0 uint64
dstat wchar characters written; The number of bytes which this task has caused, or shall cause to be written to disk. Similar caveats apply here as with rchar. FALSE integral char write data TRUE TRUE infinite -1 -1 uint64
dstat syscr read syscalls; Attempt to count the number of read I/O operations: that is, system calls such as read(2) and pread(2). FALSE integral read syscalls call TRUE TRUE infinite -1 -1 uint64
dstat syscw write syscalls; Attempt to count the number of write I/O operations: that is, system calls such as write(2) and pwrite(2). FALSE integral write syscalls call TRUE TRUE infinite -1 -1 uint64
dstat read_bytes bytes read; Attempt to count the number of bytes which this process really did cause to be fetched from the storage layer. This is accurate for block-backed filesystems. FALSE integral char read data TRUE TRUE infinite -1 -1 local rates bound by ram/cpu/net somehow. uint64
dstat write_bytes bytes written; Attempt to count the number of bytes which this process caused to be sent to the storage layer. FALSE integral char write data TRUE TRUE infinite -1 -1 local rates bound by ram/cpu/net somehow. uint64
dstat cancelled_write_bytes The big inaccuracy here is truncate. If a process writes 1MB to a file and then deletes the file, it will in fact perform no writeout. But it will have been accounted as having caused 1MB of write. In other words: this field represents the number of bytes which this process caused to not happen, by truncating pagecache. A task can cause "negative" I/O too. If this task truncates some dirty pagecache, some I/O which another task has been accounted for (in its write_bytes) will not be happening. FALSE integral char unwrite data TRUE TRUE infinite -1 -1 uint64
dstat minflt FALSE integral Page faults event TRUE TRUE infinite -1 -1 uint64
dstat cminflt FALSE integral Page faults event TRUE TRUE infinite -1 -1 uint64
dstat majflt FALSE integral Page faults event TRUE TRUE infinite -1 -1 uint64
dstat cmajflt FALSE integral Page faults event TRUE TRUE infinite -1 -1 uint64
dstat utime FALSE integral jiffies time TRUE TRUE JIFSEC*Ncore 1 infinite -1 -1 JIFSEC*Cnode*Ncore 100*Ncore uint64
dstat stime FALSE integral jiffies time TRUE TRUE JIFSEC*Ncore 1 infinite -1 -1 JIFSEC*Cnode*Ncore 100*Ncore uint64
dstat cutime FALSE integral jiffies time TRUE TRUE JIFSEC*Ncore 1 infinite -1 -1 JIFSEC*Cnode*Ncore 100*Ncore int64
dstat cstime FALSE integral jiffies time TRUE TRUE JIFSEC*Ncore 1 infinite -1 -1 JIFSEC*Cnode*Ncore 100*Ncore int64
dstat priority FALSE capacity priority priority FALSE FALSE int64
dstat nice FALSE capacity priority priority FALSE FALSE int64
dstat num_threads FALSE capacity threads process_count FALSE FALSE uint64
dstat vsize FALSE capacity bytes data FALSE FALSE uint64
dstat rss FALSE capacity bytes data FALSE FALSE HWMEM Cnode*HWMEM uint64
dstat rsslim FALSE capacity bytes data FALSE FALSE HWMEM Cnode*HWMEM uint64
dstat signal FALSE label mask signals FALSE FALSE uint64
dstat processor FALSE label core FALSE FALSE uint64
dstat rt_priority FALSE label priority FALSE FALSE uint64
dstat policy FALSE label policy FALSE FALSE uint64
dstat delayacct_blkio_ticks FALSE integral jiffies time TRUE TRUE JIFSEC*Ncore 1 infinite -1 -1 JIFSEC*Cnode*Ncore 100*Ncore uint64
dstat VmSize FALSE capacity kB data FALSE FALSE uint64
dstat VmRSS FALSE capacity kB data FALSE FALSE uint64
dstat share_pages FALSE capacity pages data FALSE FALSE HWMEM/4 uint64
dstat text_pages FALSE capacity pages data FALSE FALSE HWMEM/4 uint64
dstat lib_pages FALSE capacity pages data FALSE FALSE HWMEM/4 uint64
dstat data_pages FALSE capacity pages data FALSE FALSE HWMEM/4 uint64
dstat dirty_pages FALSE capacity pages data FALSE FALSE HWMEM/4 uint64
dstat mmalloc_bytes_used_p_holes FALSE capacity bytes data FALSE FALSE HWMEM*1024 uint64
dstat pid FALSE label pid data FALSE FALSE uint64
dstat ppid FALSE label ppid data FALSE FALSE uint64
lnet_stats msgs_alloc FALSE capacity msg
lnet_stats msgs_max FALSE capacity msg
lnet_stats errors FALSE integral event
lnet_stats send_count FALSE integral
lnet_stats recv_count FALSE integral
lnet_stats route_count FALSE integral
lnet_stats drop_count FALSE integral
lnet_stats send_length FALSE capacity
lnet_stats recv_length FALSE capacity
lnet_stats route_length FALSE capacity
lnet_stats drop_length FALSE capacity

Constants all table
category DESCRIPTION VALUE UNITS NAME
Rate Limits Link 10Gb 1073741824 byte/sec IB10G
Link-1Gb 134217728 byte/sec ETH1G
Link-40Gb 4294967296 byte/sec IB40G
Link-20Gb 2147483648 byte/sec IB20G
0 none
Unsigned limits 16 uint4_max
256 uint8_max
65536 uint16_max
4294967296 uint32_max
18446744073709551616 uint64_max
1 TRUE
0 FALSE
Conversions kilo 1024 CFKILO
mega 1048576 CFMEGA
giga 1073741824 CFGIGA
tera 1099511627776 CFTERA

Clusters table
cluster node login gateway admin gpu qs
jemez
pecos pecos%n pecos-login%n
cayenne cay%n cayenne-login%n caygw%n cayadmin%n
solo
uno uno%n uno-login%n unogw%n unoadmin%n unogpu%n unoqs%n
chama chama%n chama-login%n chgw%n chadmin%n
skybridge sb%n skybridge-login%n sbgw%n sbadmin%n
serrano ser%n serrano-login%n sergw%n seradmin%n
cts1x c1x%n cts1x-login%n c1xgw%n c1xadmin%n
ghost gho%n ghost-login%n ghogw%n ghoadmin%n
doom doom%n doom-login%n doomgw%n doomadmin%n
mayer cn%n

Constants chama;chama-node;chama-login table
category DESCRIPTION VALUE UNITS NAME
Rate Limits Link 10Gb 1073741824 byte/sec IB10G
Link-1Gb 134217728 byte/sec ETH1G
Link-40Gb 4294967296 byte/sec IB40G
Link-20Gb 2147483648 byte/sec IB20G
0 none
Unsigned limits 16 uint4_max
256 uint8_max
65536 uint16_max
4294967296 uint32_max
18446744073709551616 uint64_max
1 TRUE
0 FALSE
Conversions kilo 1024 CFKILO
mega 1048576 CFMEGA
giga 1073741824 CFGIGA
tera 1099511627776 CFTERA
HW Limits Node-memory 67108864 kb HWMEM
Node-cores 16 cores Ncore
hugepage size default 2048 kb HPSZ
Compute node count 1232 nodes Cnode
Node hyperthread count 16 threads Nhyper
Node hyperthread count possible 32 threads Maxnhyper
login node count 8 nodes Clogin
gateway node count or list 48 nodes Cgateway
admin node count or list 8 nodes Cadmin
Kernel-tick 100 jiffies/sec JIFSEC
Lustre-collective 21474836480 byte/sec LWRITEBW
infiniband device list qib0.1; device IBLIST
opa device list device OPALIST
eth device list lo;eth0;eth2; device ETHLIST
ipoib device list ib0; device IpoibLIST
lustre device list llite.fscratch;llite.gscratch; device LUSTRELIST
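The bounds in the main metrics table (for example HWMEM/4 or JIFSEC*Cnode*Ncore) are expressions over the names in a cluster constants table such as the one above. A minimal Python sketch of expanding such an expression follows. The function name `expand`, the dictionary layout, and the use of integer division are assumptions made for illustration; they are not part of LDMS.

```python
import ast
import operator

# Subset of the chama constants table above (values copied from that table).
CHAMA = {
    "HWMEM": 67108864,    # kb
    "Ncore": 16,          # cores
    "Cnode": 1232,        # nodes
    "JIFSEC": 100,        # jiffies/sec
    "uint16_max": 65536,
    "none": 0,
}

# Only the four arithmetic operators are accepted; '/' is treated as
# integer division (an assumption, since the bounds are whole counts).
_OPS = {
    ast.Mult: operator.mul,
    ast.Div: operator.floordiv,
    ast.Add: operator.add,
    ast.Sub: operator.sub,
}

def expand(expr, constants):
    """Evaluate a bound such as 'JIFSEC*Cnode*Ncore' against a constants table."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.Name):
            return constants[node.id]  # KeyError for an undefined constant
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval").body)

if __name__ == "__main__":
    for bound in ("HWMEM/4", "JIFSEC*Ncore", "Cnode*HWMEM/4"):
        print(bound, "=", expand(bound, CHAMA))
```

Walking the parsed expression rather than calling `eval` keeps the expansion limited to constant names and basic arithmetic, which is all the table entries appear to need.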

Constants clone-chama;chama-admin table
category DESCRIPTION VALUE UNITS NAME
Rate Limits Link 10Gb 1073741824 byte/sec IB10G
Link-1Gb 134217728 byte/sec ETH1G
Link-40Gb 4294967296 byte/sec IB40G
Link-20Gb 2147483648 byte/sec IB20G
0 none
Unsigned limits 16 uint4_max
256 uint8_max
65536 uint16_max
4294967296 uint32_max
18446744073709551616 uint64_max
1 TRUE
0 FALSE
Conversions kilo 1024 CFKILO
mega 1048576 CFMEGA
giga 1073741824 CFGIGA
tera 1099511627776 CFTERA
HW Limits Node-memory 67108864 kb HWMEM
Node-cores 16 cores Ncore
hugepage size default 2048 kb HPSZ
Compute node count 1232 nodes Cnode
Node hyperthread count 16 threads Nhyper
Node hyperthread count possible 32 threads Maxnhyper
login node count 8 nodes Clogin
gateway node count or list 48 nodes Cgateway
admin node count or list 8 nodes Cadmin
Kernel-tick 100 jiffies/sec JIFSEC
Lustre-collective 21474836480 byte/sec LWRITEBW
infiniband device list qib0.1; device IBLIST
opa device list device OPALIST
eth device list lo;eth0;eth2; device ETHLIST
ipoib device list ib0; device IpoibLIST
lustre device list  ; device LUSTRELIST

Constants clone-chama;chama-gateway table
category DESCRIPTION VALUE UNITS NAME
Rate Limits Link 10Gb 1073741824 byte/sec IB10G
Link-1Gb 134217728 byte/sec ETH1G
Link-40Gb 4294967296 byte/sec IB40G
Link-20Gb 2147483648 byte/sec IB20G
0 none
Unsigned limits 16 uint4_max
256 uint8_max
65536 uint16_max
4294967296 uint32_max
18446744073709551616 uint64_max
1 TRUE
0 FALSE
Conversions kilo 1024 CFKILO
mega 1048576 CFMEGA
giga 1073741824 CFGIGA
tera 1099511627776 CFTERA
HW Limits Node-memory 67108864 kb HWMEM
Node-cores 16 cores Ncore
hugepage size default 2048 kb HPSZ
Compute node count 1232 nodes Cnode
Node hyperthread count 16 threads Nhyper
Node hyperthread count possible 32 threads Maxnhyper
login node count 8 nodes Clogin
gateway node count or list 48 nodes Cgateway
admin node count or list 8 nodes Cadmin
Kernel-tick 100 jiffies/sec JIFSEC
Lustre-collective 21474836480 byte/sec LWRITEBW
infiniband device list qib0.1;mlx4_0.1; device IBLIST
opa device list device OPALIST
eth device list lo;eth0;eth2; device ETHLIST
ipoib device list ib0; device IpoibLIST
lustre device list  ; device LUSTRELIST
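Metrics marked wildcard=TRUE in the main table are expanded once per entry of the device list named in their wildrange column (IBLIST for the sysclassib rows). The Python sketch below shows one plausible expansion. It assumes the trailing `#` in the table name is kept as a separator before the appended device, and that `$device` in the description is replaced by the same value. The function name `expand_wildcard` is hypothetical.

```python
def expand_wildcard(name, description, wildrange):
    """Return (metric_name, description) pairs, one per device in wildrange.

    wildrange is a semicolon-separated list as found in the constants
    tables, e.g. "qib0.1;mlx4_0.1;". Blank entries are ignored, so an
    empty list such as " ; " expands to no metrics at all.
    """
    devices = [d.strip() for d in wildrange.split(";") if d.strip()]
    return [(name + dev, description.replace("$device", dev)) for dev in devices]

if __name__ == "__main__":
    # IBLIST value from the chama-gateway constants table above.
    iblist = "qib0.1;mlx4_0.1;"
    for metric, text in expand_wildcard(
        "ib.link_downed#",
        "Times the $device Port Training state machine downed the link.",
        iblist,
    ):
        print(metric, "->", text)
```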

Metric table headings

Metric column definitions in detail
Sampler: Name of the sampler collecting the metric.
Name: Metric name and csv column heading.
description: Metric definition.
wildcard: True if the listed metric name is expanded by appending a wildcard value.
Mode: The metric's behavior. integral is an increasing counter (which may be reset or get stuck at some upper bound); capacity is a fluctuating number, also called a gauge in some systems; label is a symbolic identifier (numeric or string).
Units: The unit of the raw reported value.
Dimension: The underlying type of the unit, which identifies the reasonable unit conversions.
Use Rate Only: A hint (if true) that showing the raw value to humans is typically unhelpful.
Delta/Job Useful: A hint (if true) that showing humans the per-job integral of the rate value derived from the metric data may be helpful.
Local rate UB: The upper limit expected on the derivative (delta_value/delta_time_in_seconds) value from a single producer. This is usually a hardware constraint.
UB Local: The upper limit expected on the value from a single producer. This is usually a hardware constraint, but in some cases (e.g. InfiniBand) is an API constraint.
max by dt: The hard upper limit on the local time derivative of the metric value. Higher rate values must be analytical artifacts. 0 as the upper limit means the rate is unlimited.
itype: The behavior of integral mode metrics at their upper bound. If 'rolls', then the counter resets to zero and continues counting. If 'infinite', then the counter is expected to never reach its upper limit. If 'saturates', then the counter will stop incrementing at some upper limit and cease to be valid until reset by some external agent. The itype is 1 or the empty string for non-integral metrics.
alarm: Threshold which if exceeded should cause notification; this is a local policy-defined value. -1 means threshold testing should not be done.
ratealarm: Threshold on the time derivative of the metric which if exceeded should cause notification; this is a local policy-defined value. -1 means threshold testing should not be done.
collmax: The limit on the collective sum of the metric across all producers.
plotmax: The upper bound to use on the metric value when plotting, if the upper bound is not derived from the actual data (handy for streaming plots).
typical: The expected metric value, if any reasonable expectation is possible (handy for scaling data in some analysis algorithms).
notes: Info for humans.
kind: Data type of the metric, e.g. uint64 is an unsigned 64 bit number.
wildrange: List of substitution values for expanding the names of wildcard=true metrics. The list content may be defined by a value from another constants table.
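The itype, alarm, and ratealarm definitions above suggest how a consumer might turn two samples of an integral metric into a rate. The following Python sketch is one possible reading, not LDMS code. The names `counter_rate` and `exceeds` are hypothetical. It also assumes that the upper bound is given as the counter modulus (for example 65536 for uint16_max, as in the constants tables), so that a saturated 16-bit counter reads 65535.

```python
def counter_rate(prev, curr, dt, itype, ub):
    """Return (curr - prev) / dt for an integral metric, or None if unusable.

    prev, curr: consecutive raw counter samples
    dt:         seconds between the samples
    itype:      'rolls', 'saturates' or 'infinite'
    ub:         counter modulus (assumption: e.g. 2**64 for a uint64 counter)
    """
    if dt <= 0:
        return None
    if itype == "saturates" and curr >= ub - 1:
        # Stuck at the limit: not valid until reset by an external agent.
        return None
    delta = curr - prev
    if delta < 0:
        if itype == "rolls":
            delta += ub  # assume the counter wrapped exactly once
        else:
            return None  # counter was reset between the samples
    return delta / dt

def exceeds(value, threshold):
    """Apply an alarm or ratealarm threshold; -1 disables the test."""
    return threshold != -1 and value > threshold

if __name__ == "__main__":
    rate = counter_rate(2**64 - 5, 5, 10.0, "rolls", 2**64)
    print(rate, exceeds(rate, 10), exceeds(rate, -1))
```

Returning None instead of a number for saturated or reset counters lets a caller skip the interval rather than plot an artifact; the max by dt column could be applied as a further sanity check on the result.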