- 2017: ISC High Performance (ISC) Gauss Award Winner: "Diagnosing Performance Variations in HPC Applications Using Machine Learning", which uses LDMS monitoring data as the basis for machine-learning-based performance diagnosis
- LDMS wins 2015 R&D 100 award!
- 2015: ASCR awarded the Resilience project "Holistic Measurement Driven Resilience: Combining Operational Fault and Failure Measurements and Fault Injection for Quantifying Fault Detection and Impact"
The current distribution includes only the OVIS/LDMS monitoring, transport, and storage components. OVIS/LDMS can be obtained from github.com/ovis-hpc.
- LDMS Tutorial Materials posted 5/2017.
Upcoming HPC Monitoring and Analysis Conference Events
- Monitoring Large-Scale HPC Systems -- collaboration and resource site for HPC Monitoring
- Monitoring and Analysis for HPC Systems Plus Applications (HPCMASPA) Workshop Series: CFP now open for HPCMASPA @ IEEE Cluster 2017