Difference between revisions of "OVISWiki:Current events"
From OVISWiki
Revision as of 10:32, 3 December 2019
News
- We are working on a lightweight, secure application monitoring approach
- Users' group with biweekly telecons. Face-to-face meeting planned for fall 2019 -- Join Now!
- Telecon notes and call-in info are on the GitHub wiki (https://github.com/ovis-hpc/ovis/wiki)
- Sandia-UIUC collaboration on AI for Supercomputer Diagnostics
- ISC High Performance 2017 (ISC) Gauss Award Winner: Diagnosing Performance Variations in HPC Applications Using Machine Learning, which uses LDMS monitoring data as the basis for machine learning-based performance diagnosis
- LDMS wins 2015 R&D 100 award! LDMS Video
- 2015: ASCR awarded the Resilience project "Holistic Measurement Driven Resilience: Combining Operational Fault and Failure Measurements and Fault Injection for Quantifying Fault Detection and Impact"
Releases
OVIS/LDMS can be obtained from github.com/ovis-hpc
- LDMS v4 is available at the GitHub site!
- The current distribution includes only the OVIS/LDMS monitoring, transport, and storage components.
Upcoming HPC Monitoring and Analysis Conference Events
- Workshop on Monitoring and Analysis for HPC Systems Plus Applications (HPCMASPA), held in conjunction with IEEE Cluster 2019 in September 2019 in Albuquerque, NM, USA.
- Monitoring Large-Scale HPC Systems -- collaboration and resource site for HPC Monitoring
- Includes materials from SC18 BoF: Monitoring Large-Scale HPC Systems: Extracting and Presenting Meaningful System and Application Insights