Difference between revisions of "Main Page"

From OVISWiki
(8 intermediate revisions by 2 users not shown)
Line 1: Line 1:
<strong>MediaWiki has been installed.</strong>
+
__NOEDITSECTION__
 +
__NOTOC__
  
Consult the [https://meta.wikimedia.org/wiki/Help:Contents User's Guide] for information on using the wiki software.
+
OVIS is a modular system for HPC data collection, transport, storage, analysis, visualization, and response. The OVIS project seeks to enable more effective use of High Performance Computational Clusters via greater understanding of applications' use of resources, including the effects of competition for shared resources; discovery of abnormal system conditions; and intelligent response to conditions of interest.
  
== Getting started ==
+
=== Data Collection, Transport, and Storage ===
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]
+
<font color="green">The Lightweight Distributed Metric Service ([[FAQ Public|LDMS]])</font> is the OVIS data collection and transport system. LDMS provides capabilities for lightweight run-time collection of high-fidelity data. Data can be accessed on-node or transported off-node. Additionally, LDMS can store data using a variety of storage options.
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]
+
* <i><font color="green">[http://www.rd100conference.com/awards/winners-finalists/5494/a-platform-snapshot/ LDMS wins 2015 R&D 100 award!]</font></i>
* [https://lists.wikimedia.org/mailman/listinfo/mediawiki-announce MediaWiki release mailing list]
+
<!--[http://www.rd100awards.com/2015-rd-100-award-winners-->
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]
+
<!-- * <i>[https://www.youtube.com/watch?v=cRWj_7EfoK4 LDMS video]</i> -->
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]
+
 
 +
=== Analysis and Visualization ===
 +
OVIS data can be used for understanding system state and resource utilization.
 +
The [http://github.com/ovis-hpc/ovis current release version] of OVIS enables in-transit calculation of functions of metrics at an aggregator before the data is stored or forwarded to additional consumers. A more flexible analysis and visualization pipeline is in development.
 +
 
 +
OVIS has been used for investigation of network congestion evolution in large-scale systems.
 +
 
 +
[[Image:BW_Cube_still.png|thumb|300px| Investigation of network congestion evolution on NCSA's Blue Waters Gemini Network (27,648 compute nodes)]]
 +
 
 +
Additional features in development include association of application phases and performance in conjunction with system state data.
 +
 
 +
<!--
 +
[[Image:Screenshot.png||thumb|500px|OVIS 3.2 screenshot]]
 +
 
 +
OVIS includes 2D and 3D visual displays of deterministic information about state variables
 +
(e.g., temperature, CPU Utilization, fan speed), user-generated derived variables
 +
(e.g., aggregated memory errors over the life span of a job), and their aggregate statistics.
 +
Visual consideration of the cluster as a comparative ensemble, rather than singleton nodes,
 +
is a convenient and useful method for tuning cluster set-up and determining the effects
 +
of real-time changes in the cluster configuration and its environment.
 +
 
 +
[[Image:WebScreenshot.png||thumb|450px|OVIS WebGUI screenshot]]
 +
 
 +
OVIS includes a variety of statistical tools to dynamically infer models for the
 +
normal behavior of a system and to determine bounds on the probability of values evinced
 +
in the system. OVIS stores data in a distributed database to provide scalability and fault
 +
tolerance. Statistical analyses are then performed in a distributed parallel fashion.
 +
 
 +
OVIS includes prototype capabilities for job log searching that can be used to search
 +
for events of interest. The OVIS interface has been designed to be highly interactive,
 +
where, for example, selection of a job of interest automatically populates an analysis
 +
pane with information relevant to that job, and dropping a job onto the 3D display highlights
 +
system values on only those components participating in the job.
 +
 
 +
-->
 +
 
 +
=== Log Message Analysis ===
 +
<!-- OVIS includes prototype capabilities for log message searching. Additionally, OVIS analyses include the [[Baler_public|Baler]] tool for log message clustering.-->
 +
OVIS analyses include the Baler tool for log message clustering.
 +
 +
=== Decision Support ===
 +
The OVIS project includes [[Publications_and_presentations| research work]] in determining intelligent response to conditions of interest. This includes dynamic application (re-)mapping based upon application needs and resource state and invocation of resiliency responses upon discovery of potential pre-failure and/or abnormal conditions.
 +
 
 +
== Collaborative Analysis Support ==
 +
 
 +
[[Shaun]], a cluster supporting collaboration in HPC data analytics, is coming soon.
 +
 
 +
<!--
 +
<font color="blue" size="+1">The OVIS project addresses scalable, real-time statistical analysis of very large data sets.</font>
 +
<font color="blue" size="+1"> We feature the OVIS tool for Intelligent, Scalable, Real-time Monitoring of Large Computational Clusters and the Baler tool for Lossless, Deterministic Log Message Clustering.</font>
 +
* <font color="green" size="+1">[[Downloads_and_documentation|Download OVIS here -- NEW! Git repository access]] </font>
 +
* <font color="green" size="+1">[[Baler_public|Baler home page]] </font>
 +
<br><br>

Latest revision as of 12:43, 16 February 2018


OVIS is a modular system for HPC data collection, transport, storage, analysis, visualization, and response. The OVIS project seeks to enable more effective use of High Performance Computational Clusters via greater understanding of applications' use of resources, including the effects of competition for shared resources; discovery of abnormal system conditions; and intelligent response to conditions of interest.

Data Collection, Transport, and Storage

The Lightweight Distributed Metric Service (LDMS) is the OVIS data collection and transport system. LDMS provides capabilities for lightweight run-time collection of high-fidelity data. Data can be accessed on-node or transported off-node. Additionally, LDMS can store data using a variety of storage options.

Analysis and Visualization

OVIS data can be used for understanding system state and resource utilization. The current release version of OVIS enables in-transit calculation of functions of metrics at an aggregator before the data is stored or forwarded to additional consumers. A more flexible analysis and visualization pipeline is in development.
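
As a rough illustration of what computing functions of metrics in transit can look like, the sketch below applies derived-metric functions to each sample as it passes through an aggregation step, then hands the enriched sample to every downstream consumer. It is a generic Python sketch, not the OVIS or LDMS API: the function name derive_in_transit, the metric names, and the sink callbacks are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical representation of one sample: metric name -> value.
Sample = Dict[str, float]


def derive_in_transit(
    samples: Iterable[Sample],
    functions: Dict[str, Callable[[Sample], float]],
    sinks: List[Callable[[Sample], None]],
) -> None:
    """Add derived metrics to each sample, then pass it to every sink."""
    for sample in samples:
        enriched = dict(sample)  # copy so the incoming sample is not modified
        for name, fn in functions.items():
            # Derived values are computed from the raw sample only.
            enriched[name] = fn(sample)
        for sink in sinks:
            sink(enriched)  # e.g. a store, or a forwarder to another consumer


# Usage: derive a memory-used fraction before "storing" into a list.
stored: List[Sample] = []
derive_in_transit(
    samples=[{"mem_total": 100.0, "mem_free": 25.0}],
    functions={"mem_used_frac": lambda s: 1.0 - s["mem_free"] / s["mem_total"]},
    sinks=[stored.append],
)
```

Computing the derived value once at the aggregation step means every downstream consumer receives it without recomputing it from the raw metrics.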

OVIS has been used for investigation of network congestion evolution in large-scale systems.

Investigation of network congestion evolution on NCSA's Blue Waters Gemini Network (27,648 compute nodes)

Additional features in development include association of application phases and performance in conjunction with system state data.


Log Message Analysis

OVIS analyses include the Baler tool for log message clustering.

Decision Support

The OVIS project includes research work in determining intelligent response to conditions of interest. This includes dynamic application (re-)mapping based upon application needs and resource state and invocation of resiliency responses upon discovery of potential pre-failure and/or abnormal conditions.

Collaborative Analysis Support

Shaun, a cluster supporting collaboration in HPC data analytics, is coming soon.