Difference between revisions of "Main Page"

From OVISWiki
Line 1: Line 1:
__NOEDITSECTION__
+
<strong>MediaWiki has been installed.</strong>
__NOTOC__
 
  
OVIS is a modular system for HPC data collection, transport, storage, analysis, visualization, and response. The OVIS project seeks to enable more effective use of High Performance Computational Clusters via greater understanding of applications' use of resources, including the effects of competition for shared resources; discovery of abnormal system conditions; and intelligent response to conditions of interest.
+
Consult the [https://meta.wikimedia.org/wiki/Help:Contents User's Guide] for information on using the wiki software.
  
=== Data Collection, Transport, and Storage ===
+
== Getting started ==
<font color="green">The Lightweight Distributed Metric Service ([[FAQ Public|LDMS]])</font> is the OVIS data collection and transport system. LDMS provides lightweight run-time collection of high-fidelity data. Data can be accessed on-node or transported off-node, and LDMS can write it to a variety of storage back ends.
+
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]
* <i><font color="green">[http://www.rd100conference.com/awards/winners-finalists/5494/a-platform-snapshot/ LDMS wins 2015 R&D 100 award!]</font></i>
+
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]
<!--[http://www.rd100awards.com/2015-rd-100-award-winners-->
+
* [https://lists.wikimedia.org/mailman/listinfo/mediawiki-announce MediaWiki release mailing list]
<!-- * <i>[https://www.youtube.com/watch?v=cRWj_7EfoK4 LDMS video]</i> -->
+
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]
 
+
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]
=== Analysis and Visualization ===
 
 
 
[[Image:Screenshot.png||thumb|500px|OVIS 3.2 screenshot]]
 
 
 
OVIS includes 2D and 3D visual displays of deterministic information about state variables (e.g., temperature, CPU utilization, fan speed), user-generated derived variables (e.g., aggregated memory errors over the life span of a job), and their aggregate statistics. Viewing the cluster as a comparative ensemble, rather than as singleton nodes, is a convenient and useful method for tuning cluster set-up and determining the effects of real-time changes in the cluster configuration and its environment.
 
 
 
[[Image:WebScreenshot.png||thumb|450px|OVIS WebGUI screenshot]]
 
 
 
OVIS includes a variety of statistical tools to dynamically infer models for the normal behavior of a system and to determine bounds on the probability of values observed in the system. OVIS stores data in a distributed database to provide scalability and fault tolerance. Statistical analyses are then performed in a distributed, parallel fashion.
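
As a rough illustration of the ensemble idea described above, the following Python sketch compares each node's reading with the mean and standard deviation of the whole cluster and reports a Chebyshev upper bound on the probability of a deviation at least that large. This is not OVIS code: the function name <code>ensemble_outliers</code>, the node names, and the threshold <code>k</code> are assumptions made for the example, and the statistical models OVIS actually infers may differ.

```python
from statistics import mean, pstdev


def ensemble_outliers(readings, k=3.0):
    """Flag nodes whose reading is at least k standard deviations from the ensemble mean.

    readings: dict mapping a (hypothetical) node name to one metric value.
    Returns {node: (z_score, chebyshev_bound)} for the flagged nodes only.
    """
    values = list(readings.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the ensemble
    if sigma == 0:
        return {}  # all nodes identical: nothing stands out

    flagged = {}
    for node, value in readings.items():
        z = abs(value - mu) / sigma
        if z >= k:
            # Chebyshev: P(|X - mu| >= z * sigma) <= 1 / z**2 for any distribution
            flagged[node] = (z, min(1.0, 1.0 / z**2))
    return flagged


# Hypothetical data: 20 nodes at 50 degrees C and one node at 90 degrees C.
temperatures = {f"n{i}": 50.0 for i in range(20)}
temperatures["n20"] = 90.0
flagged = ensemble_outliers(temperatures)
```

Here only <code>n20</code> is flagged, even though 90 degrees might still sit below a manufacturer-specified limit. Because a single extreme value inflates the mean and standard deviation it is compared against, a median-based variant may be more robust on small ensembles.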
 
 
 
OVIS includes prototype capabilities for searching job logs for events of interest. The OVIS interface has been designed to be highly interactive: for example, selecting a job of interest automatically populates an analysis pane with information relevant to that job, and dropping a job onto the 3D display highlights system values on only those components participating in the job.
 
 
 
=== Log Message Analysis ===
 
<!-- OVIS includes prototype capabilities for log message searching. Additionally, OVIS analyses include the [[Baler_public|Baler]] tool for log message clustering.-->
 
OVIS includes prototype capabilities for log message searching. Additionally, OVIS analyses include the Baler tool for log message clustering.
 
 
=== Decision Support ===
 
The OVIS project includes [[Publications_and_presentations| research work]] on determining intelligent responses to conditions of interest. This includes dynamic application (re-)mapping based on application needs and resource state, and invocation of resiliency responses upon discovery of potential pre-failure and/or abnormal conditions.
 
 
 
== Collaborative Analysis Support ==
 
 
 
[[Shaun]], a cluster supporting collaboration in HPC data analytics, is coming soon.
 
 
 
<!--
 
<font color="blue" size="+1">The OVIS project addresses scalable, real-time statistical analysis of very large data sets.</font>
 
<font color="blue" size="+1"> We feature the OVIS tool for Intelligent, Scalable, Real-time Monitoring of Large Computational Clusters and the Baler tool for Lossless, Deterministic Log Message Clustering.</font>
 
* <font color="green" size="+1">[[Downloads_and_documentation|Download OVIS here -- NEW! Git repository access]] </font>
 
* <font color="green" size="+1">[[Baler_public|Baler home page]] </font>
 
<br><br>
 
 
 
 
 
 
 
 
 
== OVIS in HPC ==
 
 
 
In the area of high-performance computing, the long-term goal of OVIS is to enable efficient and reliable computational clusters. We envision a system-wide integration of resource managers (e.g., scheduler), applications, and system resource analysis capabilities. Run-time information on resource utilization and predictive capabilities for anticipated resource needs and component failure can be used by schedulers and applications in order to better allocate resources. For example, information on reliable (or unreliable) system components can be used by the scheduler in making job allocation assignments and further used by applications in order to invoke fault-tolerance mechanisms.
 
 
 
The OVIS tool for Intelligent Scalable Real-Time Monitoring for Large Computational Clusters was created to address the piece of this goal involving resource analysis and failure prediction.
 
 
 
=== OVIS: A Tool for Intelligent, Scalable, Real-Time Monitoring of Large Computational Clusters ===
 
 
 
Traditional cluster monitoring approaches consider nodes in singleton, using manufacturer-specified extreme limits as thresholds to avoid failure. The OVIS tool for monitoring and analysis of large computational platforms, instead, uses a statistical approach. Leveraging the fact that a cluster is comprised of a large number of similar components, OVIS statistically characterizes the behaviors of single components in the context of the behaviors of the entire set of components. Abnormal or outlier behaviors can be much earlier indicators of problems than threshold-crossing.
 
 
 
== OVIS 3 ==
 
 
 
[[Image:Screenshot.png|left|thumb|500px|OVIS 3.2 screenshot]]
 
 
 
OVIS 3 includes a 3D visual display of deterministic information about state variables
 
(e.g., temperature, CPU Utilization, fan speed), user-generated derived variables
 
(e.g., aggregated memory errors over the life span of a job), and their aggregate statistics.
 
Visual consideration of the cluster as a comparative ensemble, rather than singleton nodes,
 
is a convenient and useful method for tuning cluster set-up and determining the effects
 
of real-time changes in the cluster configuration and its environment.
 
 
 
OVIS 3 includes a variety of statistical tools to dynamically infer models for the
 
normal behavior of a system and to determine bounds on the probability of values evinced
 
in the system. OVIS stores data in a distributed database to provide scalability and fault
 
tolerance. Statistical analyses are then performed in a distributed parallel fashion.
 
 
 
OVIS 3 includes prototype capabilities for job log searching that can be used to search
 
for events of interest. The OVIS interface has been designed to be highly interactive,
 
where, for example, selection of a job of interest automatically populates an analysis
 
pane with information relevant to that job, and dropping a job onto the 3D display highlights
 
system values on only those components participating in the job.
 
 
 
OVIS has been used in [[publications_and_presentations| research work]] for runtime invocation of system response to analytically
 
discovered conditions of interest.
 
 
 
OVIS 3 is currently available for [[downloads_and_documentation| download]].
 
 
 
 
 
[[Image:OvisInterface.png|left|thumb|500px|OVIS 2.0 screenshot]]
 
 
 
<br><br>
 
== Beyond HPC ==
 
 
 
The OVIS project extends its techniques in large-scale data exploration and statistical data analysis to areas where statistical techniques and scalable data-handling and analysis are required. This includes areas where multiples of components and/or multiples of comparable datasets are appropriate. OVIS has been investigating application to the areas of chemical sensor analysis and large-scale network analysis.
 
 
-->
 
 
 
== Acknowledgements ==
 
OVIS is a project of [http://www.sandia.gov/ Sandia National Laboratories], Albuquerque, NM 87123, and collaborative partner [http://www.opengridcomputing.com/ Open Grid Computing], Austin, TX.
 
 
 
 
 
SAND 2006-2519W
 

Revision as of 13:24, 13 November 2017

MediaWiki has been installed.

Consult the User's Guide for information on using the wiki software.

Getting started