Very light secure app monitoring approach
Several strategies for monitoring applications with LDMS are in development. Each addresses a subset of use cases, and none of the current ones is conservative enough (or easily configured to be conservative enough) for many app users, developers or administrators. I started a GitHub wiki page cataloging the approaches and hope others fill in any gaps it may have. https://github.com/ovis-hpc/ovis/wiki/Proposal-4
A new approach: For some Sandia apps and admins we want or need job monitoring with the properties listed below. I propose that this requires developing a new (relatively simple) sampler and a lightweight application interface library.
- Provides users/admins/developers/analysts with clues about application state in a format which is human and machine readable.
- Low frequency: expected sampling interval of the order of minutes, not seconds.
- Low local sampling overhead: either the data is small in parse-time terms, or the sampler does not even parse it (defer parse to store or analysis).
- Application users/developers do NOT use code that connects to the system ldmsd (which runs as root) or to any other network database.
- Adding "app instrumentation" in any form does not introduce a dependency on network file systems without explicit opt-in by the user at runtime.
- App users/developers do not depend on using binary shared memory constructs -- any communication with an ldmsd sampler is via an ASCII text file that is also useful to human admins and users.
- No use of LD_PRELOAD tricks or just-in-time binary instrumentation.
- App developers can write simple calls to an ldms-provided file api that manages one or more size-limited metrics files. Definition via API of the file content must be distributable (and logically incremental) throughout the app code.
- App logic can choose to emit app configuration information.
- App logic can choose to emit app progress information as a set of counters (not a stream of events). The set can evolve in content as the code runs.
- App logic may emit data on only an arbitrary subset of the run nodes, or perhaps only on the launch node (which in principle may not be where the app actually runs in parallel).
- "App instrumentation" might be implemented as a separate code which, given the arguments of the soon to be launched real application, parses them and emits data before the real app runs.
- Compatible with multiple jobs on the same compute node.
- Not tied to a specific system resource manager in any way.
- An admin-configured ldmsd sampler can automatically discover user data in canonical locations.
Data desired (though maybe not all collected by this method): Much of this is contemplated in ongoing university work. Italic bits are not.
- job id of predecessor job (if a continuation of simulation time)
- all potentially relevant parameters from input file(s)
- a hint about how to automatically detect if the job is not progressing
- any user-supplied tags
- Job id
- path and timestamp of binary used
- command line options present
- environment variables present (perhaps filtered by whitelist/blacklist)
- what libraries are loaded, if binary is not statically linked and stripped.
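A minimal Python sketch of how a launch wrapper might gather a few of the items above (job id, binary path and timestamp, command line options, filtered environment). All function names, the allow/deny patterns, and the default of reading the job id from SLURM_JOB_ID are assumptions made for illustration; none of them is part of LDMS.

```python
import fnmatch
import os


def filter_environment(env, allow=("SLURM_*", "OMP_*"),
                       deny=("*SECRET*", "*TOKEN*", "*PASSWORD*")):
    """Keep variables that match an allow pattern and no deny pattern.

    The default patterns are hypothetical; a site would choose its own.
    """
    kept = {}
    for name, value in env.items():
        if not any(fnmatch.fnmatchcase(name, pat) for pat in allow):
            continue
        if any(fnmatch.fnmatchcase(name, pat) for pat in deny):
            continue
        kept[name] = value
    return kept


def describe_binary(path):
    """Return the resolved path, modification time and size of the binary."""
    st = os.stat(path)
    return {
        "path": os.path.realpath(path),
        "mtime": int(st.st_mtime),
        "size": st.st_size,
    }


def collect_job_facts(argv, env, job_id_var="SLURM_JOB_ID"):
    """Gather job id, binary facts, command line options and filtered env.

    job_id_var is a parameter so the sketch is not tied to one resource
    manager; SLURM_JOB_ID is only the assumed default.
    """
    return {
        "job_id": env.get(job_id_var, "unknown"),
        "binary": describe_binary(argv[0]),
        "argv": list(argv[1:]),
        "env": filter_environment(env),
    }
```

A wrapper could call collect_job_facts(sys.argv[1:], os.environ) with the real application's command line before launching it, which matches the "separate code" form of instrumentation described earlier.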
Seemingly satisfactory implementation method:
- Provide C library and scripting (python and/or binary programs) to incrementally create and update structured text files in /dev/shm/jobmon/$JOBID/
- Format the text files as TOML.
- Use a lock file for consistency (which pretty much forces use of a memory based file system).
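The three points above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the proposed library: the function names, the ".lock" suffix, the size limit and the timeout are all hypothetical, and the TOML writer handles only flat tables of booleans, integers and strings with bare keys.

```python
import os
import time


def toml_escape(text):
    """Escape backslash, double quote and newline (minimal, not complete)."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_toml(values):
    """Render a flat dict as TOML; keys are assumed to be valid bare keys."""
    lines = []
    for key, value in sorted(values.items()):
        if isinstance(value, bool):
            lines.append(f"{key} = {'true' if value else 'false'}")
        elif isinstance(value, int):
            lines.append(f"{key} = {value}")
        else:
            lines.append(f'{key} = "{toml_escape(str(value))}"')
    return "\n".join(lines) + "\n"


def write_locked(path, values, max_bytes=4096, timeout=1.0):
    """Write values to path as TOML while holding path + '.lock'."""
    data = to_toml(values).encode("utf-8")
    if len(data) > max_bytes:
        raise ValueError("metrics file would exceed its size limit")
    lock_path = path + ".lock"  # hypothetical lock naming convention
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError("could not acquire " + lock_path)
            time.sleep(0.01)
    try:
        os.close(fd)
        with open(path, "wb") as out:
            out.write(data)
    finally:
        os.unlink(lock_path)  # release the lock even if the write failed
```

For example, write_locked("/dev/shm/jobmon/1234.progress", {"timestep": 7}) would leave a one-line TOML file behind. A reader that respects the same convention would take the lock before reading.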
What a C/python api might look like (pseudocode)
Functions will be provided to populate and update files such as
/dev/shm/jobmon/$JOBID.config
/dev/shm/jobmon/$JOBID.progress
/dev/shm/jobmon/$JOBID.env
which are in a defined format. Here $JOBID is an application string which should start with the SLURM-assigned job id and may then include rank or other app-defined key material. An ldmsd sampler can then scan /dev/shm/jobmon/ for new files and collect data safely.
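The scanning step might look like the following Python sketch. A real sampler would be an ldmsd plugin, so this only illustrates the logic. The function name discover, the seen dictionary and the size cap are assumptions. The sketch skips symlinks and oversized files because the sampler runs as root while users control the file contents, and it returns raw text so that parsing can be deferred to store or analysis. Honoring the lock file is omitted here.

```python
import os

# File suffixes taken from the example paths above.
SUFFIXES = (".config", ".progress", ".env")


def discover(root, seen, max_bytes=65536):
    """Return {path: text} for files that are new or changed since last scan.

    seen maps path -> (mtime_ns, size) and is updated in place, so the
    caller keeps it between sampling intervals.
    """
    found = {}
    try:
        entries = list(os.scandir(root))
    except FileNotFoundError:
        return found  # no job has created the directory yet
    for entry in entries:
        if not entry.name.endswith(SUFFIXES):
            continue
        if not entry.is_file(follow_symlinks=False):
            continue  # ignore symlinks, directories and special files
        st = entry.stat(follow_symlinks=False)
        if st.st_size > max_bytes:
            continue  # refuse files larger than the assumed cap
        stamp = (st.st_mtime_ns, st.st_size)
        if seen.get(entry.path) == stamp:
            continue  # unchanged since the previous scan
        with open(entry.path, "r", encoding="utf-8", errors="replace") as f:
            found[entry.path] = f.read(max_bytes)
        seen[entry.path] = stamp
    return found
```

Called once per sampling interval with the same seen dictionary, this returns only what changed, which suits an interval of minutes rather than seconds.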
The library could follow a singleton pattern (global) or support thread-level or object-level use. For now, assume a singleton with the ldms_watch_ prefix, where the app is responsible for calling it only in rank 0 or equivalent. More object-oriented usage, arrays, and per-rank usage should be obvious extensions.
void ldms_watch_init($JOBID, app_name, app_family)
void ldms_watch_continuation(previous_job)

// collect configuration strings
void ldms_watch_config_init(max_data_size)
void ldms_watch_config_add_value(key, value)
void ldms_watch_config_add_kvlist(map)
void ldms_watch_config_add_group(group_name)
void ldms_watch_config_add_group_value(group_name, key, value)
void ldms_watch_config_write()

// collect progress (rolling unsigned iteration counters)
void ldms_watch_progress_init(max_data_size)
point = ldms_watch_progress_add("point_name")
ldms_watch_progress_update(point)
ldms_watch_progress_write()

// log significant environment variables
void ldms_watch_env(envvar)
void ldms_write_env()
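To make the progress calls concrete, here is a small Python mock of how they might behave. It is a sketch only: the class ProgressWatch and its methods are hypothetical stand-ins for the ldms_watch_progress_* pseudocode, it uses an object rather than the singleton, and it omits locking for brevity. Counters roll over at 2**63 on the assumption that the file must stay within the signed 64-bit integer range TOML parsers are expected to accept.

```python
class ProgressWatch:
    """Hypothetical mock of the ldms_watch_progress_* calls."""

    ROLLOVER = 1 << 63  # assumed limit so values remain valid TOML integers

    def __init__(self, path, max_data_size=4096):
        # Mirrors ldms_watch_progress_init(max_data_size).
        self.path = path
        self.max_data_size = max_data_size
        self.counters = {}  # point name -> iteration count

    def add(self, point_name):
        """Register a counter and return a handle (here, simply its name)."""
        self.counters.setdefault(point_name, 0)
        return point_name

    def update(self, point):
        """Increment a counter, rolling over instead of growing forever."""
        self.counters[point] = (self.counters[point] + 1) % self.ROLLOVER

    def write(self):
        """Rewrite the whole file with the current set of counters."""
        body = "".join(f"{name} = {value}\n"
                       for name, value in self.counters.items())
        if len(body.encode("utf-8")) > self.max_data_size:
            raise ValueError("progress file would exceed max_data_size")
        with open(self.path, "w", encoding="utf-8") as out:
            out.write(body)
```

An app would register points wherever convenient in its code, call update inside its loops, and call write at a low frequency. Because the file holds only the current counter values, it stays small no matter how long the job runs.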