second mode completes all ongoing messages after the recovery operation, with the exception of the messages to and from the failed processes. This mode requires, that the application keeps precisely track of the state of each process, minimizing the roll-back procedure. Similar modes are available for collective operations, which can either be executed in an atomic or a non-atomic mode.
An Implementation of the FT-MPI specification
The University of Tennessee implemented the specification presented in the previous section. The current implementation is relying on the HARNESS framework. HARNESS (Heterogeneous Adaptable Reconfigurable Networked SyStems) provides a fault-tolerant, dynamic run-time environment, which is used by FT-MPI for process management and failure notification.
UTK’s implementation of the FT-MPI specification proves, that the specification is not just a theoretical framework, but that it is practically working.
The currently available functionality includes the full MPI-1.2 specification, as well as several sections of the MPI-2 document. Lots of efforts have been furthermore invested in optimizing the collective operations and the derived datatype section of MPI. A multi-protocol device supporting besides TCP/IP also various other protocols (e.g. shared memory, myrinet) is currently in the testing phase.
UTK’s implementation has proven in various benchmarks and application scenarios, that the performance of FT-MPI is comparable to the current state-of-the art MPI libraries as long as no error occurs. This indicates, that the new specification does not harm the performance of applications in case no error occurs. The results furthermore show, that the FT-MPI specification is compatible to the current specifications of MPI, since all current MPI applications work without any modifications with FT-MPI.
Currently ongoing work is also the abstraction of the features needed for implementing the fault- tolerant features into an abstract device interface (FT-ADI). Thus, different run-time environments could be used to implement a specification of FT-MPI, e.g. a simple environment relying on shared files for processing having a common file-system. UTK’s FT-MPI implementation is available for free download at http://icl.cs.utk.edu/ftmpi/.
Simultaneously to the development of the specification and UTKs implementation of FT-MPI, a large set of applications have been tested and benchmarked. Most of these rely on a technique called in-memory checkpoint. This technique avoids writing checkpoint files by distributing additional information based on encoding techniques like the Reed-Solomon Algorithm on other processes. In case an error occurs, the application need not be restarted, but the additional information is used to reconstruct the data of the failed process. Especially for large numbers of processes, this technique improves the performance of the application dramatically compared to writing and reading checkpoint files, since it avoids typically slow file operations. Among the applications using this technique are
A parallel, preconditioned conjugate gradient solver,
A dense matrix multiplication,
QR factorization and