Rationale It might be desirable in later versions of the specification to add functionality to control the recovery, communicator, message and collective modes. Once again, the current specification tries to avoid introducing new functions as far as possible.
New return code for MPI_Init
To allow an MPI process to discover whether it is a replacement for another process (e.g. in the communicator mode FTMPI_COMM_MODE_REBUILD), MPI_Init returns the new return code MPI_INIT_RESTARTED_NODE instead of MPI_SUCCESS on such processes;
to avoid the usage of this new same information using a static
tion (for the new processes) processes) respectively.
Fault tolerance and error handlers
For discussion One of the major strengths of MPI is that libraries can be written independently of any particular application. One of the key goals of the FT-MPI specification is to give library developers the possibility to write fault-tolerant libraries without having a specific application in mind.
MPI supports the concept of error handlers. An application can register a function as an error handler, which is then called when an error occurs on the communicator to which the error handler is attached. While error handlers are very convenient in MPI-1 and MPI-2, they are only partially suitable for writing fault-tolerant libraries. Their major drawback is that only one error handler can be attached to a communicator at a time.
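The single-handler limitation can be illustrated with a small toy model. The sketch below does not use MPI at all: the type toy_comm and the functions toy_comm_set_errhandler, toy_comm_raise, count_call and toy_demo are hypothetical names introduced only for this illustration, under the assumption that a communicator stores exactly one handler and that a later registration replaces the earlier one.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names belong to MPI or FT-MPI. */
typedef void (*toy_errhandler_fn)(int error_code, void *state);

typedef struct {
    toy_errhandler_fn handler; /* the single handler slot */
    void *state;               /* opaque data passed to the handler */
} toy_comm;

/* Attach a handler; any previously attached handler is replaced. */
static void toy_comm_set_errhandler(toy_comm *comm, toy_errhandler_fn fn,
                                    void *state)
{
    assert(comm != NULL);
    comm->handler = fn;
    comm->state = state;
}

/* Report an error on the communicator: only the current handler runs. */
static void toy_comm_raise(toy_comm *comm, int error_code)
{
    assert(comm != NULL);
    if (comm->handler != NULL) {
        comm->handler(error_code, comm->state);
    }
}

/* Handler that counts how often it was invoked. */
static void count_call(int error_code, void *state)
{
    (void)error_code;
    ++*(int *)state;
}

/* The application registers first, then a library registers its own
 * handler on the same communicator, then one error is raised. */
void toy_demo(int *app_calls, int *lib_calls)
{
    toy_comm comm = { NULL, NULL };

    *app_calls = 0;
    *lib_calls = 0;
    toy_comm_set_errhandler(&comm, count_call, app_calls); /* application */
    toy_comm_set_errhandler(&comm, count_call, lib_calls); /* library */
    toy_comm_raise(&comm, 1);
}
```

In this model, after toy_demo returns the application counter is still zero while the library counter is one: the application never learns about the failure unless the library saves the previous handler and forwards the error to it explicitly.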
Imagine a simple example: an application creates a subcommunicator of MPI_COMM_WORLD to perform certain operations. This subset of processes uses a library that duplicates the subcommunicator to avoid interfering with the application's messages. If a process fails, all subcommunicators are 'destroyed', so the user might want to write an error handler that regenerates its subcommunicator 'automatically'