BASIC FAULT-TOLERANCE ISSUES
The FT-MPI specification answers the following questions related to the recovery process:
1. What are the required steps and/or options to start the recovery procedure once the processes are in the 'FAILURE RECOGNIZED' status?
2. What is the status of MPI objects and processes after recovery?
3. What is the status of ongoing communication (point-to-point communication as well as collective operations) after recovering from a failure?
The first question is handled by the recovery mode (FTMPI RECOVERY MODE), the second by the communicator mode (FTMPI COMM MODE), and the third by the message mode (FTMPI MSG MODE) for point-to-point communication and by the collective communication mode (FTMPI COLL MODE) for collective operations.