1.2 Current Parallel Programming Paradigms - page 17 / 33

RECOVERY MODES

3. FTMPI_RECOVERY_MODE_IGNORE: in this mode, the recovery procedure does not have to be initiated at all, as long as no communication with dead processes is required. Any operation involving dead processes (point-to-point operations, collective operations, and communicator creation) raises an error and is not executed.

Rationale: This mode was designed with two things in mind. First, since the recovery procedure is a collective operation, it can be desirable to avoid it on large numbers of processors (e.g. 100,000). Second, there is a class of applications, often referred to as ‘naturally fault-tolerant’, that do not require any special handling at the application level to deal with failed processes.
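The semantics of this mode can be illustrated with a small toy model. This is a sketch under stated assumptions, not an MPI binding: the names ToyCommunicator, DeadProcessError, mark_failed, send and barrier are hypothetical and exist only for this illustration. It shows that traffic between live ranks proceeds without any recovery step, while any operation that involves a dead rank raises an error and is not executed.

```python
class DeadProcessError(RuntimeError):
    """Raised when an operation involves a failed rank (hypothetical name)."""


class ToyCommunicator:
    """Toy model of a communicator in 'ignore' recovery mode; not an MPI API."""

    def __init__(self, size):
        self.size = size
        self.dead = set()      # ranks known to have failed
        self.delivered = []    # (src, dst, payload) for messages that went through

    def mark_failed(self, rank):
        # Record a failure; note that no recovery procedure is started.
        self.dead.add(rank)

    def send(self, src, dst, payload):
        # Point-to-point: rejected only if one of the two endpoints is dead.
        bad = sorted({src, dst} & self.dead)
        if bad:
            raise DeadProcessError(f"operation involves dead rank(s) {bad}")
        self.delivered.append((src, dst, payload))

    def barrier(self):
        # Collective over all ranks: rejected as soon as any member is dead.
        if self.dead:
            raise DeadProcessError(
                f"collective involves dead rank(s) {sorted(self.dead)}"
            )
        return True


comm = ToyCommunicator(size=4)
comm.mark_failed(2)
comm.send(0, 1, "still works")      # both endpoints are alive
try:
    comm.send(0, 2, "never sent")   # rank 2 is dead
except DeadProcessError as err:
    print("error:", err)
```

In this model a naturally fault-tolerant application would simply avoid addressing failed ranks and carry on, which is the case the rationale above has in mind.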

3.1 Pathological failures

An MPI library can still abort if a pathological failure has occurred from which it cannot recover. Typical reasons for pathological failures could be:

All processes of an MPI job have failed before a recovery operation could be started.

The MPI library has no ‘room left’ in which to respawn processes.
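The two reasons listed above can be written as a simple check. This is a minimal sketch, assuming the library can count its surviving processes, its failed processes, and the free slots available for respawning. The function name pathological_reason and its parameters are hypothetical and do not come from any MPI implementation.

```python
def pathological_reason(alive_count, failed_count, free_slots):
    """Return a reason string if recovery is impossible, otherwise None.

    All three arguments are assumptions of this sketch: counts that a
    runtime would have to track in order to make this decision.
    """
    if alive_count == 0:
        # Nobody is left to start the recovery operation.
        return "all processes failed before recovery could start"
    if failed_count > free_slots:
        # Not every failed process can be respawned.
        return "no room left to respawn failed processes"
    return None


reason = pathological_reason(alive_count=3, failed_count=1, free_slots=0)
if reason is not None:
    print("abort:", reason)
```

A runtime would abort the job only when such a check returns a reason. In every other case the selected recovery mode applies.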
