MISCELLANY


Rationale. It might be desirable in later versions of the specification to add functionality to control the recovery, communicator, message and collective modes. Once again, the current specification tries to avoid introducing new functions to the largest possible extent.

6.2 New return code for MPI_Init

To let an MPI process discover whether it is a replacement for another process (e.g. in the communicator mode FTMPI_COMM_MODE_REBUILD), MPI_Init returns on these processes the new return code MPI_INIT_RESTARTED_NODE instead of MPI_SUCCESS.

Advice to users. If users want to avoid the usage of this new constant, they can retrieve the same information using a static constant and executing an allgather operation after the initialization (for the new processes) and after the recovery (for the surviving processes) respectively.
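One possible reading of this advice can be sketched in plain C. The sketch does not call MPI. It assumes that every process contributes a fixed constant to the allgather: JUST_INITIALIZED when it takes part right after initialization, RECOVERED when it takes part after recovery. The array `gathered` stands in for the allgather result. The constants and the function name `is_replacement` are hypothetical and are not part of the specification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants: what each process contributes to the allgather. */
enum { JUST_INITIALIZED = 0, RECOVERED = 1 };

/*
 * Decide, from a gathered array of contributions, whether the process at
 * index `me` is a replacement. Under the assumption made here, a process is
 * a replacement if it has just initialized while at least one other process
 * reports that it went through recovery.
 */
static int is_replacement(const int *gathered, size_t nprocs, size_t me)
{
    assert(gathered != NULL && me < nprocs);

    if (gathered[me] != JUST_INITIALIZED)
        return 0;               /* a surviving process is never a replacement */

    for (size_t i = 0; i < nprocs; i++) {
        if (gathered[i] == RECOVERED)
            return 1;           /* survivors exist, so this process joined late */
    }
    return 0;                   /* everyone just initialized: ordinary start */
}
```

With four processes, a gathered array of {1, 1, 0, 1} marks the process at index 2 as a replacement, while an array of all zeros corresponds to an ordinary first start.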

6.3 Fault tolerance and error handlers

For discussion. One of the major features of MPI is that it allows libraries to be written independently of any particular application. One of the key aspects of the FT-MPI specification is to give library developers the possibility to write fault-tolerant libraries without having a specific application in mind.

MPI supports the concept of error handlers. An application can register a function as an error handler, which is then called when an error occurs on the communicator to which the error handler has been attached. While the concept of error handlers is very convenient in MPI-1 and MPI-2, it is only partially suited to writing fault-tolerant libraries. Its major drawback is that just one error handler can be registered at a time.
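The drawback can be illustrated with a toy model in plain C. The model does not use the MPI API. `struct model_comm`, `model_set_errhandler` and `model_raise` are hypothetical names, and the handler signature is simplified. The only property the sketch is meant to show is that a single handler slot means the second registration silently replaces the first.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical handler signature (not the MPI one). */
typedef void (*err_handler_fn)(int error_code, void *state);

/* Toy stand-in for a communicator with exactly one error handler slot. */
struct model_comm {
    err_handler_fn handler;
    void *state;
};

/* Registering a handler overwrites whatever was registered before. */
static void model_set_errhandler(struct model_comm *c,
                                 err_handler_fn fn, void *state)
{
    assert(c != NULL);
    c->handler = fn;
    c->state = state;
}

/* Report an error: only the currently registered handler is called. */
static void model_raise(struct model_comm *c, int error_code)
{
    assert(c != NULL);
    if (c->handler != NULL)
        c->handler(error_code, c->state);
}

/* Example handler: counts how often it was invoked. */
static void count_call(int error_code, void *state)
{
    (void)error_code;
    ++*(int *)state;
}
```

If an application registers `count_call` with its own counter and a library then registers it with a different counter on the same `model_comm`, a later `model_raise` increments only the library's counter. The application's handler is never called.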

Imagine a simple example: an application generates a subcommunicator of MPI_COMM_WORLD to perform certain operations. This subset of processes uses a library, which makes a duplicate of the subcommunicator to avoid interfering with the application's messages. If a process fails, all subcommunicators are 'destroyed', so the user might want to write an error handler which regenerates its subcommunicator 'automatically'
