History of HPC in IBM and Challenge of Exascale (page 14 of 15)

© 2005 IBM Corporation

Extrapolating an Exaflop in 2018
Standard technology scaling will not get us there in 2018.

Columns: BlueGene/L (2005) | Exaflop, directly scaled | Exaflop, compromise using evolutionary technology. The final line of each row gives the assumption behind the "compromise guess".

Node peak performance
5.6 GF | 20 TF | 20 TF
Same node count (64k) assumed.

Hardware concurrency per node
2 | 8,000 | 1,600
Assumes 3.5 GHz.

System power in compute chips
1 MW | 3.5 GW | 25 MW
Expected based on 30 W for 200 GF with a 6x technology improvement through 4 technology generations (only compute-chip power is scaled here; I/Os are scaled the same way). A quick arithmetic check appears after the table.

Link bandwidth (each unidirectional 3-D link)
1.4 Gbps | 5 Tbps | 1 Tbps
Not possible to maintain the bandwidth ratio.

Wires per unidirectional 3-D link
2 | 400 wires | 80 wires
A large wire count will eliminate high density and drive links onto cables, where they are 100x more expensive. Assumes 20 Gbps signaling.

Pins in network on node
24 pins | 5,000 pins | 1,000 pins
Assumes 20 Gbps differential signaling. 20 Gbps over copper will be limited to 12 inches, so optics will be needed for in-rack interconnects. 10 Gbps is now possible in both copper and optics.

Power in network
100 kW | 20 MW | 4 MW
10 mW/Gbps assumed (see the sketch after the table). Today: 25 mW/Gbps for long distance (greater than 2 feet on copper), both ends, one direction; optics: 45 mW/Gbps, both ends, one direction, plus 15 mW/Gbps of electrical. Future electrical power: links separately optimized for power.

Memory bandwidth per node
5.6 GB/s | 20 TB/s | 1 TB/s
Not possible to maintain external bandwidth per flop.

L2 cache per node
4 MB | 16 GB | 500 MB
About 6-7 technology generations with expected eDRAM density improvements.

Data pins associated with memory per node
128 data pins | 40,000 pins | 2,000 pins
3.2 Gbps per pin.

Power in memory I/O (not DRAM)
12.8 kW | 80 MW | 4 MW
10 mW/Gbps assumed (see the sketch after the table); most of the current power is in the address bus. Future: probably about 15 mW/Gbps, maybe reaching 10 mW/Gbps (2.5 mW/Gbps is C*V^2*f for random data on the data pins); address power is higher.

QCD CG (conjugate gradient) single-iteration time
2.3 msec | 11 usec | 15 usec
Requires: 1) a fast global sum (two per iteration); 2) hardware offload for messaging (driverless messaging).

Additional bandwidth notes:
~1/20 B/Flop bandwidth: power and packaging driven.
~0.02 B/s per F/s: power, cost and packaging driven.
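
The 25 MW compromise figure for compute-chip power follows directly from the stated basis. A minimal arithmetic sketch in Python, assuming linear scaling from 30 W per 200 GF with the 6x improvement applied:

    # Compromise compute-chip power, from the slide's stated basis:
    # 30 W for 200 GF in 2005, improved 6x through four technology
    # generations, scaled linearly to one exaflop.
    watts_per_gflop_2005 = 30.0 / 200.0              # 0.15 W/GF
    watts_per_gflop_2018 = watts_per_gflop_2005 / 6  # 0.025 W/GF after 6x improvement
    exaflop_in_gflops = 1e9                          # 1 EF = 1e9 GF
    print(watts_per_gflop_2018 * exaflop_in_gflops / 1e6, "MW")  # ~25 MW, as in the table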
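
The network-power rows are consistent with the 10 mW/Gbps assumption if the 64k-node machine is read as a 3-D torus with six unidirectional links per node (each link counted once at its source) and the per-Gbps figure taken to cover a whole link. That reading is an assumption of this sketch, not something the slide states explicitly:

    # Network power under the assumed 10 mW/Gbps (sketch only; the
    # six-links-per-node, count-each-link-once torus accounting is assumed).
    nodes = 64 * 1024          # 65,536 nodes, same node count as BG/L
    links_per_node = 6         # 3-D torus, each unidirectional link counted once
    mw_per_gbps = 10.0         # slide's assumed link power

    def network_power_mw(link_gbps):
        total_gbps = nodes * links_per_node * link_gbps
        return total_gbps * mw_per_gbps / 1e9      # mW -> MW

    print(network_power_mw(5000))  # directly scaled, 5 Tbps/link: ~19.7 MW (table: 20 MW)
    print(network_power_mw(1000))  # compromise, 1 Tbps/link:      ~3.9 MW (table: 4 MW)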
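
Similarly, the memory-I/O power rows and the ~1/20 B/Flop note can be reproduced from the stated 3.2 Gbps per data pin and 10 mW/Gbps, ignoring address-bus power (which the slide notes dominates today). Again, a rough sketch under those assumptions:

    # Memory-I/O power from data pins only (address-bus power ignored),
    # plus the compromise bytes-per-flop ratio.
    nodes = 64 * 1024
    gbps_per_pin = 3.2
    mw_per_gbps = 10.0

    def memory_io_power_mw(data_pins_per_node):
        return nodes * data_pins_per_node * gbps_per_pin * mw_per_gbps / 1e9

    print(memory_io_power_mw(40000))  # directly scaled: ~84 MW (table: 80 MW)
    print(memory_io_power_mw(2000))   # compromise:      ~4.2 MW (table: 4 MW)

    # Compromise memory bandwidth per flop: 1 TB/s against a 20 TF node,
    # i.e. 0.05 B/Flop, consistent with the ~1/20 B/Flop note above.
    print(1e12 / 20e12)               # 0.05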
