Die Präsentation wird geladen. Bitte warten

Die Präsentation wird geladen. Bitte warten

The Future of Parallel Computing Special Purpose Mesh Architectures Heiko Schröder, 1998 P A R C SA ISA PIPS RM OH.

Ähnliche Präsentationen


Präsentation zum Thema: "The Future of Parallel Computing Special Purpose Mesh Architectures Heiko Schröder, 1998 P A R C SA ISA PIPS RM OH."—  Präsentation transkript:

1 The Future of Parallel Computing Special Purpose Mesh Architectures Heiko Schröder, 1998 P A R C SA ISA PIPS RM OH

2 Heiko Schröder, 1998Slide 2 P A R C Contents Why meshes ??? Application specific parallel mesh architectures Fine grain 1983 Coarse grain 1997 -Systolic Arrays -Instruction Systolic Arrays -PIPS -Reconfigurable mesh -Optical Highway

3 Heiko Schröder, 1998Slide 3 P A R C Physical limits c=300 000 km/sec OPS -- 0.3 mm/OP 1000 PEs with OPS -- 30cm/OP massive parallelism distributed memory

4 Heiko Schröder, 1998Slide 4 P A R C 0.01 0.1 1 10 100 1000 197019751980198519901995 Performance (MIPS) Year 4004 8080 80286 80386 Pentium 2 Pentium 80486 Processor power

5 Heiko Schröder, 1998Slide 5 P A R C 0,5 µ 0,25 µ Scaling Faktor 2: 1/2 width 1/2 hight 1/2 switching time 8 x performance!

6 Heiko Schröder, 1998Slide 6 P A R C CMOS transistors 19601970198019902000201020202030 0,01 0,1 1 10 Size of minimal transistor ca. 0,03

7 Heiko Schröder, 1998Slide 7 P A R C Mesh/Torus diameter bisection width 2D mesh

8 Heiko Schröder, 1998Slide 8 P A R C Hypercube 0-D 0 1 1-D 0 0101 10101 2-D 000000010010 001001011011 100100110110 101101111111 3-D 01 4-D diameter log n bisection width n

9 Heiko Schröder, 1998Slide 9 P A R C VLSI Very Large Scale Integration simple cells few types regular architecture short connections mesh -- torus

10 Heiko Schröder, 1998Slide 10 P A R C Pin limitations 16x16 pins diameter 256 16 pins diameter 16 16x12 pins

11 Heiko Schröder, 1998Slide 11 P A R C Bisection width Bisection width 256 25 cm Bisection width 32K 32 m

12 Heiko Schröder, 1998Slide 12 P A R C Programming SA --- Systolic Array SIMD --- Single Instruction Multiple Data ISA --- Instruction Systolic Array MIMD --- Multiple Instruction Multiple Data

13 Heiko Schröder, 1998Slide 13 P A R C parallel merge initial situation: 1.) sort columns (odd-even-transposition sort) 2.) sort rows (odd-even-transposition sort) sorted !!!! x1x2 x3 x4 x5 x6 x7 x17 x18 y1 y2 y3 y4 y5 y6 y7 y17 y18...

14 Heiko Schröder, 1998Slide 14 P A R C 0-1 principle The 0-1 principle states that if all sequences of 0 and 1 are sorted properly than this is a correct sorter. The sorter must be based on moving data. initially 0s 1s after vertical sort 0s 1s after horizontal sort 0s 1s

15 Heiko Schröder, 1998Slide 15 P A R C MIMD-mesh (clocked) min maxTime: 2n

16 Heiko Schröder, 1998Slide 16 P A R C systolic merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2

17 Heiko Schröder, 1998Slide 17 P A R C systolic merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2

18 Heiko Schröder, 1998Slide 18 P A R C systolic merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2

19 Heiko Schröder, 1998Slide 19 P A R C systolic merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2

20 Heiko Schröder, 1998Slide 20 P A R C systolic merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2 1 3 3 4 5 5 6 7 4 4 3 2 9 8 8 7

21 Heiko Schröder, 1998Slide 21 P A R C systolic merge 1 3 3 4 5 5 6 7 4 4 3 2 9 8 8 7

22 Heiko Schröder, 1998Slide 22 P A R C systolic merge 1 3 3 4 4 4 3 2 5 5 6 7 9 8 8 7

23 Heiko Schröder, 1998Slide 23 P A R C systolic merge 1 3 3 4 4 4 3 2 5 5 6 7 9 8 8 7

24 Heiko Schröder, 1998Slide 24 P A R C systolic merge 1 3 3 2 4 4 3 4 5 5 6 7 9 8 8 7

25 Heiko Schröder, 1998Slide 25 P A R C systolic merge 1 3 3 2 4 4 3 4 5 5 6 7 9 8 8 7

26 Heiko Schröder, 1998Slide 26 P A R C systolic merge 1 3 3 2 4 4 3 4 5 5 6 7 9 8 8 7

27 Heiko Schröder, 1998Slide 27 P A R C systolic merge 1 3 2 3 4 3 4 4 5 5 6 7 9 8 8 7

28 Heiko Schröder, 1998Slide 28 P A R C 1 3 2 3 4 3 4 4 5 5 6 7 9 8 8 7 systolic merge

29 Heiko Schröder, 1998Slide 29 P A R C 1 2 3 3 3 4 4 4 5 5 6 7 9 8 8 7 systolic merge

30 Heiko Schröder, 1998Slide 30 P A R C 1 2 3 3 3 4 4 4 5 5 6 7 9 8 8 7 systolic merge

31 Heiko Schröder, 1998Slide 31 P A R C 1 2 3 3 3 4 4 4 5 5 6 7 9 8 8 7 systolic merge

32 Heiko Schröder, 1998Slide 32 P A R C 1 2 3 3 3 4 4 4 5 5 6 7 8 9 7 8 systolic merge

33 Heiko Schröder, 1998Slide 33 P A R C 1 2 3 3 3 4 4 4 5 5 6 7 8 9 7 8 systolic merge

34 Heiko Schröder, 1998Slide 34 P A R C 1 2 3 3 3 4 4 4 5 5 6 7 8 7 9 8 systolic merge

35 Heiko Schröder, 1998Slide 35 P A R C systolic merge 1 2 3 3 3 4 4 4 5 5 6 7 8 7 9 8

36 Heiko Schröder, 1998Slide 36 P A R C sorted !!! systolic merge 1 2 3 3 3 4 4 4 5 5 6 7 7 8 8 9

37 Heiko Schröder, 1998Slide 37 P A R C Characteristics of SAs Extremely high cost-performance no flexibility -- long development time Suitable for special signal processing tasks ???

38 Heiko Schröder, 1998Slide 38 P A R C Systolic architectures I

39 Heiko Schröder, 1998Slide 39 P A R C Systolic architectures II

40 Heiko Schröder, 1998Slide 40 P A R C ISA merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2 C:=min{C, CE} C:=max{C, CW}

41 Heiko Schröder, 1998Slide 41 P A R C ISA merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2

42 Heiko Schröder, 1998Slide 42 P A R C ISA merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2

43 Heiko Schröder, 1998Slide 43 P A R C ISA merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2

44 Heiko Schröder, 1998Slide 44 P A R C ISA merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2

45 Heiko Schröder, 1998Slide 45 P A R C ISA merge 1 3 3 4 5 5 6 7 4 8 8 7 9 4 3 2

46 Heiko Schröder, 1998Slide 46 P A R C ISA merge 1 3 3 4 5 5 6 7 4 8 8 7 9 4 3 2

47 Heiko Schröder, 1998Slide 47 P A R C ISA merge 1 3 3 4 4 5 6 7 5 4 8 7 9 8 3 2

48 Heiko Schröder, 1998Slide 48 P A R C ISA merge 1 3 3 4 4 5 6 7 5 4 8 7 9 8 3 2

49 Heiko Schröder, 1998Slide 49 P A R C ISA merge 1 3 3 4 4 4 6 7 5 5 3 7 9 8 8 2

50 Heiko Schröder, 1998Slide 50 P A R C ISA merge 1 3 3 4 4 4 6 7 5 5 3 7 9 8 8 2

51 Heiko Schröder, 1998Slide 51 P A R C ISA merge 1 3 3 4 4 4 3 7 5 5 6 2 9 8 8 7

52 Heiko Schröder, 1998Slide 52 P A R C ISA merge 1 3 3 4 4 4 3 7 5 5 6 2 9 8 8 7

53 Heiko Schröder, 1998Slide 53 P A R C ISA merge 1 3 3 4 4 4 3 2 5 5 6 7 9 8 8 7

54 Heiko Schröder, 1998Slide 54 P A R C ISA merge 1 3 3 4 4 4 3 2 5 5 6 7 9 8 8 7

55 Heiko Schröder, 1998Slide 55 P A R C ISA merge 1 3 3 2 4 4 3 4 5 5 6 7 9 8 8 7

56 Heiko Schröder, 1998Slide 56 P A R C ISA merge 1 3 3 2 4 4 3 4 5 5 6 7 9 8 8 7

57 Heiko Schröder, 1998Slide 57 P A R C ISA merge 1 3 2 3 4 3 4 4 5 5 6 7 9 8 8 7

58 Heiko Schröder, 1998Slide 58 P A R C ISA merge 1 3 2 3 4 3 4 4 5 5 6 7 9 8 8 7

59 Heiko Schröder, 1998Slide 59 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 9 8 8 7

60 Heiko Schröder, 1998Slide 60 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 9 8 8 7

61 Heiko Schröder, 1998Slide 61 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 9 8 8 7

62 Heiko Schröder, 1998Slide 62 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 8 9 7 8

63 Heiko Schröder, 1998Slide 63 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 8 9 7 8

64 Heiko Schröder, 1998Slide 64 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 8 7 9 8

65 Heiko Schröder, 1998Slide 65 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 8 7 9 8

66 Heiko Schröder, 1998Slide 66 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 7 8 8 9

67 Heiko Schröder, 1998Slide 67 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 7 8 8 9

68 Heiko Schröder, 1998Slide 68 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 7 8 8 9

69 Heiko Schröder, 1998Slide 69 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 7 8 8 9

70 Heiko Schröder, 1998Slide 70 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 7 8 8 9

71 Heiko Schröder, 1998Slide 71 P A R C Hough transform on the ISA good line detection method shear Fast tomography

72 Heiko Schröder, 1998Slide 72 P A R C robot vision projector CCD stereo vision

73 Heiko Schröder, 1998Slide 73 P A R C Use of the ISA Areas of application for ISA: automatic optical quality control real time signal processing computer graphics /visualization linear equations Cryptography --> Tele-medicine ? Special features: fast aggregate functions (sum, carry) fast local communication no local memory typical improvement over PC: Factor 20-30

74 Heiko Schröder, 1998Slide 74 P A R C Instruction Systolic Array

75 Heiko Schröder, 1998Slide 75 P A R C PIPS (1990-94) 1 M bit 1 M bit 32x32 torus 16 bit parallel communication 16 bit add prefetch memory control BHP -- CSIRO -- NU -- ADFA 1.4 M

76 Heiko Schröder, 1998Slide 76 P A R C Special features: local memory SIMD-torus memory pre-fetch Applications: visualization 3D-simulation (CFD, FEM)

77 Heiko Schröder, 1998Slide 77 P A R C

78 Heiko Schröder, 1998Slide 78 P A R C PIPS

79 Heiko Schröder, 1998Slide 79 P A R C Use in industry ? 1993 199419951996 500 1000 1500 2000 2500 3000 Performance [Gflops] Research Industry 1327 2121 648 3675 126 248 693 1168

80 Heiko Schröder, 1998Slide 80 P A R C Investments Investments into parallel computers [M$] 0 500 1000 1500 2000 2500 3000 3500 1993199419951996 Research Industry

81 Heiko Schröder, 1998Slide 81 P A R C Concentration 1993 199419951996 10 20 30 40 50 60 Number of manufacturers 11 21 19 49

82 Heiko Schröder, 1998Slide 82 P A R C Degree of Parallelism 0 50 100 150 200 250 300 350 400 450 May-93 Nov-93 May-94 Nov-94 May-95 Nov-95 May-96 Nov-96 May-97 Nov-97 1 to 63 64 to 255 256 to 1023 1024 and more Number of new Systems

83 Heiko Schröder, 1998Slide 83 P A R C Evaluation Cost * computation time Parallel computers with standard components Imbedded parallel systems

84 Heiko Schröder, 1998Slide 84 P A R C reconfigurable mesh reconfigurable mesh = mesh + interior connections 15 positions low cost

85 Heiko Schröder, 1998Slide 85 P A R C global OR and modulo 3 10 000 1 0 ** V log n on EREW-PRAM 101110 * 1 mod 3 log n / log log n on CRCW-PRAM

86 Heiko Schröder, 1998Slide 86 P A R C sorting with all-to-all mapping Sorting: sort blocks all-to-all (columns) sort blocks all-to-all (rows) o-e-sort blocks

87 Heiko Schröder, 1998Slide 87 P A R C all-to-all mapping n x n

88 Heiko Schröder, 1998Slide 88 P A R C vertical all-to-all

89 Heiko Schröder, 1998Slide 89 P A R C horizontal all-to-all

90 Heiko Schröder, 1998Slide 90 P A R C 1 step 2 steps 3 steps k/2 steps 3 steps 2 steps 1 step (k/2) 2 steps

91 Heiko Schröder, 1998Slide 91 P A R C sorting in optimal time (k/2) 2 steps k=n 1/3 each step takes n 1/3 time --> T= n/4 x 2 /2 T = n/2 all-to-all Sorting: sort blocks (O(n 2/3 )) all-to-all (n/2) sort blocks (O(n 2/3 )) all-to-all (n/2) sort blocks (O(n 2/3 )) time: n + o(n)

92 Heiko Schröder, 1998Slide 92 P A R C Reconfigurable mesh Special features SIMD constant diameter faster than PRAM ? Suitable applications routing/sorting/load balancing sparse matrix multiplication segmentation / component labeling feature extraction image database ?

93 Heiko Schröder, 1998Slide 93 P A R C Reconfigurable mesh

94 Heiko Schröder, 1998Slide 94 P A R C Optical Highway W=1; P=100 W=100; P=22 All-to-all connection

95 Heiko Schröder, 1998Slide 95 P A R C Horizontal all-to-all Vertical all-to-all

96 Heiko Schröder, 1998Slide 96 P A R C Features of optically connected meshes SIMD/SPMD/MIMD implement all major architectures all-to-all communication in 2 steps Bulk synchronous processing (BSP) no latency hiding no pin-limitation Applications coarse grain parallel computing only? ray-tracing ? ???

97 Heiko Schröder, 1998Slide 97 P A R C Optical Highway 1. H. Schröder et al, RMB --- A Reconfigurable Multiple Bus Network, HPCA 96, San Jose, 1996 2. H. Schröder, O. Sykora, I. Vrto, Optical All-to-All Communication for some Product Graphs, SOFSEM '97, Milovy, Czech Republic, 1997

98 Heiko Schröder, 1998Slide 98 P A R C Bisection-width / Diameter Diameter log n bisection width n diameter bisection width SA ISA PIPS Diameter 1 bisection width RM Diameter 1 bisection width n OH

99 Heiko Schröder, 1998Slide 99 P A R C Suitable problems ? diameter log n bisection width n SA: suitable applications? ISA: 2D-problems, aggregate functions local communication PIPS: 3D-problems, local communication RM: diameter-bound > bisection-width-bound OH: PRAM equivalent? SA ISA PIPS RM OH

100 ? ? ? ?


Herunterladen ppt "The Future of Parallel Computing Special Purpose Mesh Architectures Heiko Schröder, 1998 P A R C SA ISA PIPS RM OH."

Ähnliche Präsentationen


Google-Anzeigen