Präsentation herunterladen
Die Präsentation wird geladen. Bitte warten
Veröffentlicht von:Liesl Weyl Geändert vor über 10 Jahren
1
The Future of Parallel Computing Special Purpose Mesh Architectures Heiko Schröder, 1998 P A R C SA ISA PIPS RM OH
2
Heiko Schröder, 1998Slide 2 P A R C Contents Why meshes ??? Application specific parallel mesh architectures Fine grain 1983 Coarse grain 1997 -Systolic Arrays -Instruction Systolic Arrays -PIPS -Reconfigurable mesh -Optical Highway
3
Heiko Schröder, 1998Slide 3 P A R C Physical limits c=300 000 km/sec OPS -- 0.3 mm/OP 1000 PEs with OPS -- 30cm/OP massive parallelism distributed memory
4
Heiko Schröder, 1998Slide 4 P A R C 0.01 0.1 1 10 100 1000 197019751980198519901995 Performance (MIPS) Year 4004 8080 80286 80386 Pentium 2 Pentium 80486 Processor power
5
Heiko Schröder, 1998Slide 5 P A R C 0,5 µ 0,25 µ Scaling Faktor 2: 1/2 width 1/2 hight 1/2 switching time 8 x performance!
6
Heiko Schröder, 1998Slide 6 P A R C CMOS transistors 19601970198019902000201020202030 0,01 0,1 1 10 Size of minimal transistor ca. 0,03
7
Heiko Schröder, 1998Slide 7 P A R C Mesh/Torus diameter bisection width 2D mesh
8
Heiko Schröder, 1998Slide 8 P A R C Hypercube 0-D 0 1 1-D 0 0101 10101 2-D 000000010010 001001011011 100100110110 101101111111 3-D 01 4-D diameter log n bisection width n
9
Heiko Schröder, 1998Slide 9 P A R C VLSI Very Large Scale Integration simple cells few types regular architecture short connections mesh -- torus
10
Heiko Schröder, 1998Slide 10 P A R C Pin limitations 16x16 pins diameter 256 16 pins diameter 16 16x12 pins
11
Heiko Schröder, 1998Slide 11 P A R C Bisection width Bisection width 256 25 cm Bisection width 32K 32 m
12
Heiko Schröder, 1998Slide 12 P A R C Programming SA --- Systolic Array SIMD --- Single Instruction Multiple Data ISA --- Instruction Systolic Array MIMD --- Multiple Instruction Multiple Data
13
Heiko Schröder, 1998Slide 13 P A R C parallel merge initial situation: 1.) sort columns (odd-even-transposition sort) 2.) sort rows (odd-even-transposition sort) sorted !!!! x1x2 x3 x4 x5 x6 x7 x17 x18 y1 y2 y3 y4 y5 y6 y7 y17 y18...
14
Heiko Schröder, 1998Slide 14 P A R C 0-1 principle The 0-1 principle states that if all sequences of 0 and 1 are sorted properly than this is a correct sorter. The sorter must be based on moving data. initially 0s 1s after vertical sort 0s 1s after horizontal sort 0s 1s
15
Heiko Schröder, 1998Slide 15 P A R C MIMD-mesh (clocked) min maxTime: 2n
16
Heiko Schröder, 1998Slide 16 P A R C systolic merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2
17
Heiko Schröder, 1998Slide 17 P A R C systolic merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2
18
Heiko Schröder, 1998Slide 18 P A R C systolic merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2
19
Heiko Schröder, 1998Slide 19 P A R C systolic merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2
20
Heiko Schröder, 1998Slide 20 P A R C systolic merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2 1 3 3 4 5 5 6 7 4 4 3 2 9 8 8 7
21
Heiko Schröder, 1998Slide 21 P A R C systolic merge 1 3 3 4 5 5 6 7 4 4 3 2 9 8 8 7
22
Heiko Schröder, 1998Slide 22 P A R C systolic merge 1 3 3 4 4 4 3 2 5 5 6 7 9 8 8 7
23
Heiko Schröder, 1998Slide 23 P A R C systolic merge 1 3 3 4 4 4 3 2 5 5 6 7 9 8 8 7
24
Heiko Schröder, 1998Slide 24 P A R C systolic merge 1 3 3 2 4 4 3 4 5 5 6 7 9 8 8 7
25
Heiko Schröder, 1998Slide 25 P A R C systolic merge 1 3 3 2 4 4 3 4 5 5 6 7 9 8 8 7
26
Heiko Schröder, 1998Slide 26 P A R C systolic merge 1 3 3 2 4 4 3 4 5 5 6 7 9 8 8 7
27
Heiko Schröder, 1998Slide 27 P A R C systolic merge 1 3 2 3 4 3 4 4 5 5 6 7 9 8 8 7
28
Heiko Schröder, 1998Slide 28 P A R C 1 3 2 3 4 3 4 4 5 5 6 7 9 8 8 7 systolic merge
29
Heiko Schröder, 1998Slide 29 P A R C 1 2 3 3 3 4 4 4 5 5 6 7 9 8 8 7 systolic merge
30
Heiko Schröder, 1998Slide 30 P A R C 1 2 3 3 3 4 4 4 5 5 6 7 9 8 8 7 systolic merge
31
Heiko Schröder, 1998Slide 31 P A R C 1 2 3 3 3 4 4 4 5 5 6 7 9 8 8 7 systolic merge
32
Heiko Schröder, 1998Slide 32 P A R C 1 2 3 3 3 4 4 4 5 5 6 7 8 9 7 8 systolic merge
33
Heiko Schröder, 1998Slide 33 P A R C 1 2 3 3 3 4 4 4 5 5 6 7 8 9 7 8 systolic merge
34
Heiko Schröder, 1998Slide 34 P A R C 1 2 3 3 3 4 4 4 5 5 6 7 8 7 9 8 systolic merge
35
Heiko Schröder, 1998Slide 35 P A R C systolic merge 1 2 3 3 3 4 4 4 5 5 6 7 8 7 9 8
36
Heiko Schröder, 1998Slide 36 P A R C sorted !!! systolic merge 1 2 3 3 3 4 4 4 5 5 6 7 7 8 8 9
37
Heiko Schröder, 1998Slide 37 P A R C Characteristics of SAs Extremely high cost-performance no flexibility -- long development time Suitable for special signal processing tasks ???
38
Heiko Schröder, 1998Slide 38 P A R C Systolic architectures I
39
Heiko Schröder, 1998Slide 39 P A R C Systolic architectures II
40
Heiko Schröder, 1998Slide 40 P A R C ISA merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2 C:=min{C, CE} C:=max{C, CW}
41
Heiko Schröder, 1998Slide 41 P A R C ISA merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2
42
Heiko Schröder, 1998Slide 42 P A R C ISA merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2
43
Heiko Schröder, 1998Slide 43 P A R C ISA merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2
44
Heiko Schröder, 1998Slide 44 P A R C ISA merge 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2
45
Heiko Schröder, 1998Slide 45 P A R C ISA merge 1 3 3 4 5 5 6 7 4 8 8 7 9 4 3 2
46
Heiko Schröder, 1998Slide 46 P A R C ISA merge 1 3 3 4 5 5 6 7 4 8 8 7 9 4 3 2
47
Heiko Schröder, 1998Slide 47 P A R C ISA merge 1 3 3 4 4 5 6 7 5 4 8 7 9 8 3 2
48
Heiko Schröder, 1998Slide 48 P A R C ISA merge 1 3 3 4 4 5 6 7 5 4 8 7 9 8 3 2
49
Heiko Schröder, 1998Slide 49 P A R C ISA merge 1 3 3 4 4 4 6 7 5 5 3 7 9 8 8 2
50
Heiko Schröder, 1998Slide 50 P A R C ISA merge 1 3 3 4 4 4 6 7 5 5 3 7 9 8 8 2
51
Heiko Schröder, 1998Slide 51 P A R C ISA merge 1 3 3 4 4 4 3 7 5 5 6 2 9 8 8 7
52
Heiko Schröder, 1998Slide 52 P A R C ISA merge 1 3 3 4 4 4 3 7 5 5 6 2 9 8 8 7
53
Heiko Schröder, 1998Slide 53 P A R C ISA merge 1 3 3 4 4 4 3 2 5 5 6 7 9 8 8 7
54
Heiko Schröder, 1998Slide 54 P A R C ISA merge 1 3 3 4 4 4 3 2 5 5 6 7 9 8 8 7
55
Heiko Schröder, 1998Slide 55 P A R C ISA merge 1 3 3 2 4 4 3 4 5 5 6 7 9 8 8 7
56
Heiko Schröder, 1998Slide 56 P A R C ISA merge 1 3 3 2 4 4 3 4 5 5 6 7 9 8 8 7
57
Heiko Schröder, 1998Slide 57 P A R C ISA merge 1 3 2 3 4 3 4 4 5 5 6 7 9 8 8 7
58
Heiko Schröder, 1998Slide 58 P A R C ISA merge 1 3 2 3 4 3 4 4 5 5 6 7 9 8 8 7
59
Heiko Schröder, 1998Slide 59 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 9 8 8 7
60
Heiko Schröder, 1998Slide 60 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 9 8 8 7
61
Heiko Schröder, 1998Slide 61 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 9 8 8 7
62
Heiko Schröder, 1998Slide 62 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 8 9 7 8
63
Heiko Schröder, 1998Slide 63 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 8 9 7 8
64
Heiko Schröder, 1998Slide 64 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 8 7 9 8
65
Heiko Schröder, 1998Slide 65 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 8 7 9 8
66
Heiko Schröder, 1998Slide 66 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 7 8 8 9
67
Heiko Schröder, 1998Slide 67 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 7 8 8 9
68
Heiko Schröder, 1998Slide 68 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 7 8 8 9
69
Heiko Schröder, 1998Slide 69 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 7 8 8 9
70
Heiko Schröder, 1998Slide 70 P A R C ISA merge 1 2 3 3 3 4 4 4 5 5 6 7 7 8 8 9
71
Heiko Schröder, 1998Slide 71 P A R C Hough transform on the ISA good line detection method shear Fast tomography
72
Heiko Schröder, 1998Slide 72 P A R C robot vision projector CCD stereo vision
73
Heiko Schröder, 1998Slide 73 P A R C Use of the ISA Areas of application for ISA: automatic optical quality control real time signal processing computer graphics /visualization linear equations Cryptography --> Tele-medicine ? Special features: fast aggregate functions (sum, carry) fast local communication no local memory typical improvement over PC: Factor 20-30
74
Heiko Schröder, 1998Slide 74 P A R C Instruction Systolic Array
75
Heiko Schröder, 1998Slide 75 P A R C PIPS (1990-94) 1 M bit 1 M bit 32x32 torus 16 bit parallel communication 16 bit add prefetch memory control BHP -- CSIRO -- NU -- ADFA 1.4 M
76
Heiko Schröder, 1998Slide 76 P A R C Special features: local memory SIMD-torus memory pre-fetch Applications: visualization 3D-simulation (CFD, FEM)
77
Heiko Schröder, 1998Slide 77 P A R C
78
Heiko Schröder, 1998Slide 78 P A R C PIPS
79
Heiko Schröder, 1998Slide 79 P A R C Use in industry ? 1993 199419951996 500 1000 1500 2000 2500 3000 Performance [Gflops] Research Industry 1327 2121 648 3675 126 248 693 1168
80
Heiko Schröder, 1998Slide 80 P A R C Investments Investments into parallel computers [M$] 0 500 1000 1500 2000 2500 3000 3500 1993199419951996 Research Industry
81
Heiko Schröder, 1998Slide 81 P A R C Concentration 1993 199419951996 10 20 30 40 50 60 Number of manufacturers 11 21 19 49
82
Heiko Schröder, 1998Slide 82 P A R C Degree of Parallelism 0 50 100 150 200 250 300 350 400 450 May-93 Nov-93 May-94 Nov-94 May-95 Nov-95 May-96 Nov-96 May-97 Nov-97 1 to 63 64 to 255 256 to 1023 1024 and more Number of new Systems
83
Heiko Schröder, 1998Slide 83 P A R C Evaluation Cost * computation time Parallel computers with standard components Imbedded parallel systems
84
Heiko Schröder, 1998Slide 84 P A R C reconfigurable mesh reconfigurable mesh = mesh + interior connections 15 positions low cost
85
Heiko Schröder, 1998Slide 85 P A R C global OR and modulo 3 10 000 1 0 ** V log n on EREW-PRAM 101110 * 1 mod 3 log n / log log n on CRCW-PRAM
86
Heiko Schröder, 1998Slide 86 P A R C sorting with all-to-all mapping Sorting: sort blocks all-to-all (columns) sort blocks all-to-all (rows) o-e-sort blocks
87
Heiko Schröder, 1998Slide 87 P A R C all-to-all mapping n x n
88
Heiko Schröder, 1998Slide 88 P A R C vertical all-to-all
89
Heiko Schröder, 1998Slide 89 P A R C horizontal all-to-all
90
Heiko Schröder, 1998Slide 90 P A R C 1 step 2 steps 3 steps k/2 steps 3 steps 2 steps 1 step (k/2) 2 steps
91
Heiko Schröder, 1998Slide 91 P A R C sorting in optimal time (k/2) 2 steps k=n 1/3 each step takes n 1/3 time --> T= n/4 x 2 /2 T = n/2 all-to-all Sorting: sort blocks (O(n 2/3 )) all-to-all (n/2) sort blocks (O(n 2/3 )) all-to-all (n/2) sort blocks (O(n 2/3 )) time: n + o(n)
92
Heiko Schröder, 1998Slide 92 P A R C Reconfigurable mesh Special features SIMD constant diameter faster than PRAM ? Suitable applications routing/sorting/load balancing sparse matrix multiplication segmentation / component labeling feature extraction image database ?
93
Heiko Schröder, 1998Slide 93 P A R C Reconfigurable mesh
94
Heiko Schröder, 1998Slide 94 P A R C Optical Highway W=1; P=100 W=100; P=22 All-to-all connection
95
Heiko Schröder, 1998Slide 95 P A R C Horizontal all-to-all Vertical all-to-all
96
Heiko Schröder, 1998Slide 96 P A R C Features of optically connected meshes SIMD/SPMD/MIMD implement all major architectures all-to-all communication in 2 steps Bulk synchronous processing (BSP) no latency hiding no pin-limitation Applications coarse grain parallel computing only? ray-tracing ? ???
97
Heiko Schröder, 1998Slide 97 P A R C Optical Highway 1. H. Schröder et al, RMB --- A Reconfigurable Multiple Bus Network, HPCA 96, San Jose, 1996 2. H. Schröder, O. Sykora, I. Vrto, Optical All-to-All Communication for some Product Graphs, SOFSEM '97, Milovy, Czech Republic, 1997
98
Heiko Schröder, 1998Slide 98 P A R C Bisection-width / Diameter Diameter log n bisection width n diameter bisection width SA ISA PIPS Diameter 1 bisection width RM Diameter 1 bisection width n OH
99
Heiko Schröder, 1998Slide 99 P A R C Suitable problems ? diameter log n bisection width n SA: suitable applications? ISA: 2D-problems, aggregate functions local communication PIPS: 3D-problems, local communication RM: diameter-bound > bisection-width-bound OH: PRAM equivalent? SA ISA PIPS RM OH
100
? ? ? ?
Ähnliche Präsentationen
© 2024 SlidePlayer.org Inc.
All rights reserved.