Die Präsentation wird geladen. Bitte warten

Die Präsentation wird geladen. Bitte warten

The Future of Parallel Computing Special Purpose Mesh Architectures Heiko Schröder, 1998 P A R C SA ISA PIPS RM OH.

Ähnliche Präsentationen


Präsentation zum Thema: "The Future of Parallel Computing Special Purpose Mesh Architectures Heiko Schröder, 1998 P A R C SA ISA PIPS RM OH."—  Präsentation transkript:

1 The Future of Parallel Computing Special Purpose Mesh Architectures Heiko Schröder, 1998 P A R C SA ISA PIPS RM OH

2 Heiko Schröder, 1998Slide 2 P A R C Contents Why meshes ??? Application specific parallel mesh architectures Fine grain 1983 Coarse grain Systolic Arrays -Instruction Systolic Arrays -PIPS -Reconfigurable mesh -Optical Highway

3 Heiko Schröder, 1998Slide 3 P A R C Physical limits c= km/sec OPS mm/OP 1000 PEs with OPS -- 30cm/OP massive parallelism distributed memory

4 Heiko Schröder, 1998Slide 4 P A R C Performance (MIPS) Year Pentium 2 Pentium Processor power

5 Heiko Schröder, 1998Slide 5 P A R C 0,5 µ 0,25 µ Scaling Faktor 2: 1/2 width 1/2 hight 1/2 switching time 8 x performance!

6 Heiko Schröder, 1998Slide 6 P A R C CMOS transistors ,01 0, Size of minimal transistor ca. 0,03

7 Heiko Schröder, 1998Slide 7 P A R C Mesh/Torus diameter bisection width 2D mesh

8 Heiko Schröder, 1998Slide 8 P A R C Hypercube 0-D D D D 01 4-D diameter log n bisection width n

9 Heiko Schröder, 1998Slide 9 P A R C VLSI Very Large Scale Integration simple cells few types regular architecture short connections mesh -- torus

10 Heiko Schröder, 1998Slide 10 P A R C Pin limitations 16x16 pins diameter pins diameter 16 16x12 pins

11 Heiko Schröder, 1998Slide 11 P A R C Bisection width Bisection width cm Bisection width 32K 32 m

12 Heiko Schröder, 1998Slide 12 P A R C Programming SA --- Systolic Array SIMD --- Single Instruction Multiple Data ISA --- Instruction Systolic Array MIMD --- Multiple Instruction Multiple Data

13 Heiko Schröder, 1998Slide 13 P A R C parallel merge initial situation: 1.) sort columns (odd-even-transposition sort) 2.) sort rows (odd-even-transposition sort) sorted !!!! x1x2 x3 x4 x5 x6 x7 x17 x18 y1 y2 y3 y4 y5 y6 y7 y17 y18...

14 Heiko Schröder, 1998Slide 14 P A R C 0-1 principle The 0-1 principle states that if all sequences of 0 and 1 are sorted properly than this is a correct sorter. The sorter must be based on moving data. initially 0s 1s after vertical sort 0s 1s after horizontal sort 0s 1s

15 Heiko Schröder, 1998Slide 15 P A R C MIMD-mesh (clocked) min maxTime: 2n

16 Heiko Schröder, 1998Slide 16 P A R C systolic merge

17 Heiko Schröder, 1998Slide 17 P A R C systolic merge

18 Heiko Schröder, 1998Slide 18 P A R C systolic merge

19 Heiko Schröder, 1998Slide 19 P A R C systolic merge

20 Heiko Schröder, 1998Slide 20 P A R C systolic merge

21 Heiko Schröder, 1998Slide 21 P A R C systolic merge

22 Heiko Schröder, 1998Slide 22 P A R C systolic merge

23 Heiko Schröder, 1998Slide 23 P A R C systolic merge

24 Heiko Schröder, 1998Slide 24 P A R C systolic merge

25 Heiko Schröder, 1998Slide 25 P A R C systolic merge

26 Heiko Schröder, 1998Slide 26 P A R C systolic merge

27 Heiko Schröder, 1998Slide 27 P A R C systolic merge

28 Heiko Schröder, 1998Slide 28 P A R C systolic merge

29 Heiko Schröder, 1998Slide 29 P A R C systolic merge

30 Heiko Schröder, 1998Slide 30 P A R C systolic merge

31 Heiko Schröder, 1998Slide 31 P A R C systolic merge

32 Heiko Schröder, 1998Slide 32 P A R C systolic merge

33 Heiko Schröder, 1998Slide 33 P A R C systolic merge

34 Heiko Schröder, 1998Slide 34 P A R C systolic merge

35 Heiko Schröder, 1998Slide 35 P A R C systolic merge

36 Heiko Schröder, 1998Slide 36 P A R C sorted !!! systolic merge

37 Heiko Schröder, 1998Slide 37 P A R C Characteristics of SAs Extremely high cost-performance no flexibility -- long development time Suitable for special signal processing tasks ???

38 Heiko Schröder, 1998Slide 38 P A R C Systolic architectures I

39 Heiko Schröder, 1998Slide 39 P A R C Systolic architectures II

40 Heiko Schröder, 1998Slide 40 P A R C ISA merge C:=min{C, CE} C:=max{C, CW}

41 Heiko Schröder, 1998Slide 41 P A R C ISA merge

42 Heiko Schröder, 1998Slide 42 P A R C ISA merge

43 Heiko Schröder, 1998Slide 43 P A R C ISA merge

44 Heiko Schröder, 1998Slide 44 P A R C ISA merge

45 Heiko Schröder, 1998Slide 45 P A R C ISA merge

46 Heiko Schröder, 1998Slide 46 P A R C ISA merge

47 Heiko Schröder, 1998Slide 47 P A R C ISA merge

48 Heiko Schröder, 1998Slide 48 P A R C ISA merge

49 Heiko Schröder, 1998Slide 49 P A R C ISA merge

50 Heiko Schröder, 1998Slide 50 P A R C ISA merge

51 Heiko Schröder, 1998Slide 51 P A R C ISA merge

52 Heiko Schröder, 1998Slide 52 P A R C ISA merge

53 Heiko Schröder, 1998Slide 53 P A R C ISA merge

54 Heiko Schröder, 1998Slide 54 P A R C ISA merge

55 Heiko Schröder, 1998Slide 55 P A R C ISA merge

56 Heiko Schröder, 1998Slide 56 P A R C ISA merge

57 Heiko Schröder, 1998Slide 57 P A R C ISA merge

58 Heiko Schröder, 1998Slide 58 P A R C ISA merge

59 Heiko Schröder, 1998Slide 59 P A R C ISA merge

60 Heiko Schröder, 1998Slide 60 P A R C ISA merge

61 Heiko Schröder, 1998Slide 61 P A R C ISA merge

62 Heiko Schröder, 1998Slide 62 P A R C ISA merge

63 Heiko Schröder, 1998Slide 63 P A R C ISA merge

64 Heiko Schröder, 1998Slide 64 P A R C ISA merge

65 Heiko Schröder, 1998Slide 65 P A R C ISA merge

66 Heiko Schröder, 1998Slide 66 P A R C ISA merge

67 Heiko Schröder, 1998Slide 67 P A R C ISA merge

68 Heiko Schröder, 1998Slide 68 P A R C ISA merge

69 Heiko Schröder, 1998Slide 69 P A R C ISA merge

70 Heiko Schröder, 1998Slide 70 P A R C ISA merge

71 Heiko Schröder, 1998Slide 71 P A R C Hough transform on the ISA good line detection method shear Fast tomography

72 Heiko Schröder, 1998Slide 72 P A R C robot vision projector CCD stereo vision

73 Heiko Schröder, 1998Slide 73 P A R C Use of the ISA Areas of application for ISA: automatic optical quality control real time signal processing computer graphics /visualization linear equations Cryptography --> Tele-medicine ? Special features: fast aggregate functions (sum, carry) fast local communication no local memory typical improvement over PC: Factor 20-30

74 Heiko Schröder, 1998Slide 74 P A R C Instruction Systolic Array

75 Heiko Schröder, 1998Slide 75 P A R C PIPS ( ) 1 M bit 1 M bit 32x32 torus 16 bit parallel communication 16 bit add prefetch memory control BHP -- CSIRO -- NU -- ADFA 1.4 M

76 Heiko Schröder, 1998Slide 76 P A R C Special features: local memory SIMD-torus memory pre-fetch Applications: visualization 3D-simulation (CFD, FEM)

77 Heiko Schröder, 1998Slide 77 P A R C

78 Heiko Schröder, 1998Slide 78 P A R C PIPS

79 Heiko Schröder, 1998Slide 79 P A R C Use in industry ? Performance [Gflops] Research Industry

80 Heiko Schröder, 1998Slide 80 P A R C Investments Investments into parallel computers [M$] Research Industry

81 Heiko Schröder, 1998Slide 81 P A R C Concentration Number of manufacturers

82 Heiko Schröder, 1998Slide 82 P A R C Degree of Parallelism May-93 Nov-93 May-94 Nov-94 May-95 Nov-95 May-96 Nov-96 May-97 Nov-97 1 to to to and more Number of new Systems

83 Heiko Schröder, 1998Slide 83 P A R C Evaluation Cost * computation time Parallel computers with standard components Imbedded parallel systems

84 Heiko Schröder, 1998Slide 84 P A R C reconfigurable mesh reconfigurable mesh = mesh + interior connections 15 positions low cost

85 Heiko Schröder, 1998Slide 85 P A R C global OR and modulo ** V log n on EREW-PRAM * 1 mod 3 log n / log log n on CRCW-PRAM

86 Heiko Schröder, 1998Slide 86 P A R C sorting with all-to-all mapping Sorting: sort blocks all-to-all (columns) sort blocks all-to-all (rows) o-e-sort blocks

87 Heiko Schröder, 1998Slide 87 P A R C all-to-all mapping n x n

88 Heiko Schröder, 1998Slide 88 P A R C vertical all-to-all

89 Heiko Schröder, 1998Slide 89 P A R C horizontal all-to-all

90 Heiko Schröder, 1998Slide 90 P A R C 1 step 2 steps 3 steps k/2 steps 3 steps 2 steps 1 step (k/2) 2 steps

91 Heiko Schröder, 1998Slide 91 P A R C sorting in optimal time (k/2) 2 steps k=n 1/3 each step takes n 1/3 time --> T= n/4 x 2 /2 T = n/2 all-to-all Sorting: sort blocks (O(n 2/3 )) all-to-all (n/2) sort blocks (O(n 2/3 )) all-to-all (n/2) sort blocks (O(n 2/3 )) time: n + o(n)

92 Heiko Schröder, 1998Slide 92 P A R C Reconfigurable mesh Special features SIMD constant diameter faster than PRAM ? Suitable applications routing/sorting/load balancing sparse matrix multiplication segmentation / component labeling feature extraction image database ?

93 Heiko Schröder, 1998Slide 93 P A R C Reconfigurable mesh

94 Heiko Schröder, 1998Slide 94 P A R C Optical Highway W=1; P=100 W=100; P=22 All-to-all connection

95 Heiko Schröder, 1998Slide 95 P A R C Horizontal all-to-all Vertical all-to-all

96 Heiko Schröder, 1998Slide 96 P A R C Features of optically connected meshes SIMD/SPMD/MIMD implement all major architectures all-to-all communication in 2 steps Bulk synchronous processing (BSP) no latency hiding no pin-limitation Applications coarse grain parallel computing only? ray-tracing ? ???

97 Heiko Schröder, 1998Slide 97 P A R C Optical Highway 1. H. Schröder et al, RMB --- A Reconfigurable Multiple Bus Network, HPCA 96, San Jose, H. Schröder, O. Sykora, I. Vrto, Optical All-to-All Communication for some Product Graphs, SOFSEM '97, Milovy, Czech Republic, 1997

98 Heiko Schröder, 1998Slide 98 P A R C Bisection-width / Diameter Diameter log n bisection width n diameter bisection width SA ISA PIPS Diameter 1 bisection width RM Diameter 1 bisection width n OH

99 Heiko Schröder, 1998Slide 99 P A R C Suitable problems ? diameter log n bisection width n SA: suitable applications? ISA: 2D-problems, aggregate functions local communication PIPS: 3D-problems, local communication RM: diameter-bound > bisection-width-bound OH: PRAM equivalent? SA ISA PIPS RM OH

100 ? ? ? ?


Herunterladen ppt "The Future of Parallel Computing Special Purpose Mesh Architectures Heiko Schröder, 1998 P A R C SA ISA PIPS RM OH."

Ähnliche Präsentationen


Google-Anzeigen