1 Hier wird Wissen Wirklichkeit Computer Architecture – Part 10 – page 1 of 31 – Prof. Dr. Uwe Brinkschulte, Prof. Dr. Klaus Waldschmidt
Part 10: Thread and Task Level Parallelism
Computer Architecture Slide Sets, Winter Semester 2010/2011
Prof. Dr. Uwe Brinkschulte, Prof. Dr. Klaus Waldschmidt

2 Basic concepts
Thread: Threads are lightweight processes consisting of a sequence of instructions. The threads of a task share a common (virtual) address space and can communicate via this common address space.
Task: Tasks are heavyweight processes. Each task has its own address space. Tasks can only communicate via inter-task communication channels such as shared memory, pipes, message queues or sockets.
A task can contain several threads.
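A minimal Python sketch of the thread side of this distinction (Python and the function name `run_threads` are illustrative assumptions, not part of the slides): several threads of one process update a counter that lives in their common address space, so they communicate without any explicit channel, while a lock keeps each update atomic. A separate task would not see this object and would need one of the communication channels listed above.

```python
import threading

def run_threads(num_threads: int, increments: int) -> int:
    """Let num_threads threads of one process update a shared counter."""
    shared = {"counter": 0}      # one object, visible to all threads
    lock = threading.Lock()      # serializes the read-modify-write

    def worker() -> None:
        for _ in range(increments):
            with lock:
                shared["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                 # wait until every thread has finished
    return shared["counter"]

print(run_threads(4, 10_000))    # 40000
```

In standard CPython the interpreter interleaves these threads rather than running them truly in parallel, which does not affect the shared-address-space point illustrated here.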

3 Basic concepts
Instruction level parallelism is limited. To exploit parallel processing further, thread or task level parallelism can be used. Two major architectures are known:
Multithreaded processors exploit thread level parallelism.
Chip multiprocessors (multi-core processors, many-core processors) exploit task level parallelism.
Both concepts are also used in combination.

4 Basic concepts
In a multithreaded processor, instructions from several threads of the program are candidates for concurrent issue. This can be done in a classical scalar pipeline to hide memory access latencies: instructions from several threads are processed in the different pipeline stages.
It can also be combined with a superscalar pipeline to extend the exploitable parallelism from the intra-thread level to the inter-thread level. This is called SMT (simultaneous multithreading).

5 Basic concepts
Chip multiprocessors combine multiple processor cores on a single chip; therefore, these processors are also called multi-core processors. Today's multi-core processors integrate 2 to 8 cores on a chip. As the number of cores increases in the future (e.g. > 100), the term many-core processor is used.
These cores can execute several tasks in parallel. Cores can be homogeneous or heterogeneous.
With multithreaded cores, multithreading and chip multiprocessing can be combined.

6 Multithreaded Architectures
Multithreaded processor: supports the execution of multiple threads in hardware.
It can store the context information of several threads in separate register sets and execute instructions of different threads at the same time in the processor pipeline.
Different stages of the processor pipeline can contain instructions from different threads.
This exploits thread level parallelism on the basis of parallelism in time (pipelining).

7 Multithreaded Architectures
Goal: reduction of latencies caused by memory accesses or dependencies.
Such latencies can be bridged by switching to another thread: during the latency, instructions from other threads are fed into the pipeline. As a result, processor utilization rises and the throughput of a load consisting of multiple threads increases (while the throughput of a single thread is not improved).
Explicit multithreaded processors: each thread is a real thread of the application program.
Implicit multithreaded processors: speculative parallel threads are created dynamically from a sequential program.
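The latency-hiding argument can be quantified with a simple model, sketched in Python under stated assumptions: every thread alternates between `run_cycles` cycles of useful work and `latency_cycles` cycles waiting for memory, and switching threads costs nothing because each thread has its own register set. The model and the function name `utilization` are illustrative, not taken from the slides.

```python
from fractions import Fraction

def utilization(num_threads: int, run_cycles: int, latency_cycles: int) -> Fraction:
    """Fraction of cycles the pipeline does useful work (simplified model).

    One round of a thread takes at least run_cycles + latency_cycles; if
    there are enough threads, their run phases fill each other's latencies
    and the round stretches to num_threads * run_cycles instead.
    Context switches are assumed to be free.
    """
    period = max(run_cycles + latency_cycles, num_threads * run_cycles)
    return Fraction(num_threads * run_cycles, period)

for n in (1, 2, 4, 5, 6):
    print(n, utilization(n, run_cycles=10, latency_cycles=40))
```

With a 10-cycle run length and a 40-cycle latency, one thread keeps the pipeline busy only one fifth of the time, while five threads hide the latency completely. With six threads utilization stays at 1, but each thread's round grows from 50 to 60 cycles, which is why multithreading raises the throughput of the thread mix rather than that of an individual thread.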

8 Basic multithreading techniques
(a) Single-threaded processor.
(b) Cycle-by-cycle interleaving technique (fine-grain multithreading): the context is switched every clock cycle.
(c) Block interleaving technique (coarse-grain multithreading): instructions of a thread are executed until an event causes a latency; then the context is switched.
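The three schemes can be compared with a small cycle-level sketch in Python. It assumes a single-issue pipeline, zero-cost context switches and a hypothetical `simulate` function; threads are written as strings in which `a` is an ordinary one-cycle instruction and `L` a long-latency instruction. The result shows, cycle by cycle, which thread issues, with a dot for an idle cycle.

```python
def simulate(threads: list[str], latency: int, policy: str) -> str:
    """Cycle-level sketch of a single-issue multithreaded processor.

    policy "fine":   cycle-by-cycle interleaving, try the next thread each cycle
    policy "coarse": block interleaving, stay with a thread until it stalls
    After issuing 'L', a thread may not issue again for `latency` cycles.
    """
    if policy not in ("fine", "coarse"):
        raise ValueError(f"unknown policy: {policy}")
    n = len(threads)
    pc = [0] * n                 # next instruction of each thread
    ready_at = [0] * n           # first cycle in which a thread may issue again
    last = n - 1 if policy == "fine" else 0    # thread that issued last
    offset = 1 if policy == "fine" else 0      # fine: start after `last`
    timeline = []
    cycle = 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        order = [(last + k) % n for k in range(offset, offset + n)]
        ready = [t for t in order if pc[t] < len(threads[t]) and ready_at[t] <= cycle]
        if not ready:
            timeline.append(".")           # all unfinished threads are waiting
        else:
            t = ready[0]
            instr = threads[t][pc[t]]
            pc[t] += 1
            ready_at[t] = cycle + 1 + (latency if instr == "L" else 0)
            last = t
            timeline.append(str(t))
        cycle += 1
    return "".join(timeline)

print(simulate(["aLaa"], latency=2, policy="fine"))            # (a) 00..00
print(simulate(["aLaa", "aLaa"], latency=2, policy="fine"))    # (b) 0101.0101
print(simulate(["aLaa", "aLaa"], latency=2, policy="coarse"))  # (c) 00110011
```

In this particular workload block interleaving finishes without idle cycles, while cycle-by-cycle interleaving leaves one, because alternating the threads makes them reach their long-latency instructions almost at the same time; other workloads can favor fine-grain interleaving. The sketch also ignores the pipeline-refill penalty that coarse-grain processors typically pay on each switch.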

9 Comparing multithreading to superscalar and VLIW
[Figure: (a) 4-way superscalar processor; (b) 4-way VLIW processor; (c) 4-way superscalar processor with cycle-by-cycle interleaving; (d) 4-way VLIW processor with cycle-by-cycle interleaving]

10 Classification of block interleaving techniques

11 Simultaneous multithreading (SMT)
A simultaneous multithreaded processor is able to issue instructions of multiple threads to multiple execution units in a single clock cycle.
This exploits thread level and instruction level parallelism in time and space.
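The gain from filling issue slots with instructions of several threads can be illustrated with a small Python sketch. Assumptions (not from the slides): each thread is described by a hypothetical sequence of instruction groups, where the instructions inside a group are independent but each group depends on the previous one, so a thread issues from at most one group per cycle; without SMT the threads run one after another on a 4-wide core, with SMT any mix of threads may fill the slots of a cycle. The function name `issue_cycles` is illustrative.

```python
def issue_cycles(threads: list[list[int]], width: int, smt: bool) -> int:
    """Cycles needed to issue all instructions of all threads."""
    groups = [list(t) for t in threads]   # remaining instructions per group
    rotate = 0                            # SMT priority rotation counter
    cycles = 0
    while any(groups):
        active = [t for t, g in enumerate(groups) if g]
        if smt:
            r = rotate % len(active)
            active = active[r:] + active[:r]   # rotate priority for fairness
            rotate += 1
        else:
            active = active[:1]                # one thread owns the core
        free = width
        for t in active:
            if free == 0:
                break
            n = min(free, groups[t][0])        # fill slots from the current group
            groups[t][0] -= n
            free -= n
            if groups[t][0] == 0:
                groups[t].pop(0)               # next group may issue next cycle
        cycles += 1
    return cycles

prog = [[2, 1, 3, 1], [2, 1, 3, 1]]   # hypothetical per-thread ILP profile
total = sum(sum(t) for t in prog)
for smt in (False, True):
    c = issue_cycles(prog, width=4, smt=smt)
    print("SMT" if smt else "superscalar", c, "cycles,", total / (4 * c), "slot utilization")
```

For this profile the superscalar core needs 8 cycles and fills about 44% of its issue slots, while the SMT core needs 5 cycles and fills 70%: slots that one thread leaves empty for lack of instruction level parallelism are taken by instructions of the other thread.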

12 Comparing SMT to chip multiprocessing
[Figure: simultaneous multithreading (a) and chip multiprocessing (b)]

13 Other applications of multithreading
The ability to switch contexts quickly opens up further application fields for multithreading:
Reduction of energy consumption: mispredictions in superscalar processors waste energy on instructions that are later discarded. Multithreaded processors can execute instructions from other threads instead.
Event handling: helper threads handle special events (e.g. garbage collection).
Real-time processing: allows efficient real-time scheduling policies such as LLF (least laxity first) or GP (guaranteed percentage).
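Least laxity first picks, at every scheduling point, the task whose laxity (deadline minus current time minus remaining execution time) is smallest. Because laxities change while tasks run, LLF tends to switch tasks often, which is affordable when a context switch is as cheap as in a multithreaded processor. A minimal Python sketch, assuming unit time steps, one processor, all tasks released at time 0, single-character task names and the hypothetical functions `llf_schedule` and `finish_times`:

```python
def llf_schedule(tasks: dict[str, tuple[int, int]], horizon: int) -> str:
    """tasks: name -> (execution_time, absolute_deadline).

    In every time unit, run the unfinished task with the smallest laxity;
    ties go to the task listed first ('.' marks an idle time unit).
    """
    remaining = {name: c for name, (c, _) in tasks.items()}
    schedule = []
    for now in range(horizon):
        ready = [name for name in tasks if remaining[name] > 0]
        if not ready:
            schedule.append(".")
            continue
        pick = min(ready, key=lambda name: tasks[name][1] - now - remaining[name])
        remaining[pick] -= 1
        schedule.append(pick)
    return "".join(schedule)

def finish_times(schedule: str) -> dict[str, int]:
    """Completion time of each task (end of its last time unit)."""
    return {name: schedule.rindex(name) + 1 for name in sorted(set(schedule) - {"."})}

tasks = {"A": (2, 4), "B": (3, 6), "C": (1, 3)}   # name: (execution time, deadline)
s = llf_schedule(tasks, horizon=6)
print(s, finish_times(s))
print(llf_schedule({"A": (2, 4), "B": (2, 4)}, horizon=4))
```

For the first task set the schedule is `ACABBB`: C finishes by time 2 (deadline 3), A by time 3 (deadline 4) and B by time 6 (deadline 6). Two tasks with identical parameters already produce the alternating schedule `ABAB`, i.e. a context switch in every time unit.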

14 Chip multiprocessing architectures
A chip multiprocessor (CMP) combines several processors on a single chip.
Instead of chip multiprocessor, today the term multi-core processor is also used, where a core denotes a single processor on the multi-core processor chip.
Each core can have the complexity of today's microprocessors and holds its own primary cache for instructions and data.
Usually, the cores are organized as memory-coupled multiprocessors with a shared address space.
Furthermore, a secondary cache is contained on the chip.
For future multi-core processors containing a large number of cores (> 100), the term many-core processor is used.

15 Possible multi-core configurations (1)
[Figure: shared main memory: four processors, each with its own primary and secondary cache, connected to a global memory; shared secondary cache: four processors, each with its own primary cache, sharing one secondary cache in front of the global memory]

16 Possible multi-core configurations (2)
[Figure: shared primary cache: four processors sharing one primary cache, backed by a secondary cache and the global memory]

17 Chip multiprocessor / multi-core
Simulations show that the shared secondary cache architecture is superior to the shared primary cache and shared main memory architectures.
Therefore, a large shared secondary cache is mostly implemented on the processor chip.
Cache coherency protocols known from symmetric multiprocessor architectures (e.g. the MESI protocol) guarantee correct access to shared memory cells from inside and outside the processor chip.
Today, chip multiprocessing is often combined with simultaneous multithreading: each core is an SMT core, giving the advantages of both approaches.
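A minimal Python sketch of MESI for a single cache line shows how the protocol keeps the private caches of several cores consistent. It assumes a snooping bus on which each transaction completes atomically and a hypothetical `MesiLine` class; bus transactions are only logged (BusRd for a read miss, BusRdX for a write miss, BusUpgr for invalidating shared copies on a write hit), and the data transfer itself is not modeled.

```python
class MesiLine:
    """One cache line in the private caches of several cores (hypothetical model).

    States: M(odified), E(xclusive), S(hared), I(nvalid).
    """

    def __init__(self, cores: int) -> None:
        self.state = ["I"] * cores
        self.log: list[str] = []

    def _others(self, core: int) -> list[int]:
        return [c for c, s in enumerate(self.state) if c != core and s != "I"]

    def read(self, core: int) -> None:
        if self.state[core] != "I":
            return                                   # read hit in M, E or S
        self.log.append(f"core {core}: BusRd")       # read miss on the bus
        holders = self._others(core)
        for c in holders:
            if self.state[c] == "M":                 # dirty copy must be written back
                self.log.append(f"core {c}: writeback")
            self.state[c] = "S"                      # previous holders now share
        self.state[core] = "S" if holders else "E"   # E if no other cache has it

    def write(self, core: int) -> None:
        s = self.state[core]
        if s == "M":
            return                                   # write hit, already the owner
        if s == "E":
            self.state[core] = "M"                   # silent upgrade, no bus traffic
            return
        # S: invalidate the other copies; I: read for ownership and invalidate
        self.log.append(f"core {core}: {'BusUpgr' if s == 'S' else 'BusRdX'}")
        for c in self._others(core):
            if self.state[c] == "M":
                self.log.append(f"core {c}: writeback")
            self.state[c] = "I"
        self.state[core] = "M"

line = MesiLine(cores=3)
for op, core in [("read", 0), ("write", 0), ("read", 1), ("write", 2), ("read", 0)]:
    getattr(line, op)(core)
    print(f"{op}({core}):", "".join(line.state))
print(line.log)
```

The states after each step are EII, MII, SSI, IIM and SIS: the write by core 2 invalidates the other copies, and the subsequent read by core 0 forces core 2 to write back its modified data before both copies become Shared.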

18 An early single chip multiprocessor proposal: Hydra

19 Multi-Core examples
IBM Power5
Symmetric multi-core processor with two 64-bit two-way SMT processor cores, each having a 64 kByte instruction cache and a 32 kByte data cache.
Both cores share a 1.41 MByte on-chip secondary cache.
The controller for the third-level cache is on chip as well.
Four Power5 chips and four L3 cache chips are combined in a multi-chip module.

20 Multi-Core examples
IBM Power6
Similar to Power5, but with superscalar in-order execution.
Level 1 cache size raised to 64 kBytes each for instructions and data on each core.
65 nm process, 5 GHz clock frequency.

21 Multi-Core examples
IBM Power7, released in 2010
4, 6 or 8 cores.
Turbo mode deactivates 4 of the 8 cores but gives the remaining 4 cores access to all memory controllers, which improves single-core performance.
Each core supports four-way SMT.
45 nm process, 4 GHz clock frequency.

22 Multi-Core examples
Intel Core 2 Duo (Wolfdale)
2 processor cores of the Intel Core 2 architecture.
32 kByte data cache and 32 kByte instruction cache for each core.
6 MBytes L2 cache, shared by both cores.
45 nm process, 3 GHz clock frequency.
[Figure: Core 1 and Core 2 with the L2 cache shared by both cores]

23 Multi-Core examples
Microarchitecture of the Intel Core 2 family (a single core)
Source: c't 16/2006

24 Multi-Core examples
Intel Core 2 Quad (Yorkfield)
2 Wolfdale dies in a multi-chip module => 4 processor cores of the Intel Core 2 architecture.
32 kByte data cache and 32 kByte instruction cache for each core.
6 MBytes L2 cache per die.
45 nm process, 3 GHz clock frequency.

25 Heterogeneous multi-cores
While homogeneous multi-core processors are commonly used for general-purpose computing, heterogeneous multi-core processors are seen as a future trend for embedded systems.
A first member of this technology is the IBM Cell processor, containing one Power processor (Power Processor Element, PPE) and 8 dependent processors (Synergistic Processing Elements, SPEs).
PPE: based on the Power architecture, two-way SMT, controls the 8 SPEs.
SPE: contains a RISC processor with 128-bit SIMD (multimedia) instructions, a memory flow controller and a bus controller.
Originally designed for the Sony PlayStation 3, the Cell processor is now used in various application domains.

26 Cell Processor Die

27 Multi-Core discussion: performance
Due to multithreading in PC and server operating systems, two to four cores significantly increase processor throughput.
Exploiting eight or more cores requires parallel application programs. Hence, software development is challenged to deliver the necessary number of parallel threads, either by parallelizing compilers or by parallel applications.
Experience from multiprocessors shows that a moderate number of parallel threads yields a high performance improvement, but this does not scale to a higher degree of parallelism: beginning with 4 to 8 threads, the performance improvement drops dramatically.
With 8 cores, some cores will be temporarily idle except in very compute-intensive applications.
Furthermore, memory bandwidth can become a bottleneck.
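The slide does not name a model, but Amdahl's law, swapped in here as one standard explanation, shows why the gain flattens: if only a fraction p of the run time can be parallelized, n cores give a speedup of 1 / ((1 - p) + p / n), which can never exceed 1 / (1 - p). A minimal Python sketch (the function name `amdahl_speedup` is illustrative):

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup on `cores` cores when only `parallel_fraction`
    of the sequential run time can be spread across the cores."""
    if not 0.0 <= parallel_fraction <= 1.0 or cores < 1:
        raise ValueError("need 0 <= parallel_fraction <= 1 and cores >= 1")
    serial = 1.0 - parallel_fraction          # part that stays sequential
    return 1.0 / (serial + parallel_fraction / cores)

for n in (1, 2, 4, 8, 16):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

With p = 0.9 the speedup is about 3.1 on 4 cores, 4.7 on 8 cores and 6.4 on 16 cores, bounded by 10, so each doubling of cores yields less. Amdahl's law ignores the memory bandwidth limit mentioned above, which can reduce the gain further.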

28 Multi-Core discussion: hardware
While current multi-core processors use cache-coupled interconnection, future processors might rely on grid structures (networks on chip) to improve performance.
Adaptive and reconfigurable MPSoCs (Multi-Processor Systems-on-Chip) will gain importance for embedded systems and general-purpose computing.
Reconfigurable cache memories might allow variable connections to different cores.
Available input/output bandwidth is still an open problem for throughput-oriented programs.

29 Multi-Core discussion: hardware
For data access, transactional memory might be a model for future multi-core processors. Similar to database systems, memory access is organized as a transaction that is executed either completely or not at all. Hardware support for checkpointing and rollback is necessary. As an advantage, concurrent access is simplified (no locks).
Furthermore, fault tolerance and dependability techniques will become more important, as the error probability will increase with decreasing transistor dimensions.
On-chip power management will remain as important as it is today.
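The transaction idea can be illustrated in software with a minimal Python sketch of optimistic transactional memory. Assumptions: the names `TVar`, `Transaction` and `atomically` are hypothetical; a transaction buffers its writes (so the untouched shared state acts as the checkpoint), records the version of every variable it reads, and at commit either publishes all its writes or, if something it read has changed meanwhile, discards them (rollback). A hardware implementation would keep this bookkeeping in the processor; here a single commit lock stands in for that support.

```python
import threading

class TVar:
    """Transactional variable: a value plus a version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()   # makes validate-and-publish atomic

class Transaction:
    def __init__(self):
        self.reads = {}    # TVar -> version seen at first read
        self.writes = {}   # TVar -> buffered new value (invisible to others)

    def read(self, tv):
        if tv in self.writes:             # read your own buffered write
            return self.writes[tv]
        self.reads.setdefault(tv, tv.version)
        return tv.value

    def write(self, tv, value):
        self.writes[tv] = value           # shared state stays untouched

    def commit(self) -> bool:
        with _commit_lock:
            # Validation: abort if any variable read has changed meanwhile.
            if any(tv.version != seen for tv, seen in self.reads.items()):
                return False              # rollback: buffered writes are dropped
            for tv, value in self.writes.items():
                tv.value = value          # publish the buffered writes
                tv.version += 1
            return True

def atomically(body):
    """Run body(tx) in a fresh transaction until it commits."""
    while True:
        tx = Transaction()
        result = body(tx)
        if tx.commit():
            return result

x = TVar(0)
t1, t2 = Transaction(), Transaction()
seen_x = t1.read(x)               # t1 reads x at version 0
t2.write(x, 10)
print(t2.commit())                # True: x is now 10 (version 1)
t1.write(x, seen_x + 1)
print(t1.commit(), x.value)       # False 10: t1 read a stale version and aborts
atomically(lambda tx: tx.write(x, tx.read(x) + 1))
print(x.value)                    # 11
```

In the demonstration `t2` commits first, so the validation of `t1` fails and its write is discarded; `atomically` simply reruns a transaction until it commits. No lock is held while a transaction body runs, only during the short validate-and-publish step.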

30 Multi-Core discussion: software
Currently, operating system concepts known from memory-coupled multiprocessor systems are used: the operating system scheduler assigns independent processes to the available processors.
In contrast, the closer coupling of cores in multi-core processors leads to a different ratio of computation to synchronization, which allows finer-grained parallelism to be used.
Parallel computing will become the standard programming model of the future.
Most currently existing software is sequential and can thus run on only one core.
Programming languages and tools that exploit the fine-grained parallelism of multi-core processors need to be developed.
Furthermore, software engineering techniques are needed to allow the development of safe parallel programs.

31 Multi-Core discussion: software
Application development for multi-core processors will become one of the main future market places for computer scientists.
Today's applications have to be reworked with the goal of exploiting parallelism, gaining performance and increasing comfort.
New applications that are currently not realizable due to a lack of processor performance will arise. These are hard to predict; possible applications must require high computational performance that is reachable through parallelism.
Such applications might come from speech recognition, image recognition, data mining, learning technologies or hardware synthesis.

