

Computer Architecture, Part 10: Thread and Task Level Parallelism
Computer Architecture Slide Sets, WS 2010/2011 (31 slides)
Prof. Dr. Uwe Brinkschulte, Prof. Dr. Klaus Waldschmidt

Basic concepts
Thread: Threads are lightweight processes. Each consists of a sequence of instructions. The threads of a task share a common (virtual) address space and can communicate through it.
Task: Tasks are heavyweight processes. Each task has its own address space. Tasks can communicate only via inter-task communication channels such as shared memory, pipes, message queues or sockets. A task can contain several threads.
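The difference between the two communication styles can be illustrated with a minimal Python sketch. It is not part of the slides; the function names `thread_demo` and `task_demo` are made up for illustration. Threads write into one shared list, while a separate operating system process (a task in the terminology used here) only receives and returns data through pipes.

```python
import subprocess
import sys
import threading


def thread_demo(n_threads=4):
    """Threads of one task see the same objects, so no explicit channel is needed."""
    shared = []                      # lives in the common address space
    lock = threading.Lock()

    def worker(i):
        with lock:                   # serialize access to the shared list
            shared.append(i * i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared)


def task_demo(values):
    """A separate task has its own address space: data travels through pipes."""
    child_code = "import sys; print(sum(int(x) for x in sys.stdin.read().split()))"
    result = subprocess.run(
        [sys.executable, "-c", child_code],   # child task running the same interpreter
        input=" ".join(str(v) for v in values),
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)
```

Calling `thread_demo()` collects the results directly from shared memory, whereas `task_demo([1, 2, 3])` has to serialize its input and parse the reply, which is the extra cost of inter-task communication.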

Basic concepts
Instruction level parallelism is limited. To exploit parallel processing further, thread or task level parallelism can be used.
Two major architectures are known:
Multithreaded processors exploit thread level parallelism.
Chip multiprocessors (multi-core processors, many-core processors) exploit task level parallelism.
Both concepts are also used in combination.

Basic concepts
In a multithreaded processor, instructions of several threads of the program are candidates for concurrent issuing.
This can be done in a classical scalar pipeline to hide memory access latencies. Here, instructions from several threads can be processed in the different pipeline stages.
It can also be combined with a superscalar pipeline to raise the level of possible parallelism from the intra-thread level to the inter-thread level. This is called SMT (Simultaneous Multithreading).

Basic concepts
Chip multiprocessors combine multiple processor cores on a single chip. Therefore these processors are also called multi-core processors.
Today's multi-core processors integrate cores on a chip. As the number of cores grows in the future (e.g. > 100), the term many-core processor is used.
These cores can execute several tasks in parallel.
Cores can be homogeneous or heterogeneous.
With multithreaded cores, multithreading and chip multiprocessing can be combined.

Multithreaded Architectures
Multithreaded processor:
Supports the execution of multiple threads in hardware.
It can store the context information of several threads in separate register sets and execute instructions of different threads at the same time in the processor pipeline.
Different stages of the processor pipeline can contain instructions from different threads.
This exploits thread level parallelism on the basis of parallelism in time (pipelining).

Multithreaded Architectures
Goal: reduction of latencies caused by memory accesses or dependencies.
Such latencies can be bridged by switching to another thread. During the latency, instructions from other threads are fed into the pipeline.
=> Processor utilization rises, and the throughput of a load consisting of multiple threads increases (while the throughput of a single thread remains the same).
Explicit multithreaded processors: each thread is a real thread of the application program.
Implicit multithreaded processors: speculative parallel threads are created dynamically out of a sequential program.

Basic multithreading techniques
(a) Single-threaded processor.
(b) Cycle-by-cycle interleaving technique (fine-grain multithreading): the context is switched every clock cycle.
(c) Block interleaving technique (coarse-grain multithreading): instructions of a thread are executed until an event causes a latency. Then the context is switched.
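The effect of the three techniques on pipeline utilization can be sketched with a toy simulator in Python. This is an illustrative model under simplifying assumptions, not a description of any real processor: the pipeline issues at most one instruction per cycle, each instruction is described only by the stall (in cycles) it causes for its own thread, and context switches are free. The function name `simulate` and the policy names are made up for this sketch.

```python
def simulate(threads, policy):
    """Return the number of cycles until the last instruction has been issued.

    threads: list of lists; each entry is the stall (in cycles) that follows
             an instruction before the same thread may issue again.
    policy:  "single" (run the threads back to back, never switch on a stall),
             "fine"   (cycle-by-cycle interleaving, switch every cycle),
             "coarse" (block interleaving, switch only when the thread stalls).
    """
    n = len(threads)
    pc = [0] * n          # index of the next instruction of each thread
    ready_at = [0] * n    # first cycle in which each thread may issue again
    current = 0
    cycle = 0

    def can_issue(t):
        return pc[t] < len(threads[t]) and ready_at[t] <= cycle

    while any(pc[t] < len(threads[t]) for t in range(n)):
        if policy == "single":
            while pc[current] >= len(threads[current]):
                current += 1                  # previous thread has finished
            candidates = [current]
        elif policy == "fine":
            # start looking at the thread after the one that issued last
            candidates = [(current + 1 + k) % n for k in range(n)]
        elif policy == "coarse":
            # prefer the running thread, fall back to the others if it stalls
            candidates = [(current + k) % n for k in range(n)]
        else:
            raise ValueError(policy)

        chosen = next((t for t in candidates if can_issue(t)), None)
        if chosen is not None:
            ready_at[chosen] = cycle + 1 + threads[chosen][pc[chosen]]
            pc[chosen] += 1
            current = chosen
        cycle += 1                            # an empty slot if nothing was chosen
    return cycle
```

For two threads whose instructions each stall for two cycles, `simulate([[2, 2], [2, 2]], "single")` needs 8 cycles, while both interleaving policies finish in 5, because the second thread fills slots that would otherwise stay empty.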

Comparing multithreading to superscalar and VLIW
[Figure: (a) four-way superscalar processor; (b) four-way VLIW processor; (c) four-way superscalar processor with cycle-by-cycle interleaving; (d) four-way VLIW processor with cycle-by-cycle interleaving]

Classification of block interleaving techniques (figure)

Simultaneous multithreading (SMT)
A simultaneous multithreaded processor is able to issue instructions of multiple threads to multiple execution units in a single clock cycle.
This exploits thread level and instruction level parallelism in time and space.
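A rough way to see why SMT fills more issue slots than cycle-by-cycle interleaving is to count the slots used under each scheme. The following Python sketch is a deliberately static model and an assumption of this example: `ready[c][t]` simply states how many independent instructions thread `t` could issue in cycle `c`, and instructions that are not issued are not carried over to later cycles.

```python
def issued_instructions(ready, width, smt):
    """Count the issue slots filled over all cycles.

    ready: ready[c][t] = independent instructions thread t offers in cycle c
    width: number of issue slots per cycle (e.g. 4 for a four-way processor)
    smt:   True  -> slots of one cycle may be filled from several threads
           False -> cycle-by-cycle interleaving, one thread owns each cycle
    """
    total = 0
    for c, per_thread in enumerate(ready):
        if smt:
            slots = width
            for r in per_thread:              # fill the remaining slots thread by thread
                take = min(r, slots)
                total += take
                slots -= take
        else:
            t = c % len(per_thread)           # round-robin owner of this cycle
            total += min(per_thread[t], width)
    return total
```

With `ready = [[1, 2], [3, 1], [0, 2], [2, 2]]` and four issue slots, the SMT variant fills 13 of 16 slots, while interleaving fills only 4, since slots left empty by the owning thread cannot be given to the other one.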

Comparing SMT to chip multiprocessing
[Figure: simultaneous multithreading (a) and chip multiprocessing (b)]

Other applications of multithreading
The ability to switch contexts quickly opens up further application fields for multithreading:
Reduction of energy consumption: mispredictions in superscalar processors cost energy. Multithreaded processors can execute instructions from other threads instead.
Event handling: helper threads handle special events (e.g. garbage collection).
Real-time processing: allows efficient real-time scheduling policies like LLF (Least Laxity First) or GP (Guaranteed Percentage).
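As an illustration of why fast context switching matters for LLF, here is a minimal Python sketch of a least-laxity-first scheduler that re-decides every cycle. The laxity of a thread is the time left until its deadline minus the work it still has to do. The function name `llf_schedule`, the tuple layout and the tie-breaking by name are assumptions of this sketch; the slides do not give an algorithm.

```python
def llf_schedule(tasks, horizon):
    """Pick, for every cycle, the runnable thread with the least laxity.

    tasks:   dict name -> (remaining_cycles, deadline)
    horizon: maximum number of cycles to simulate
    Returns the list of thread names chosen in each cycle.
    """
    remaining = {name: r for name, (r, _) in tasks.items()}
    deadline = {name: d for name, (_, d) in tasks.items()}
    timeline = []
    for now in range(horizon):
        runnable = [name for name in remaining if remaining[name] > 0]
        if not runnable:
            break
        # laxity = time until the deadline minus the work still to do;
        # ties are broken by name to keep the result deterministic
        chosen = min(runnable, key=lambda n: (deadline[n] - now - remaining[n], n))
        remaining[chosen] -= 1
        timeline.append(chosen)
    return timeline
```

For `llf_schedule({"A": (3, 6), "B": (2, 4)}, 10)` the result is `["B", "A", "B", "A", "A"]`. The schedule alternates between threads from one cycle to the next, which is only practical when a context switch costs (almost) nothing, as on a multithreaded processor.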

Chip multiprocessing architectures
A chip multiprocessor (CMP) combines several processors on a single chip.
Today, a chip multiprocessor is also called a multi-core processor, where a core denotes a single processor on the multi-core processor chip.
Each core can have the complexity of today's microprocessors and holds its own primary cache for instructions and data.
Usually, the cores are organized as memory-coupled multiprocessors with a shared address space.
Furthermore, a secondary cache is contained on the chip.
For future multi-core processors containing a large number of cores (> 100), the term many-core processor is used.

Possible multi-core configurations (1)
[Figure: two configurations with four processors each.
Shared main memory: every processor has its own primary cache and its own secondary cache; all share the global memory.
Shared secondary cache: every processor has its own primary cache; all share one secondary cache in front of the global memory.]

Possible multi-core configurations (2)
[Figure: shared primary cache. Four processors share one primary cache, followed by the secondary cache and the global memory.]

Chip-Multiprocessor / Multi-Core
Simulations show the shared secondary cache architecture to be superior to shared primary cache and shared main memory. Therefore, mostly a large shared secondary cache is implemented on the processor chip.
Cache coherency protocols known from symmetric multiprocessor architectures (e.g. the MESI protocol) guarantee correct access to the shared memory cells from inside and outside the processor chip.
Today, chip multiprocessing is often combined with simultaneous multithreading. There, each core is an SMT core, giving the advantages of both approaches.
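The MESI protocol mentioned above can be sketched as a small state machine for a single cache line that is tracked in the private caches of several cores. This Python model is a simplification for illustration: it only records the four states (Modified, Exclusive, Shared, Invalid), ignores the data itself, write-backs and bus timing, and the class name `MesiLine` is made up.

```python
class MesiLine:
    """State of one memory line in the private caches of n cores."""

    def __init__(self, n_cores):
        self.state = ["I"] * n_cores          # every copy starts Invalid

    def read(self, core):
        if self.state[core] != "I":
            return                            # read hit: state is unchanged
        others = [c for c in range(len(self.state))
                  if c != core and self.state[c] != "I"]
        for c in others:
            # a Modified copy is written back, then M and E are downgraded
            self.state[c] = "S"
        # Exclusive if no other cache holds the line, Shared otherwise
        self.state[core] = "S" if others else "E"

    def write(self, core):
        for c in range(len(self.state)):
            if c != core:
                self.state[c] = "I"           # all other copies are invalidated
        self.state[core] = "M"                # the writer holds the only valid copy
```

Starting from two cores, `read(0)` gives `["E", "I"]`, `read(1)` gives `["S", "S"]`, `write(1)` gives `["I", "M"]`, and a following `read(0)` returns both copies to `["S", "S"]`.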

An early single-chip multiprocessor proposal: Hydra (figure)

Multi-Core examples
IBM Power5
Symmetric multi-core processor with two 64-bit cores, each two-way SMT, each with 64 kBytes instruction cache and 32 kBytes data cache.
Both cores share a [...] MByte on-chip secondary cache.
The controller for the third-level cache is on chip as well.
Four Power5 chips and four L3 cache chips are combined in a multi-chip module.

Multi-Core examples
IBM Power6
Similar to Power5, but with superscalar in-order execution.
Level 1 cache size raised to 64 kBytes for instructions and data on each core.
65 nm process.
5 GHz clock frequency.

Multi-Core examples
IBM Power7
Released in [...], 6 or 8 cores.
Turbo mode deactivates 4 out of 8 cores, but gives the remaining 4 cores access to all memory controllers => improves single-core performance.
Each core supports four-way SMT.
45 nm process.
4 GHz clock frequency.

Multi-Core examples
Intel Core 2 Duo (Wolfdale)
2 processor cores of Intel Core 2 architecture.
32 kBytes data cache and 32 kBytes instruction cache for each core.
6 MBytes L2 cache.
45 nm process.
3 GHz clock frequency.
[Figure: Core 1 and Core 2 with an L2 cache shared by both cores]

Multi-Core examples
[Figure: microarchitecture of the Intel Core 2 family (a single core). Source: c't 16/2006]

Multi-Core examples
Intel Core 2 Quad (Yorkfield)
2 Wolfdale dies in a multi-chip module => 4 processor cores of Intel Core 2 architecture.
32 kBytes data cache and 32 kBytes instruction cache for each core.
6 MBytes L2 cache for each die.
45 nm process.
3 GHz clock frequency.

Heterogeneous multi-cores
While homogeneous multi-core processors are commonly used for general purpose computing, heterogeneous multi-core processors are seen as a future trend for embedded systems.
A first member of this technology is the IBM Cell processor, containing a Power processor (Power Processor Element, PPE) and 8 dependent processors (Synergistic Processing Elements, SPE).
PPE: based on the Power architecture, two-way SMT, controls the 8 SPEs.
SPE: contains a RISC processor with 128-bit SIMD (multimedia) instructions, a memory flow controller and a bus controller.
Originally designed for the Sony PlayStation 3, the Cell processor is now used in various application domains.

Cell Processor Die (figure)

Multi-Core discussion: performance
Due to multithreading in PC and server operating systems, two to four cores significantly increase processor throughput.
Exploiting eight or more cores requires parallel application programs. Hence, software development is challenged to deliver the necessary number of parallel threads, either through parallelizing compilers or through parallel applications.
Experience from multiprocessors shows that a moderate number of parallel threads yields a high performance improvement, but this does not scale to a higher degree of parallelism. Beginning with 4 to 8 threads, the performance improvement drops sharply.
With 8 cores, some cores will be temporarily idle, except for very compute-intensive applications.
Furthermore, memory bandwidth can become a bottleneck.
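One common way to reason about this flattening is Amdahl's law, which the slides do not name explicitly: if only a fraction p of a program can run in parallel, the speedup on n cores is bounded by 1 / ((1 - p) + p / n). The Python sketch below evaluates this bound; the value p = 0.9 in the usage note is an arbitrary illustration, not a measurement.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on the speedup according to Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)
```

With 90 % parallel code, `amdahl_speedup(0.9, 4)` is about 3.1 and `amdahl_speedup(0.9, 8)` about 4.7, so doubling the cores from 4 to 8 adds only about half again as much speedup. The model ignores the memory bandwidth limits mentioned above, which reduce the gain further.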

Multi-Core discussion: hardware
While current multi-core processors use cache-coupled interconnection, future processors might rely on grid structures (network on chip) to improve performance.
Adaptive and reconfigurable MPSoCs (Multi-Processor Systems-on-Chip) will gain importance for embedded systems and general purpose computing.
Reconfigurable cache memories might allow variable connections to different cores.
Available input/output bandwidth is still an open problem for throughput-oriented programs.

Multi-Core discussion: hardware
For data access, transactional memory might be a model for future multi-core processors.
Similar to database systems, a memory access is organized as a transaction that is executed completely or not at all.
Hardware support for checkpointing and rollback is necessary.
As an advantage, concurrent access is simplified (no locks).
Furthermore, fault tolerance and dependability techniques will become more important, as the error probability will increase with decreasing transistor dimensions.
On-chip power management will keep the importance it already has today.
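The all-or-nothing behaviour of a memory transaction can be imitated in software with an optimistic scheme: take a checkpoint, compute on a private copy, and commit only if nobody else committed in the meantime, otherwise discard the result and retry. The Python sketch below is such an imitation and not hardware transactional memory. The names `TVar` and `atomically` are hypothetical, and the internal lock merely stands in for the atomic commit step that hardware would provide; the code using `atomically` takes no locks itself.

```python
import threading


class TVar:
    """A single transactional memory cell with a version counter (hypothetical helper)."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._commit_lock = threading.Lock()  # stand-in for the hardware's atomic commit


def atomically(tvar, update):
    """Run update(old) -> new as a transaction: commit completely or retry."""
    while True:
        checkpoint_version = tvar.version     # read the version first (checkpoint)
        checkpoint_value = tvar.value
        new_value = update(checkpoint_value)  # speculative work on a private copy
        with tvar._commit_lock:
            if tvar.version == checkpoint_version:   # no conflicting commit happened
                tvar.value = new_value
                tvar.version += 1
                return new_value
        # conflict detected: the speculative result is dropped (rollback) and retried
```

A shared counter can then be incremented from several threads with `atomically(counter, lambda v: v + 1)`, and every increment takes effect exactly once.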

Multi-Core discussion: software
Currently, operating system concepts known from memory-coupled multiprocessor systems are used. Here, the operating system scheduler assigns independent processes to the available processors.
In contrast to these concepts, the closer coupling of the cores in multi-core processors leads to a different ratio of computation to synchronization, which allows more fine-grain parallelism to be used.
Parallel computing will become the future standard programming model.
Most currently existing software is sequential and thus can run on only one core.
Programming languages and tools to exploit the fine-grain parallelism of multi-core processors need to be developed.
Furthermore, software engineering techniques are needed to allow the development of safe parallel programs.
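What restructuring a sequential loop for several cores looks like can be sketched in Python with the standard `concurrent.futures` module. The function name `parallel_sum_of_squares` and the chunking strategy are choices made for this example. Note that in CPython, threads show the structure of the decomposition rather than a real speedup for pure computation; a process pool or a language with truly parallel threads would be needed for that.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum_of_squares(data, workers=4):
    """Split a sequential reduction into independent chunks and combine the partial results."""
    chunk = max(1, (len(data) + workers - 1) // workers)       # ceiling division
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # each piece is processed independently; no synchronization is needed
        partials = pool.map(lambda piece: sum(x * x for x in piece), pieces)
        return sum(partials)                                   # sequential combine step
```

The only synchronization point is the final `sum`, which reflects the computation versus synchronization ratio discussed above: the finer the chunks, the larger the share of time spent combining results.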

Multi-Core discussion: software
Application development for multi-core processors will become one of the main future markets for computer scientists.
Today's applications have to be revised with the goal of exploiting parallelism, gaining performance and increasing comfort.
New applications that are currently not realizable due to a lack of processor performance will arise. These are hard to predict.
Possible applications must need high computational performance that is reachable through parallelism.
Such applications might come from speech recognition, image recognition, data mining, learning technologies or hardware synthesis.