Technische universität dortmund fakultät für informatik informatik 12 Embedded System Hardware - Processing - Peter Marwedel Informatik 12 TU Dortmund.

Slides:



Advertisements
Ähnliche Präsentationen
Digital Output Board and Motherboard
Advertisements

Cadastre for the 21st Century – The German Way
Service Oriented Architectures for Remote Instrumentation
Finding the Pattern You Need: The Design Pattern Intent Ontology
SION Vacuum Circuit-Breakers 3AE5 and 3AE1
Z-Transformation Die bilaterale Z-Transformation eines Signals x[n] ist die formale Reihe X(z): wobei n alle ganzen Zahlen durchläuft und z, im Allgemeinen,
Embedded System Hardware
DNS-Resolver-Mechanismus
P. Marwedel Informatik 12, U. Dortmund
Peter Marwedel Informatik 12 TU Dortmund Germany
PPTmaster_BRC_ pot Rexroth Inline compact I/O technology in your control cabinet SERCOS III Components Abteilung; Vor- und Nachname.
R. Zankl – Ch. Oelschlegel – M. Schüler – M. Karg – H. Obermayer R. Gottanka – F. Rösch – P. Keidler – A. Spangler th Expert Meeting Business.
Informatik 12, TU Dortmund
fakultät für informatik informatik 12 technische universität dortmund Test Peter Marwedel TU Dortmund Informatik 12 Germany 2009/01/17 Graphics: © Alexandra.
Embedded & Real-time Operating Systems
Fakultät für informatik informatik 12 technische universität dortmund Optimizations Peter Marwedel TU Dortmund Informatik 12 Germany 2009/01/17 Graphics:
fakultät für informatik informatik 12 technische universität dortmund Optimizations Peter Marwedel TU Dortmund Informatik 12 Germany 2009/01/10 Graphics:
Fakultät für informatik informatik 12 technische universität dortmund Mapping of Applications to Platforms Peter Marwedel TU Dortmund, Informatik 12 Germany.
Fakultät für informatik informatik 12 technische universität dortmund Optimizations Peter Marwedel TU Dortmund Informatik 12 Germany 2010/01/13 Graphics:
Embedded System Hardware - Processing -
Fakultät für informatik informatik 12 technische universität dortmund Universität Dortmund Middleware Peter Marwedel TU Dortmund, Informatik 12 Germany.
Fakultät für informatik informatik 12 technische universität dortmund Specifications Peter Marwedel TU Dortmund, Informatik 12 Graphics: © Alexandra Nolte,
Peter Marwedel TU Dortmund, Informatik 12
Fakultät für informatik informatik 12 technische universität dortmund Embedded System Hardware Peter Marwedel Informatik 12 TU Dortmund Germany 2011/03/09.
Fakultät für informatik informatik 12 technische universität dortmund Hardware/Software Partitioning Peter Marwedel Informatik 12 TU Dortmund Germany Chapter.
Technische universität dortmund fakultät für informatik informatik 12 Embedded System Hardware Peter Marwedel Informatik 12 TU Dortmund Germany
Embedded System Hardware - Processing -
Wozu die Autokorrelationsfunktion?
Hier wird Wissen Wirklichkeit Computer Architecture – Part 4 – page 1 of 35 – Prof. Dr. Uwe Brinkschulte, Prof. Dr. Klaus Waldschmidt Part 4 Fundamentals.
Hier wird Wissen Wirklichkeit Computer Architecture – Part 10 – page 1 of 31 – Prof. Dr. Uwe Brinkschulte, Prof. Dr. Klaus Waldschmidt Part 10 Thread and.
Hier wird Wissen Wirklichkeit Computer Architecture – Part 5 – page 1 of 25 – Prof. Dr. Uwe Brinkschulte, M.Sc. Benjamin Betting Part 5 Fundamentals in.
Lancing: What is the future? Lutz Heinemann Profil Institute for Clinical Research, San Diego, US Profil Institut für Stoffwechselforschung, Neuss Science.
Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Architectures and Diagnosis Methods for Self Repairing.
Thomas Herrmann Software - Ergonomie bei interaktiven Medien Step 6: Ein/ Ausgabe Instrumente (Device-based controls) Trackball. Joystick.
CCNA Exploration Network Fundamentals
Institut AIFB, Universität Karlsruhe (TH) Forschungsuniversität gegründet 1825 Towards Automatic Composition of Processes based on Semantic.
Institut für Solare Energieversorgungstechnik Verein an der Universität Kassel Bereich Energetische Biomassenutzung, Hanau Dipl.-Ing. J. Müller Bioturbine,
Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus 1 Hierarchical Test Technology for Systems on a.
Sanjay Patil Standards Architect – SAP AG April 2008
A good view into the future Presented by Walter Henke BRIT/SLL Schweinfurt, 14. November 2006.
BAS5SE | Fachhochschule Hagenberg | Daniel Khan | S SPR5 MVC Plugin Development SPR6P.
Department of Computer Science Homepage HTML Preprocessor Perl Database Revision Control System © 1998, Leonhard Jaschke, Institut für Wissenschaftliches.
INTAKT- Interkulturelle Berufsfelderkundungen als ausbildungsbezogene Lerneinheiten in berufsqualifizierenden Auslandspraktika DE/10/LLP-LdV/TOI/
Fusszeilentext – bitte in (Ansicht – Master – Folienmaster, 1. Folie oben) individuell ändern! Danach wieder zurück in Normalansicht gehen! 1 OTR Shearography.
Faculty of Public Health Department of Health Economics and Management University of Bielefeld WP 3.1 and WP 4.1: Macrocost EUprimecare Plenary Meeting.
Berner Fachhochschule Hochschule für Agrar-, Forst- und Lebensmittelwissenschaften HAFL Recent activities on ammonia emissions: Emission inventory Rindvieh.
Ein Projekt des Technischen Jugendfreizeit- und Bildungsvereins (tjfbv) e.V. kommunizieren.de Blended Learning for people with disabilities.
Image Processing and Analysis Introduction. How do we see things ?
Fakultät für informatik informatik 12 technische universität dortmund Memory-architecture aware compilation - Sessions Peter Marwedel TU Dortmund.
Cross-Polarization Modulation in DWDM Systems
By: Jade Bowerman. German numbers are quite a bit like our own. You start with one through ten and then you add 20, 30, 40 or 50 to them. For time you.
KIT – die Kooperation von Forschungszentrum Karlsruhe GmbH und Universität Karlsruhe (TH) Vorlesung Knowledge Discovery - Institut AIFB Tempus fugit Towards.
Titelmasterformat durch Klicken bearbeiten Textmasterformate durch Klicken bearbeiten Zweite Ebene Dritte Ebene Vierte Ebene Fünfte Ebene 1 Titelmasterformat.
1 Intern | ST-IN/PRM-EU | | © Robert Bosch GmbH Alle Rechte vorbehalten, auch bzgl. jeder Verfügung, Verwertung, Reproduktion, Bearbeitung,
Fakultät für informatik informatik 12 technische universität dortmund Standard Optimization Techniques 2010/12/20 Peter Marwedel TU Dortmund, Informatik.
Fakultät für informatik informatik 12 technische universität dortmund Memory architecture description languages - Session 20 - Peter Marwedel TU Dortmund.
Launch ON Global.vi System ID object name classname Services to suscribe Observer Control Ref vi-path Service name Step 1 : Objects register to the Global.vi´s,
Selectivity in the German Mobility Panel Tobias Kuhnimhof Institute for Transport Studies, University of Karlsruhe Paris, May 20th, 2005.
EN/FAD Ericsson GmbH EDD/ Information im 21. Jahrundert muss Erwünscht Relevant Erreichbar Schnell Kostenlos!?
Technische Universität München 1 CADUI' June FUNDP Namur G B I The FUSE-System: an Integrated User Interface Design Environment Frank Lonczewski.
TUM in CrossGrid Role and Contribution Fakultät für Informatik der Technischen Universität München Informatik X: Rechnertechnik und Rechnerorganisation.
Technische universität dortmund fakultät für informatik informatik 12 Communication Peter Marwedel Informatik 12 TU Dortmund Germany 2012 年 11 月 21 日 These.
OTTO-VON-GUERICKE-UNIVERSITÄT MAGDEBURG Fakultät für Verfahrens- und Systemtechnik Institut für Apparate- und Umwelttechnik INNOVATION AND TECHNICAL PROGRESS:
Fakultät für informatik informatik 12 technische universität dortmund Memory-architecture aware compilation - Session 13 - Peter Marwedel TU Dortmund Informatik.
Technische Universität München Fakultät für Informatik Computer Graphics SS 2014 Rüdiger Westermann Lehrstuhl für Computer Graphik und Visualisierung.
Institut für Angewandte Mikroelektronik und Datentechnik Phase 5 Architectural impact on ASIC and FPGA Nils Büscher Selected Topics in VLSI Design (Module.
Institut für Angewandte Mikroelektronik und Datentechnik Course and Contest Results of Phase 5 Eike Schweißguth Selected Topics in VLSI Design (Module.
Embedded System Hardware - Processing -
Creating Web Documents
Cooling R&D plans in Aachen
 Präsentation transkript:

technische universität dortmund fakultät für informatik informatik 12 Embedded System Hardware - Processing - Peter Marwedel Informatik 12 TU Dortmund Germany These slides use Microsoft clip arts. Microsoft copyright restrictions apply. © Springer, 2010

- 2 - technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Embedded System Hardware Embedded system hardware is frequently used in a loop (hardware in a loop): cyber-physical systems

- 3 - technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Efficiency: slide from lecture 1 applied to processing CPS & ES must be efficient Code-size efficient (especially for systems on a chip) Run-time efficient Weight efficient Cost efficient Energy efficient © Graphics: Microsoft, P. Marwedel, M. Engel, 2011

- 4 - technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Why care about energy efficiency ? Relevant during use? Execution platform PluggedUncharged periods Unplug- ged E.g.FactoryCarSensor Global warming Cost of energy Increasing performance Problems with cooling, avoiding hot spots Avoiding high currents & metal migration Reliability Energy a very scarce resource Power © Graphics: P. Marwedel, 2011

- 5 - technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Should we care about energy consumption or about power consumption? t P(t)P(t) E Both are closely related, but still different

- 6 - technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Should we care about energy consumption or about power consumption (2)? Minimizing power consumption important for design of the power supply & regulators dimensioning of interconnect, short term cooling Minimizing energy consumption important due to restricted availability of energy (mobile systems) cooling: high costs, limited space thermal effects dependability, long lifetimes In general, we need to care about both

- 7 - technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Prescott: 90 W/cm², 90 nm [ct 4/2004] Nuclear reactor Problem: Power density continues to get worse © Intel M. Pollack, Micro-32

- 8 - technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Surpassed hot (kitchen) plate …? Why not use it? ~htsu/humor/fry_egg.html Strictly speaking, energy is not consumed, but converted from electrical energy into heat energy

- 9 - technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Problem: Increasing performance requirements for mobile phones C.H. van Berkel: Multi-Core for Mobile Phones, DATE, 2009; Workload [MOPs] Many more instances of the power/energy problem

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Where does the power go? - mobile phone - It not just I/O, dont ignore processing! [O. Vargas: Minimum power consumption in mobile-phone memory subsystems; Pennwell Portable Design - September 2005;]

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Where does the power go? - Consumer portable systems – (2) During use, all components & computations relevant C.H. van Berkel: Multi-Core for Mobile Phones, DATE, 2009; (no explicit percentages in original paper) Mobile phone use, breakdown by type of computation (geometry processing, rasterization, pixel shading) (display & camera processing, video (de)coding) (front-end, demodulation, decoding, protocol) (user interface, browsing, …) With special purpose HW!

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Static and dynamic power consumption Dynamic power consumption: Power consumption caused by charging capacitors when logic levels are switched. Static power consumption (caused by leakage current): power consumed in the absence of clock signals Leakage becoming more important due to smaller devices Decreasing V dd reduces P quadratically CLCL CMOS output

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Where is the energy consumed? - Consumer portable systems - According to International Technology Roadmap for Semiconductors (ITRS), 2010 update, [ Current trends violation of max. power constraint (0.5-1 W). © ITRS, 2010

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund How to make systems energy efficient: Fundamentals of dynamic voltage scaling (DVS) Power consumption of CMOS circuits (ignoring leakage): Delay for CMOS circuits: Decreasing V dd reduces P quadratically, while the run-time of algorithms is only linearly increased

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Low voltage, parallel operation more efficient than high voltage, sequential operation Basic equations Power: P ~ V DD ², Maximum clock frequency: f ~ V DD, Energy to run a program: E = P t, with: t = runtime (fixed) Time to run a program: t ~ 1/ f Changes due to parallel processing, with operations per clock: Clock frequency reduced to: f = f /, Voltage can be reduced to: V DD =V DD /, Power for parallel processing: P° = P / ² per operation, Power for operations per clock: P = P° = P /, Time to run a program is still: t = t, Energy required to run program: E = P t = E / Argument in favour of voltage scaling, and parallel processing Rough approxi- mations!

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Energy Efficiency of different target platforms © Hugo De Man,IMEC, Philips, 2007 inherent power efficiency of silicon

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Application Specific Circuits (ASICS) or Full Custom Circuits Approach suffers from long design times, lack of flexibility (changing standards) and high costs (e.g. Mill. $ mask costs). Custom-designed circuits necessary if ultimate speed or energy efficiency is the goal and large numbers can be sold. © Graphics: M. Engel, 2012 HW synthesis not covered in this course, lets look at processors

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Energy Efficiency of different target platforms © Hugo De Man,IMEC, Philips, 2007 inherent power efficiency of silicon

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Energy-efficient architectures: Domain- and application specific Close to power efficiency of silicon inherent power efficiency of silicon © Hugo De Man: From the Heaven of Software to the Hell of NanoscalePhysics: An Industry in Transition, Keynote Slides, ACACES, 2007

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Energy-efficient architectures: Domain- and application specific inherent power efficiency of silicon Close to power efficiency of silicon © Hugo De Man: From the Heaven of Software to the Hell of NanoscalePhysics: An Industry in Transition, Keynote Slides, ACACES, 2007

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Energy-efficient architectures: Heterogeneous processors Dark silicon (not all silicon can be powered at the same time, due to current, power or temperature constraints)

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund 2 (For servers) © Babak Falsafi, GHz clock: Many cores unusable Dark silicon

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Voltage scaling: Example V dd [Courtesy, Yasuura, 2000]

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Variable-voltage/frequency example: INTEL Xscale From Intels Web Site

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund 2 (For servers) © Babak Falsafi, 2010

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Dynamic power management (DPM) RUN: operational IDLE: a SW routine may stop the CPU when not in use, while monitoring interrupts SLEEP: Shutdown of on- chip activity RUN SLEEPIDLE 400mW 160µW 50mW 90µs 10µs 160ms Example: STRONGARM SA1100 Power fault signal

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Key requirement #2: Code-size efficiency Overview: codeCompression/codeCompression/node1.html Compression techniques: key idea

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Code-size efficiency Compression techniques (continued): 2nd instruction set, e.g. ARM Thumb instruction set: Reduction to % of original code size 130% of ARM performance with 8/16 bit memory 85% of ARM performance with 32-bit memory Rd 0 Rd 0000 Constant 16-bit Thumb instr. ADD Rd #constant Rd Constant zero extended major opcode minor opcode source= destination Same approach for LSI TinyRisc, … Requires support by compiler, assembler etc. Dynamically decoded at run-time [ARM, R. Gupta]

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Dictionary approach, two level control store (indirect addressing of instructions) Dictionary-based coding schemes cover a wide range of various coders and compressors. Their common feature is that the methods use some kind of a dictionary that contains parts of the input sequence which frequently appear. The encoded sequence in turn contains references to the dictionary elements rather than containing these over and over. [Á. Beszédes et al.: Survey of Code size Reduction Methods, Survey of Code-Size Reduction Methods, ACM Computing Surveys, Vol. 35, Sept. 2003, pp ]

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Key idea (for d bit instructions) Uncompressed storage of a d -bit-wide instructions requires a x d bits. In compressed code, each instruction pattern is stored only once. Hopefully, a x b + c x d < a x d. Called nanoprogramming in the Motorola instruction address CPU d bit b « d bit table of used instructions (dictionary) For each instruction address, S contains table address of instruction. S a b c 2 b small

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund - Key requirement #3: Run-time efficiency - Domain-oriented architectures - Example: Filtering in Digital signal processing (DSP) Signal at t = t s (sampling points)

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Filtering in digital signal processing -- outer loop over -- sampling times t s { MR:=0; A1:=1; A2:= s -1; MX:= w [ s ]; MY:= a [0]; for ( k =0; k <= ( n 1 ); k ++) { MR:=MR + MX * MY; MX:= w [A2]; MY:= a [A1]; A1++; A2 -- ; } x [ s ]:=MR; } Maps nicely ADSP 2100

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund DSP-Processors: multiply/accumulate (MAC) and zero-overhead loop (ZOL) instructions MR:=0; A1:=1; A2:= s -1; MX:= w [ s ]; MY:= a [0]; for ( k :=0 <= n-1 ) {MR:=MR+MX*MY; MY:= a [A1]; MX:= w [A2]; A1++; A2--} Multiply/accumulate (MAC) instructionZero-overhead loop (ZOL) instruction preceding MAC instruction. Loop testing done in parallel to MAC operations.

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Heterogeneous registers MR MF MX MY * +,- AR AF AXAY +,-,.. D P Address generation unit (AGU) Address- registers A0, A1, A2.. Different functionality of registers An, AX, AY, AF,MX, MY, MF, MR Example (ADSP 210x):

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Separate address generation units (AGUs) Data memory can only be fetched with address contained in A, but this can be done in parallel with operation in main data path (takes effectively 0 time). A := A ± 1 also takes 0 time, same for A := A ± M; A := requires extra instruction Minimize load immediates Optimization in optimization chapter Example (ADSP 210x):

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Modulo addressing Modulo addressing: Am++ Am:=(Am+1) mod n (implements ring or circular buffer in memory).. w [t1-1] w [t1] w [t1-n+1] w [t1-n+2].. Memory, t=t1 Memory, t2= t1+1 sliding window w t1t1 t n most recent values.. w [t1-1] w [t1] w [t1+1] w [t1-n+2]..

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Returns largest/smallest number in case of over/underflows Example: a0111 b standard wrap around arithmetic (1)0000 saturating arithmetic1111 (a+b)/2: correct1000 wrap around arithmetic0000 saturating arithmetic + shifted0111 Appropriate for DSP/multimedia applications: No timeliness of results if interrupts are generated for overflows Precise values less important Wrap around arithmetic would be worse. Saturating arithmetic almost correct

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Example MATLAB Demo

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Fixed-point arithmetic Shifting required after multiplications and divisions in order to maintain binary point.

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Real-time capability Timing behavior has to be predictable Features that cause problems: Unpredictable access to shared resources Caches with difficult to predict replacement strategies Unified caches (conflicts between instructions and data) Pipelines with difficult to predict stall cycles ("bubbles") Unpredictable communication times for multiprocessors Branch prediction, speculative execution Interrupts that are possible any time Memory refreshes that are possible any time Instructions that have data-dependent execution times Trying to avoid as many of these as possible. [Dagstuhl workshop on predictability, Nov , 2003]

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Multiple memory banks or memories MR MF MX MY * +,- AR AF AXAY +,-,.. D P Address generation unit (AGU) Address- registers A0, A1, A2.. Simplifies parallel fetches

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Multimedia-Instructions, Short vector extensions, Streaming extensions, SIMD instructions Multimedia instructions exploit that many registers, adders etc are quite wide (32/64 bit), whereas most multimedia data types are narrow 2-8 values can be stored per register and added. E.g.: 2 additions per instruction; no carry at bit 16 a1a2 32 bits b1b2 32 bits c1c2 32 bits + Cheap way of using parallelism SSE instruction set extensions, SIMD instructions

technische universität dortmund fakultät für informatik P.Marwedel, Informatik 12, 2012 TU Dortmund Summary Hardware in a loop Sensors Discretization Information processing Importance of energy efficiency Special purpose HW very expensive Energy efficiency of processors Code size efficiency Run-time efficiency MPSoCs D/A converters Actuators