Morphware: New Perspectives for Embedded Systems
Reiner Hartenstein*, TU Kaiserslautern. Paderborn, November 2003. *) IEEE fellow

Reconfigurable platforms
Morphware: "soft hardware"? (An oxymoron, like a "black white horse"?)

Morphware through the ages
Figure: timeline 1957 to 2007 covering the mainframe age, the computer age, and the morphware age (are we here?), contrasting instruction streams with data streams (flowware).
Makimoto's 1st wave: TTL parts (NAND gate, NOR gate, flip-flop, etc.) are general purpose; chips for pocket calculators, radio, TV, etc. are application-specific.
Makimoto's 2nd wave: microprocessor, microcontroller, and RAM are general purpose; graphics, multimedia, and communication chips are application-specific.
Makimoto's 3rd wave: FPGAs (gates and flip-flops) are general purpose; open question: will the second half of the wave go application-specific?
Von Neumann does not support morphware.

Ubiquitous embedded systems
Embedded software and configware have become the main vehicles of product differentiation, and the focal point of system design.
(Throughput and) flexibility is the key.
Computer science curricula do not provide the qualifications needed here.

3 ways to implement an algorithm
Hardware; software (RAM-based) on the von Neumann machine; configware (RAM-based) on the anti machine; or a mix of these.

A second programming paradigm
So far, programming has been dominated by a procedural mindset: programming in the time domain, with a dominant machine paradigm that is RAM-based.
The introduction of morphware ultimately leads to a paradigm shift: a migration of programming from the procedural domain into the structural domain.
The structural domain is now RAM-based as well.
This is an opportunity to bridge the gap between programmers and hardware people through well-chosen abstraction mechanisms: a new machine paradigm.

Kinds of morphware platforms:
Reconfigurable logic devices: fine-grain reconfigurable.
Reconfigurable interconnect devices: reconfigurable interconnect fabrics.
Reconfigurable data path arrays: coarse-grain reconfigurable.

>> Outline <<
Reconfigurable logic (rGA, FPGA)
Placement and routing
Reconfigurable interconnect devices
Reconfigurable data path arrays
Flowware
Data-stream-based computing
The anti machine paradigm
Morphware: why?
rGA: reconfigurable Gate Array

rGA with island architecture
Figure: rGA with island architecture (detail), showing the switches and the interconnect fabric.

Connection point activated
Figure: connection point (enlarged).

Routing completed for 1 net
Figure: routing completed for 1 net between A and B (20 transistors + 20 flip-flops).
Placement and routing software has been known for 25 years: in 1979 Silva Lisco (Silicon Valley Research Corp.) offered CALM-P.
Such network problems can be handled manually or with the help of graph theory.

Placement: which rLB for which function?

Routing: long-distance nets passing through
Figure: long-distance nets that merely pass through on their way to B.
At any one time a path can be used only once, that is, for only one signal ... compare the bridges of Königsberg.

Routing congestion: C and D cannot be reached
rLBs are not 100% usable. C cannot be connected to D. C and D need a different placement.

1736, Leonhard Euler: Euler's problem of the bridges of Königsberg is such a network problem: find a route that crosses every bridge exactly once ... which is also an optimization: no bridge remains unused.

L. Euler: Solutio Problematis ad Geometriam Situs Pertinentis; Commentarii Academiae Scientiarum Imperialis Petropolitanae 8 (1736), pp
Figure: the bridges drawn as a graph, with nodes (Left Bank, Right Bank, Kneiphof Island, Other Island) and edges.
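The Königsberg question can be checked mechanically with Euler's degree argument. The following is a minimal sketch, not taken from the slides: the assignment of the seven bridges to the four land masses follows the usual textbook description and is an assumption, as are the function names.

```python
from collections import Counter

# Seven bridges as edges between the four land masses named on the slide.
# Which bridge joins which pair is the usual textbook layout (assumption).
BRIDGES = [
    ("Left Bank", "Kneiphof Island"),
    ("Left Bank", "Kneiphof Island"),
    ("Right Bank", "Kneiphof Island"),
    ("Right Bank", "Kneiphof Island"),
    ("Left Bank", "Other Island"),
    ("Right Bank", "Other Island"),
    ("Kneiphof Island", "Other Island"),
]


def node_degrees(edges):
    """Count how many edge ends meet at each node of a multigraph."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


def has_euler_walk(edges):
    """True if a walk using every edge exactly once can exist.

    Euler's criterion for a connected multigraph: such a walk exists
    only if zero or two nodes have odd degree. Connectivity is assumed
    here and not checked.
    """
    odd = [node for node, d in node_degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)


if __name__ == "__main__":
    print(dict(node_degrees(BRIDGES)))
    print(has_euler_walk(BRIDGES))
```

With this layout all four nodes have odd degree, so no route crosses every bridge exactly once. The routing analogy is loose: a router must connect many nets without sharing a wire segment, which is a harder problem than this single check.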

Routing beyond chip boundaries
Cables in the cabinet connect the racks; rack wiring connects the boards; board wiring connects the chips.
Before fabrication? After fabrication? *)
*) 1930s: telephone switching (no chips; crossbar / two-motion selectors instead of boards). 1940s: first computers (no chips).

rIA: reconfigurable Interconnect Array

Crossbar switch
1913: J. N. Reynold's crossbar switch; 1915: patent granted.
1919: Betulander's crossbar switch.
1926: first public telephone switching application in Sweden.
1964: NASA telemetrics crossbar array.

Morphware in the Seventies
Franz J. Rammig: A Concept for the Editing of Hardware Resulting in an Automatic Hardware Editor, 14th DAC, 1977, New Orleans.
META-46 was intended as a prototyping system: emulation of logic behaviour and emulation of timing behaviour.
Figure: picture of one of 16 cross-bar boards.
Figure: META-46 Goldlack, basic outline: host computer (PDP11), adaptor logic, programmable cross-bar, selectable library board, probe buffers (diagram labels: 255, 128).

RWC (Real World Computing), Japan, 40 TFLOPS
Crossbar weight: 220 t; 3000 km of cable; 5120 processors, each with 5000 pins.

Number of crossbar chips needed
Is the crossbar complete? Crossbar chips are available from Aptix, Texas Instruments, and others.
One rail connects 2 pins, so the size of a full crossbar for n pins is n x n / 2.
Figure: number of crossbar chips needed (full versus partial crossbar, crossbar chips in a row) for n = 4, 8, 100, 5000.
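To get a feel for how fast a full crossbar grows, the n x n / 2 size formula can be evaluated for the pin counts shown on the slide. This is a small sketch; reading the "size" as a count of crosspoints, and the function name, are assumptions.

```python
def full_crossbar_size(n_pins: int) -> int:
    """Size of a full crossbar for n pins, using the slide's formula n * n / 2.

    Each rail connects 2 pins, so n pins need n / 2 rails, and every
    pin needs a switch to every rail.
    """
    return n_pins * n_pins // 2


# Pin counts taken from the slide's axis.
SIZES = {n: full_crossbar_size(n) for n in (4, 8, 100, 5000)}

if __name__ == "__main__":
    for n, size in SIZES.items():
        print(f"n = {n:>4}: {size:>10}")
```

For n = 5000 the formula gives 12 500 000, which illustrates why large systems fall back on partial crossbars built from several chips.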

Routing congestion example with detour
Figure: an rGA in which a direct connection is impossible; the detour connection is routed through an rLB that is configured as the identity function.
Routing resources: logic gates and/or pass transistors.

rDPA: reconfigurable Data Path Array

Coarse-grain reconfigurable
Fine-grain morphware platforms are already mainstream: reconfigurable logic, which is really just logic design on a strange platform.
Coarse-grain platforms (rDPAs), that is, reconfigurable computing: not all that new, but it shakes the foundations of our computer science curricula.
An order of magnitude more MIPS/mW than the fine-grain platforms.

Why coarse grain? Coarse grain goes far beyond bridging the gap
Figure: MOPS/mW (0.001 to 1000, logarithmic) versus feature size (2 µ down to 0.07 µ), comparing hardwired designs, rDPAs (reconfigurable computing)*, rGAs/FPGAs (reconfigurable logic), DSPs, and standard microprocessors (instruction-stream processors); an inset contrasts flexibility and throughput for hardwired, FPGA, and von Neumann platforms.
Sources: T. Claasen et al.: ISSCC 1999; *) R. Hartenstein: ISIS 1997.

Commercial rDPAs
XPU family (IP cores): PACT Corp., Munich. Figure: XPU128.
Gain in MIPS/mW: one to two orders of magnitude.

PACT XPP architecture: mode of operation
Instruction-stream architecture: instruction memory, registers, and one ALU working on one data word; one operation per clock cycle (for example MULT, SHIFT, ADD).
XPP architecture: configuration memory and an ALU array working on a data stream; many operations per clock cycle (for example FFT, Viterbi, filter, buffer).

Mapping algorithms efficiently onto rDPA
"Structured Configware Design" [R. H.]
Figure: SNN filter on KressArray; array size: 10 x 16 = 160 rDPUs. Legend: rDPU not used, route-through only, backbus connect. Input via scan window (2D memory*).
*) e.g. the MoM architecture
??? "But you can't implement decisions!"

Importance of binding time
Fabrication time: full custom or ASIC; hardware (structural domain).
Compilation time / load time: reconfigurable computing; configware (structural domain): generate the data path. Configuration is a kind of early, frozen "super instruction fetch".
Run time: microprocessor, parallel computer; software (procedural domain, instruction flow): read a new instruction at the time of each "instruction fetch".
Not all switching activity is triggered by configware.
??? "But you can't implement decisions!"

What do we learn from this?
(Initially, morphware application was practiced as logic design on a very strange platform.) Problems of (interdisciplinary) communication and of curriculum development ... often go back to the wrong choice of abstraction level.

Wide range of speed-up factors
The key: algorithmic cleverness.
Platform: PACT Xtreme 4-by-4 array [2003]. Application example: 16 tap FIR filter. Speed-up factor: x16 MOPS/mW. Method: straightforward.
Platform: MoM anti machine with DPLA* [1983]. Application example: grid-based DRC** (1-metal 1-poly nMOS, 256 reference patterns). Speed-up factor: > x1000 (computation time). Method: multiple aspects.
*) MPC fabrication via the E.I.S. multi university project
**) Design Rule Check
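The FIR filter example lends itself to a data-stream view: samples flow through a delay line, and every tap contributes one multiply-accumulate per step. The sketch below is only a software model of that idea under stated assumptions; it does not reflect the PACT mapping, and the function name and 3-tap coefficients are illustrative.

```python
from collections import deque


def fir_stream(samples, coefficients):
    """Yield one filter output per input sample.

    The deque models the delay line; on an array platform each tap
    would be its own data path unit working in parallel, whereas this
    model evaluates the taps one after another.
    """
    taps = deque([0] * len(coefficients), maxlen=len(coefficients))
    for x in samples:
        taps.appendleft(x)  # shift the delay line; the oldest sample drops off
        yield sum(c * t for c, t in zip(coefficients, taps))


if __name__ == "__main__":
    # A unit impulse reproduces the coefficients as the output stream.
    print(list(fir_stream([1, 0, 0, 0], [1, 2, 3])))
```

Feeding a unit impulse is a convenient check because the output stream then equals the coefficient list followed by zeros.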

Super-systolic pipe networks
The key is the mapping, rather than the architecture*.
*) KressArray [ASP-DAC-1995]

Data streams instead of instruction streams

Flowware: managing the data streams
Flowware defines which data object appears at which time at which port.
Figure: input data streams and output data streams of a DPA, each drawn over time and port number.
Flowware: for managing the data streams. Software: for managing the instruction streams.
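The statement that flowware fixes which data object appears at which time at which port can be made concrete as a schedule of (time, port, item) triples. This is a hypothetical sketch: the function name and the skewed arrival pattern, borrowed from systolic arrays, are assumptions and not a flowware language from the talk.

```python
def flowware_schedule(streams, skew=1):
    """Return sorted (time, port, item) triples for a set of input streams.

    streams[p] is the sequence of data items for port p. Port p starts
    p * skew time steps late, which gives the staggered wavefront that
    is typical for systolic arrays (assumption for illustration).
    """
    schedule = []
    for port, items in enumerate(streams):
        for step, item in enumerate(items):
            schedule.append((step + port * skew, port, item))
    return sorted(schedule)


if __name__ == "__main__":
    for entry in flowware_schedule([["x0", "x1"], ["y0", "y1"]]):
        print(entry)
```

No instruction is fetched at run time in this picture: the schedule is known in advance, and the array only has to consume each item when it arrives.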

Paradigm shift: Nick Tredennick's view
Instruction-stream-based computing: resources fixed, algorithms variable; one programming source: software.
Data-stream-based reconfigurable computing: resources variable, algorithms variable; two programming sources: configware (for the programmable resources) and flowware (for the data-stream algorithms).
Why 2 programming sources? Because both the resources and the algorithms are variable.

Control-procedural versus data-procedural
The structural domain is primarily data-stream-based: flowware provides a (data-)procedural abstraction of the (data-stream-based) structural domain.
Flowware turns "procedural versus structural" into "control-procedural versus data-procedural" ... a Trojan horse for introducing the structural domain into the procedural world of programmers.
Flowware is mostly hidden by an indirect, instruction-stream-based implementation.


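Configware / flowware compilation
Figure: compilation flow from a high-level language via an intermediate format into the CW compiler. Its mapper produces configware ("instruction" fetch before run time) for the reconfigurable data path array (rDPA); its scheduler produces flowware for the data streams, which are supplied by a distributed memory architecture of asM memory units (memory M with wrapper, address generator, and data counter).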

Early partitioning co-compiler
[Jürgen Becker 1996]
Figure: a program in a high-level programming language enters the partitioner (with analyzer / profiler and resource parameters), which feeds the SW compiler (SW code) and the CW compiler (CW code).
Both the partitioner and DPSS use simulated annealing for mapping and optimization.

>>> Extremely efficient: flowware-based computing
Flowware languages are simpler than software languages and easier to learn.
Much less configuration memory is needed.
Address computation consumes hardly any memory cycles.
High parallelism through multiple deep pipelines.
All methodologies are readily available (click "recent talks"); also dissertations:

Computing paradigms and methodologies
1946: machine paradigm (von Neumann)
1980: data streams (Kung, Leiserson)
1989: anti machine paradigm
1990: first rDPA* (Rabaey)
1994: high-level anti machine programming language
1995: super-systolic array: rDPA (Kress)
1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...
1997+: discipline of distributed memory architectures
1997: first configware/software partitioning compiler
*) rDPU = reconfigurable Data Path Unit
Keywords: flowware, MoM.
SCCC: Streams-C Configurable Computing. SCORE: Stream Computations Organized for Reconfigurable Execution. ASPRC: Adapting Software Pipelining for Reconfigurable Computing. Bee: The Biggascale Emulation Engine (Broderson et al.). See also Francky Catthoor.

Machine paradigms
von Neumann: instruction stream machine, programmed by software. The CPU combines an instruction sequencer and a DPU; memory M and I/O are attached.
Anti machine: (reconfigurable) data-stream machine, programmed by flowware (and by configware if reconfigurable). A DPU or rDPU, or an (r)DPA, receives data streams from asM memory units, each with its own data address generator (data sequencer); together these form a distributed memory architecture*.
*) the new discipline came just in time: see Herz et al.: Proc. IEEE ICECS, 2002; also see books by Francky Catthoor et al.

Mega-rGAs
Figure: system gates per rGA chip (100 to 10 000 000, logarithmic) by year (1984 to 2004) [Xilinx Data]. Labelled devices: XC 4085XL, XC 40250XV, Virtex, Virtex II (planned).

Whole system on a chip: everything needed on board
Xilinx Virtex-II Pro FPGA architecture: PowerPC 405 RISC CPU (PPC405) cores, on-chip memory controller, embedded RAM, Rocket IO, and an FPGA fabric based on the Virtex-II architecture. Source: Ivo Bolsens, Xilinx

Some "soft CPU core" examples
Examples, including retro emulation, listed as core; architecture; platform:
MicroBlaze (125 MHz, 70 D-MIPS); 32 bit standard RISC, 32 reg. by 32 LUT RAM-based reg.; Xilinx, up to 100 on one FPGA
Nios (50 MHz, 22 D-MIPS); 8 bit / instr. set; Altera Mercury
gr1040; 16-bit
gr1050; 32-bit
My80; i8080A; FLEX10K30 or EPF6016
DSPuva16; 16 bit DSP; Spartan-II
Leon (25 MHz); SPARC
ARM7 clone; ARM
uP; CISC, 32 reg.; 200 XC4000E CLBs
REGIS; 8 bits instr. + ext. ROM; 2 XILINX 3020 LCA
Reliance-1; 12 bit DSP; Lattice: 4 isp30256, 4 isp1016
Popcorn-1; 8 bit CISC; Altera, Lattice, Xilinx
Acorn-1; 1 Flex 10K20
YARD-1A; 16-bit RISC, 2 opd. instr.; old Xilinx FPGA board
xr16; RISC, integer C; SpartanXL
MicroBlaze vs. Nios (ISE v3.31 vs. Quartus-I v1.0): Virtex II compiles 6 times faster than Mercury and 3 times faster than ApexC. Xilinx compiles 1 million gates in less than one hour.
Soft CPU cores by private individuals: YARD-1A (Yet Another RISC Design), TE16, xr16vx, KIM-RC, PDP-8/X in an XCS10, RISC8 Synthesized PIC, Sparrow (announcement), J32, Reliance-1, PopCorn-1, Acorn-1.
Private Java processor core: Austin Kim (Lucent), Morris Chang (IIT): Designing A Java ...
Private soft CPU: John Rible: Guided Exploration of two FPGA-based CPU Designs.
MISCs (Minimal Instruction Set Computers) in FPGAs: Novix in an FPGA, MSL16 Processor, P16 in VHDL, Forth Processor in VHDL, E16, 8-bit Stack Processor.
FPGA-targetable CPU cores by companies and organizations: Advancel: TinyJ; Altera: Nios; Derivation Systems: LavaCore (Java); Digital Core Design: DR8051, DR8052, D68000; Dolphin Integration: Flip-8051; Ericsson Telecom: ECOMP Erlang processor; Gray Research: xr16, GR CPUs; Green Mountain Computing Systems: GM HC11; Lexra: LX4180, LX4280, LX5280; Nazomi Communications: JStar (Java); Sierra Circuit Design: 65c02, 65cx1, 1802, PIC16C7X, 8085, 6800, 6809, 68HC11, 68000; Silicore: SLC1655; VAutomation: V6502, VZ80, V8-uRISC, V8086, V186; Vulcan: Moon (Java); Xilinx: KCPSM (8-bit MCU), MicroBlaze (32-bit RISC).
Open source CPU cores et cetera: The Free-IP Project: Free-6502, Free-RISC8; OpenCores: Mini-Risc, OpenRISC 1000, OpenRISC 2000; others: Open Collector; LEON SPARC: European Space Agency: LEON-1 VHDL Model, Gaisler Research, Metaflow's LeonCenter.com.
Configurable CPU cores hosted in FPGAs: ARCCores: ARC processor; Tensilica: Xtensa processor.
Configurable SoCs and FPGAs with hard CPU cores: Altera: ARM and MIPS for APEX; Atmel: AT94K FPSLIC (Field Programmable System Level Integrated Circuits); Cypress MicroSystems: PSoC (Programmable System-on-a-Chip); Triscend: E5 8-bit and A7 32-bit configurable SoCs; Xilinx: PowerPC for Virtex-II.
Configware! (not hardware)

rGAs: avoid application-specific silicon!
Figure: cost in million $ (1 to 4) versus feature size (0.8 down to 0.07) and number of masks (12 to >30), showing mask set cost [eASIC] and ASIC NRE and mask cost [Dataquest].
Top 4 PLD manufacturers 2000 (total: $3.7 billion): Xilinx 42%, Altera 37%, Lattice 15%, Actel 6%.
FPGAs are going into every type of application: PC 25%, communication 22%, consumer 16%, automotive 6%, others 31%.
Fastest growing segment of the semiconductor market [Dataquest]: > $7 billion by 2003.

... skipping the emulation
Avoid application-specific silicon ... Figure: number of design starts, rGA-based [N. Tredennick, Gilder Technology Report, 2003].
... skip the emulation: more and more prototyping platforms are shipped directly to the customer as the product, fully configured.

PC replaced by PS
PC replaced by PS (personal supercomputer).
Figure: timeline 1957 to 2007 with Makimoto's three waves across the mainframe age, the computer age, and the morphware age: the µProc (von Neumann) is joined by the rDPA (anti machine), programmed through a co-compiler, with flowware and data streams.

Computer science curricula miss the changed job market
Figure: growth factor (1, 2, 10) over months (12, 18) for embedded software [DTI* law] and for [Moore's law]; the slide also carries the label (1.4/year).
*) Department of Trade and Industry, London
Agenda 2010: 90% of all code written for embedded systems. 10 times more programmers must be qualified for hardware / configware / software partitioning!

Thank you for your patience


Appendix for Discussion

Loop transformation examples
Resource-parameter-driven co-compilation of sequential processes.
Figure: a loop with a body is split between host and reconfigurable array. Strip mining: the iteration range is divided into loop 1-8 and loop 9-16, run between fork and join. Loop unrolling: the host loop that only triggers the array shrinks from loop 1-8 to loop 1-4 and loop 1-2 as more of the body is unrolled onto the array.
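The two transformations named on this slide can be shown on an ordinary sequential loop. This is a plain software sketch under assumptions: the loop body, the strip size of 8, and the unroll factor of 2 are illustrative, and nothing here models the host/array split of the co-compiler.

```python
def body(x):
    """Placeholder loop body (assumption for illustration)."""
    return x * x


def original(data):
    """The untransformed loop: one body evaluation per iteration."""
    out = []
    for i in range(len(data)):
        out.append(body(data[i]))
    return out


def strip_mined(data, strip=8):
    """Strip mining: an outer loop over strips, an inner loop within a strip.

    The strips are independent, so they could be handed to separate
    resources between a fork and a join.
    """
    out = []
    for start in range(0, len(data), strip):
        for i in range(start, min(start + strip, len(data))):
            out.append(body(data[i]))
    return out


def unrolled_by_2(data):
    """Loop unrolling by 2: half as many iterations, two bodies each."""
    out = []
    n = len(data)
    for i in range(0, n - 1, 2):
        out.append(body(data[i]))
        out.append(body(data[i + 1]))
    if n % 2:  # leftover element when the length is odd
        out.append(body(data[-1]))
    return out


if __name__ == "__main__":
    data = list(range(1, 17))  # iterations 1-16
    print(original(data) == strip_mined(data) == unrolled_by_2(data))
```

All three variants compute the same result; they differ only in how the iteration space is cut up, which is what a resource-parameter-driven compiler can exploit when it decides how many data path units to use.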

Missing the next revolution
Ignoring reconfigurable computing when teaching computing fundamentals within our CS curricula is one of the biggest mistakes in the history of information technology application, causing the waste of billions of dollars. University of Kaiserslautern, © 2001

Glossary (approaching consensus)
Digital system platforms, listed as platform; source "running" on the platform; machine paradigm:
hardware (not programmable); none; none
ISP**; software; von Neumann
morphware (FPGA); configware; none
data stream processor (AMP*); streamware; anti machine
reconfigurable AMP (rAMP); streamware & configware; anti machine
Abbreviations: DPU: data path unit. rDPU: reconfigurable DPU. DPA: data path array (DPU array). rDPA: reconfigurable DPA. ISP: instruction set processor. AM: anti machine. AMP: data stream processor*. rAMP: reconfigurable AMP.
Categories of morphware, listed as morphware use; granularity (path width); (re)configurable blocks:
reconfigurable logic; fine grain (~1 bit); CLBs (FPGA)
reconfigurable computing; coarse grain (e.g. 32 bits); rDPUs (e.g. ALU-like)
reconfigurable computing; multi granular, by slice bundling; rDPU slices (e.g. 4 bits)
*) data stream processor; not a "dataflow machine"
**) instruction set processor
