PCI Express DMA Engine for the Active Buffer Project in the CBM Experiment

PCI Express DMA Engine for the Active Buffer Project in the CBM Experiment
Wenxue Gao, Andreas Kugel, Reinhard Männer, Holger Singpiel, Andreas Wurz, University of Mannheim
DPG Tagung, Gießen, 14 March 2007

Contents
- Introduction
- Block diagram
- Implementation
- Performance

Introduction – CBM Experiment (figure source: CBM TSR, Jan. 2006)

Introduction – PCI Express
- 2.5 Gbps per lane
- Point-to-point
- TLP (Transaction Layer Packet)
  - Posted: MWr (Memory Write Request), …
  - Non-posted: MRd (Memory Read Request), …
  - Completion: CplD, Cpl, …
  - Message: Msg

PCI Express – Posted TLP (MWr, …)
Animation frames (labels: Trn., Host, End-Point, Rx, Tx): the posted write requests MWr1, MWr2 and MWr3 are issued one after another on Tx and appear on Rx in the same order. Posted requests are not answered by a completion.

PCI Express – Non-posted TLP (MRd, …)
Animation frames (labels: Trn., Host, End-Point, Rx, Tx): the read requests MRd1 and MRd2 are issued on Tx, and the completions CplD1 and CplD2 return on Rx. The completions need not arrive in request order (the frames show CplD2 ahead of CplD1 as well as the reverse), so each completion is matched to its request through Tag[7:0].

Introduction – SG DMA
- SG (scatter/gather): multiple-descriptor chain
- Full duplex
  - Downstream: Host → Endpoint
  - Upstream: Endpoint → Host
- "Done" state
  - Status register
  - Interrupt

Block diagram
Figure labels: PCIe Transaction Layer Interface (Rx, Tx), Rx Resolution, Tag RAM, Channel Buffer, Tx Arbitrator, Memory (BRAM + FIFO + Registers), Upstream DMA Channel, Downstream DMA Channel, PIO Channel.

Channel Buffer
- TLP channel FIFO: width = 128 bits, depth = 15
- TLP without payload: everything fits in one word
- TLP with payload: local address plus additional information
Figure labels: Rx, Tx; word fields LAdr, Hdr2, Hdr1, Hdr0 and xxxx, with bit markers 127, 95, 63, 31.
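The word layout named on this slide can be illustrated with a small packing routine. This is only a sketch under an assumption the slide does not confirm: that the 128-bit word holds four 32-bit fields, with LAdr in bits 127..96 and Hdr0 in bits 31..0. The function names `pack_word` and `unpack_word` are hypothetical.

```python
# Assumed layout (not confirmed by the slide): four 32-bit fields in one
# 128-bit channel buffer word, LAdr in bits 127..96 down to Hdr0 in bits 31..0.
MASK32 = 0xFFFFFFFF


def pack_word(ladr, hdr2, hdr1, hdr0):
    """Pack four 32-bit fields into one 128-bit integer."""
    for field in (ladr, hdr2, hdr1, hdr0):
        if not 0 <= field <= MASK32:
            raise ValueError("each field must fit in 32 bits")
    return (ladr << 96) | (hdr2 << 64) | (hdr1 << 32) | hdr0


def unpack_word(word):
    """Split a 128-bit word back into (ladr, hdr2, hdr1, hdr0)."""
    return tuple((word >> shift) & MASK32 for shift in (96, 64, 32, 0))
```

Packing and unpacking are inverses, so `unpack_word(pack_word(a, b, c, d))` returns `(a, b, c, d)` for any four 32-bit values.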

Implementation – Splitting the DMA
- A request must not cross a 4 KB boundary
- The address/length combination determines where a transfer has to be split
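The splitting rule can be sketched as follows. This is an illustration, not the engine's actual logic: the function name `split_dma` and the default `max_payload` of 128 bytes are assumptions (real systems negotiate their own payload and read-request limits).

```python
def split_dma(address, length, max_payload=128, boundary=4096):
    """Split a transfer into (address, length) pieces.

    No piece crosses a `boundary`-aligned address, and no piece is longer
    than `max_payload` bytes. Both defaults are illustrative assumptions.
    """
    if length < 0 or max_payload <= 0 or boundary <= 0:
        raise ValueError("length must be >= 0, limits must be > 0")
    pieces = []
    while length > 0:
        room = boundary - (address % boundary)  # bytes left before the boundary
        chunk = min(length, max_payload, room)
        pieces.append((address, chunk))
        address += chunk
        length -= chunk
    return pieces
```

For example, `split_dma(0x0FC0, 256)` yields `[(0x0FC0, 64), (0x1000, 128), (0x1080, 64)]`: the first piece is cut short at address 0x1000 so that no request spans the 4 KB boundary.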

Implementation – Confirming "Done"
- When is the DMA finished? A "Done" state is needed.
- CplDs for different MRds do not arrive in request order.
- Possible solutions
  - Read the Tag RAM
  - Count the CplDs
  - Channel buffer empty
  - Trigger on the last tag (x)
  - Fill a bitmap: 128-bit register for 7-bit tags
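The bitmap option can be sketched in software as below. This is one possible reading of the slide, not the actual hardware design: the class name `TagBitmap` is hypothetical, a set bit is taken to mean "completion still outstanding", and `complete()` is assumed to be called once per tag, after the last CplD belonging to that read request has arrived.

```python
class TagBitmap:
    """Tracks outstanding read tags in a 128-bit bitmap (7-bit tags)."""

    TAG_COUNT = 128  # 2**7 possible tag values

    def __init__(self):
        # Bit n set: the MRd with tag n still awaits its completion.
        self.bits = 0

    def _check(self, tag):
        if not 0 <= tag < self.TAG_COUNT:
            raise ValueError("tag must fit in 7 bits")

    def issue(self, tag):
        """Mark a tag as outstanding when its MRd is sent."""
        self._check(tag)
        if (self.bits >> tag) & 1:
            raise ValueError("tag is already outstanding")
        self.bits |= 1 << tag

    def complete(self, tag):
        """Clear a tag once its final CplD has been received."""
        self._check(tag)
        if not (self.bits >> tag) & 1:
            raise ValueError("completion for a tag that was never issued")
        self.bits &= ~(1 << tag)

    def done(self):
        """True when no read request is outstanding."""
        return self.bits == 0
```

Because each tag has its own bit, the order in which completions arrive does not matter; `done()` only becomes true after every issued tag has been completed.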

Performance parameters
Target device: Virtex4 XC4VFX60-11ff672
  FFs:     9 834 out of 50 560 (19 %)
  LUT4s:  11 464 out of 50 560 (22 %)
  RAMB16:     58 out of    232 (25 %)
  Slices:  9 426 out of 25 280 (37 %)
Frequency (trn_clk): 250 MHz
Latency (transaction layer)
  PIO: 52 ns (MRd → CplD)
  DMA: 80 ns (DMA "Start" → Tx TLP)
Theoretical bandwidth: 2 Gbps x4 = 8 Gbps, bidirectional
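The bandwidth and latency figures above can be cross-checked with a few lines of arithmetic. The sketch assumes that the step from the 2.5 Gbps line rate to 2 Gbps per lane comes from 8b/10b line coding, which the slide does not state explicitly; the function names are hypothetical.

```python
def effective_gbps(line_rate_gbps, lanes):
    """Payload bit rate after 8b/10b coding (8 data bits per 10 line bits)."""
    return line_rate_gbps * 8 / 10 * lanes


def ns_to_cycles(ns, clk_mhz):
    """Convert a latency in nanoseconds to clock cycles at clk_mhz."""
    return ns * clk_mhz / 1000


print(effective_gbps(2.5, lanes=1))   # per-lane rate in Gbps
print(effective_gbps(2.5, lanes=4))   # x4 link rate in Gbps
print(ns_to_cycles(52, clk_mhz=250))  # PIO latency in trn_clk cycles
print(ns_to_cycles(80, clk_mhz=250))  # DMA latency in trn_clk cycles
```

With the numbers from the slide this gives 2 Gbps per lane and 8 Gbps for four lanes, and the quoted latencies correspond to 13 and 20 cycles of the 250 MHz trn_clk.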

4-Lane Tests

Open questions
- Smaller channel buffer: in most cases 64 bits suffice instead of 128 bits
- Better error handling
  - Partly incomplete
  - Avoid overwriting of CplDs
  - Time-out
  - Tag recycling
- Higher bandwidth for downstream DMA

Summary
- PCI Express advantages: parallelism, scalability
- Virtual channels: 2 DMA channels, 1 PIO channel
- Xilinx solution: 62.5 MHz for x1, 250 MHz for x4

x4-ABB Design Summary
--------------
Logic Utilization:
  Number of Slice Flip Flops:        9,834 out of 50,560   19%
  Number of 4 input LUTs:           11,464 out of 50,560   22%
Logic Distribution:
  Number of occupied Slices:         9,426 out of 25,280   37%
  Total Number 4 input LUTs:        12,993 out of 50,560   25%
    Number used as logic:           11,464
    Number used as a route-thru:       643
    Number used for Dual Port RAMs:    202
    Number used as Shift registers:    684
  Number of bonded IPADs:               18 out of     62   29%
  Number of bonded OPADs:               16 out of     24   66%
  Number of bonded IOBs:                 1 out of    352    1%
  Number of BUFG/BUFGCTRLs:              5 out of     32   15%
    Number used as BUFGs:                4
    Number used as BUFGCTRLs:            1
  Number of FIFO16/RAMB16s:             58 out of    232   25%
    Number used as FIFO16s:              0
    Number used as RAMB16s:             58
  Number of DSP48s:                      2 out of    128    1%
  Number of DCM_ADVs:                    1 out of     12    8%
  Number of GT11s:                       8 out of     16   50%
  Number of GT11CLKs:                    1 out of      8   12%

X4 Test

DMA process
- Buffer descriptor
  - SA (Source Address)
  - DA (Destination Address)
  - NXA (Next Descriptor Address)
  - Length (length in bytes)
  - Control (control register)
- Start/Stop command
  - Upstream: MWr + MRd (dex)
  - Downstream: MRd
- Detecting the Busy/Done states
  - Status register
  - Interrupt (Msg)
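The descriptor fields listed above suggest a linked chain that the engine follows via NXA. The sketch below models that in software; it is not the engine's implementation. The `Descriptor` class, the `walk_chain` helper, the dictionary used as descriptor memory, and the use of address 0 as the end-of-chain marker are all assumptions, since the slides do not say how a chain is terminated.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    sa: int       # source address
    da: int       # destination address
    nxa: int      # address of the next descriptor
    length: int   # transfer length in bytes
    control: int  # control register value


END_OF_CHAIN = 0  # hypothetical terminator, not taken from the slides


def walk_chain(memory, first_address, limit=1024):
    """Return the descriptors of a chain in processing order.

    `memory` maps a descriptor address to a Descriptor. A repeated address
    or a chain longer than `limit` raises ValueError.
    """
    chain = []
    seen = set()
    address = first_address
    while address != END_OF_CHAIN:
        if address in seen or len(chain) >= limit:
            raise ValueError("descriptor chain loops or is too long")
        seen.add(address)
        descriptor = memory[address]
        chain.append(descriptor)
        address = descriptor.nxa
    return chain
```

A two-element chain would be set up by storing one descriptor whose `nxa` points at the second and a second whose `nxa` is `END_OF_CHAIN`; summing `length` over the result of `walk_chain` then gives the total number of bytes the chain transfers.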

Block diagram
Figure labels: Rx, Rx Resolution, Tag RAM, Memory (BRAM + Registers + FIFO), Upstream DMA Engine, Downstream DMA Engine, Registers, Tx Arbitrator, Tx.
Signal labels: MWr_usp, MRd_dsd, MRd_usd, MRd_dsp, Cpl/D, MWr, MRd, CplD, Cpl, Rd, Wr, US: Msg, DS: DMA.

Verification
- PIO + DMA ($random)
  - Transaction length
  - Address pair
  - Chain length (DMA)
  - Descriptor address (DMA)
  - Flow control: *_rdy_n
- Output checking
  - tsof/teof
  - Data
  - Descriptor splitting
Figure labels: Root, Endpoint, Downstream (Write), Upstream (Read), 1, 2.

Memory space
- BRAM: 16 KB
- FIFO: 32 x 32, loop-back
- Registers: Write / Read, Control / Status
- Possible extension: DDR (similar to BRAM), GbE (similar to FIFO)
Figure labels: BRAM, Registers, Loop-Back, Wr, Rd, OFIFO, IFIFO.