Mitglied der Helmholtz-Gemeinschaft Computation of Mutual Information Metric for Image Registration on Multiple GPUs Andrew V. Adinetz 1, Markus Axer 2,

Slides:



Advertisements
Ähnliche Präsentationen
Cadastre for the 21st Century – The German Way
Advertisements

Service Oriented Architectures for Remote Instrumentation
Finding the Pattern You Need: The Design Pattern Intent Ontology
Themenportal Europäische Geschichte / Web portal European History
H - A - M - L - E - IC T Teachers Acting Patterns while Teaching with New Media in the Subjects German, Mathematics and Computer Science Prof. S. Blömeke,
P. Marwedel Informatik 12, U. Dortmund
Forschungsdatenzentrum der Bundesagentur für Arbeit im Institut für Arbeitsmarkt- und Berufsforschung Two Issues on Remote Data Access.
R. Zankl – Ch. Oelschlegel – M. Schüler – M. Karg – H. Obermayer R. Gottanka – F. Rösch – P. Keidler – A. Spangler th Expert Meeting Business.
Die ZBW ist Mitglied der Leibniz-Gemeinschaft Copyright © ZBW 2010 Seite 1 Potenziale semantischer Technologien für die Bibliothek der Zukunft Klaus Tochtermann.
Fakultät für informatik informatik 12 technische universität dortmund Optimizations Peter Marwedel TU Dortmund Informatik 12 Germany 2009/01/17 Graphics:
fakultät für informatik informatik 12 technische universität dortmund Optimizations Peter Marwedel TU Dortmund Informatik 12 Germany 2009/01/10 Graphics:
Fakultät für informatik informatik 12 technische universität dortmund Optimizations Peter Marwedel TU Dortmund Informatik 12 Germany 2010/01/13 Graphics:
Fakultät für informatik informatik 12 technische universität dortmund Universität Dortmund Middleware Peter Marwedel TU Dortmund, Informatik 12 Germany.
Peter Marwedel TU Dortmund, Informatik 12
Fakultät für informatik informatik 12 technische universität dortmund Hardware/Software Partitioning Peter Marwedel Informatik 12 TU Dortmund Germany Chapter.
Aufgabenbesprechung Programming Contest. Order 7 Bo Pat Jean Kevin Claude William Marybeth 6 Jim Ben Zoe Joey Frederick Annabelle 0 SET 1 Bo Jean Claude.
Who Wants to be a Millionaire
Lehrstuhl Informatik III: Datenbanksysteme Andreas Scholz 1 Programming Database Web Applications Web Service Technologies Andreas Scholz.
NUMEX – Numerical experiments for the GME Fachhochschule Bonn-Rhein-Sieg Wolfgang Joppich PFTOOL - Precipitation forecast toolbox Semi-Lagrangian Mass-Integrating.
Hier wird Wissen Wirklichkeit Computer Architecture – Part 10 – page 1 of 31 – Prof. Dr. Uwe Brinkschulte, Prof. Dr. Klaus Waldschmidt Part 10 Thread and.
Insulin pump therapy in adults allows metabolic control at lower rates of hypoglycemia along with reduced insulin doses – results from the nationwide DPV-survey.
Lancing: What is the future? Lutz Heinemann Profil Institute for Clinical Research, San Diego, US Profil Institut für Stoffwechselforschung, Neuss Science.
Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Architectures and Diagnosis Methods for Self Repairing.
Three minutes presentation I ArbeitsschritteW Seminar I-Prax: Inhaltserschließung visueller Medien, Spree WS 2010/2011 Giving directions.
Lehrstuhl Informatik III: Datenbanksysteme AstroGrid-D Meeting Heidelberg, Informationsfusion und -Integrität: Grid-Erweiterungen zum Datenmanagement.
Forschungszentrum Jülich In der Helmholtz-Gemeinschaft Rolf Stassen IKP/ COSY Darmstadt 1 Status report HESR 1: RF Cavity.
CCNA Exploration Network Fundamentals
Introduction to the topic. Goals: Improving the students essay style in general Finding special words and expressions that can be used in essay writing.
Frank Nachtrab Medipix2-Meeting Physikalisches Institut Universität Erlangen-Nürnberg Optimising THL-Masks for the Medipix-Quad
CTS2 based Terminology Server – Overview – Project eBPG
Institut für Umweltphysik/Fernerkundung Physik/Elektrotechnik Fachbereich 1 SADDU June 2008 S. Noël, K.Bramstedt,
Question words and word order By the end of this lesson you will have revised question words By the end of this lesson you will be able to use question.
Laurie Clarcq The purpose of language, used in communication, is to create a picture in the mind and/or the heart of another.
Institut AIFB, Universität Karlsruhe (TH) Forschungsuniversität gegründet 1825 Towards Automatic Composition of Processes based on Semantic.
| DC-IAP/SVC3 | © Bosch Rexroth Pneumatics GmbH This document, as well as the data, specifications and other information set forth in.
A good view into the future Presented by Walter Henke BRIT/SLL Schweinfurt, 14. November 2006.
BAS5SE | Fachhochschule Hagenberg | Daniel Khan | S SPR5 MVC Plugin Development SPR6P.
Z Corp Customer Examples
Department of Computer Science Homepage HTML Preprocessor Perl Database Revision Control System © 1998, Leonhard Jaschke, Institut für Wissenschaftliches.
INTAKT- Interkulturelle Berufsfelderkundungen als ausbildungsbezogene Lerneinheiten in berufsqualifizierenden Auslandspraktika DE/10/LLP-LdV/TOI/
Algorithm Engineering Parallele Algorithmen Stefan Edelkamp.
Algorithm Engineering Parallele Algorithmen Stefan Edelkamp.
Verben Wiederholung Deutsch III Notizen.
Fusszeilentext – bitte in (Ansicht – Master – Folienmaster, 1. Folie oben) individuell ändern! Danach wieder zurück in Normalansicht gehen! 1 OTR Shearography.
Fach: Informatik Thema: Maschinendatenerfassung Klasse: 2AHWILJahrgang: 2013/2014 Höhere Technische Bundes-Lehr-und Versuchsanstalt St.Pölten Fachbereich.
Impairments in Polarization-Multiplexed DWDM Channels due to Cross- Polarization Modulation Marcus Winter Christian-Alexander Bunge Klaus Petermann Hochfrequenztechnik-Photonik.
4th Symposium on Lidar Atmospheric Applications
Ein Projekt des Technischen Jugendfreizeit- und Bildungsvereins (tjfbv) e.V. kommunizieren.de Blended Learning for people with disabilities.
Cross-Polarization Modulation in DWDM Systems
ESSnet Workshop Conclusions.
| TU Darmstadt | Fachbereich 18 | Institut Theorie Elektromagnetischer Felder | Dip.-Ing. Cong Liu | 1 Various approaches to electromagnetic.
Die Kooperation von Forschungszentrum Karlsruhe GmbH und Universität Karlsruhe (TH) dCache T1 admins and experts Welcome to FZK.
3rd Review, Vienna, 16th of April 1999 SIT-MOON ESPRIT Project Nr Siemens AG Österreich Robotiker Technische Universität Wien Politecnico di Milano.
Adjectiv Endungen Lite: Adjective following articles and pre-ceeding nouns. Colors and Clothes.
Greetings and goodbyes Deutschland v. USA
Sentence Structure Subject and verb are always together. Subject and verb are always together. Subject and verb must agree Subject and verb must agree.
KIT – die Kooperation von Forschungszentrum Karlsruhe GmbH und Universität Karlsruhe (TH) Vorlesung Knowledge Discovery - Institut AIFB Tempus fugit Towards.
German Word Order explained!
Separable Verbs Turn to page R22 in your German One Book R22 is in the back of the book There are examples at the top of the page.
1 Stevens Direct Scaling Methods and the Uniqueness Problem: Empirical Evaluation of an Axiom fundamental to Interval Scale Level.
Adjective Endings Nominative & Accusative Cases describing auf deutsch The information contained in this document may not be duplicated or distributed.
Selectivity in the German Mobility Panel Tobias Kuhnimhof Institute for Transport Studies, University of Karlsruhe Paris, May 20th, 2005.
How Does Fuzzy Arithmetic Work ? © Hartwig Jeschke Institut für Mikroelektronische Schaltungen und Systeme Universität Hannover
EN/FAD Ericsson GmbH EDD/ Information im 21. Jahrundert muss Erwünscht Relevant Erreichbar Schnell Kostenlos!?
Technische Universität München 1 CADUI' June FUNDP Namur G B I The FUSE-System: an Integrated User Interface Design Environment Frank Lonczewski.
TUM in CrossGrid Role and Contribution Fakultät für Informatik der Technischen Universität München Informatik X: Rechnertechnik und Rechnerorganisation.
You need to use your mouse to see this presentation
THE CONVERSATIONAL PAST
FTS usage at GridKa Forschungszentrum Karlsruhe GmbH
Ferrite Material Modeling (1) : Kicker principle
 Präsentation transkript:

Mitglied der Helmholtz-Gemeinschaft Computation of Mutual Information Metric for Image Registration on Multiple GPUs Andrew V. Adinetz 1, Markus Axer 2, Marcel Huysegoms 2, Stefan Köhnen 2, Jiri Kraus 3, Dirk Pleiter JSC, Forschungszentrum Jülich 2 INM-1, Forschungszentrum Jülich 3 NVIDIA GmbH

Mitglied der Helmholtz-Gemeinschaft Brain Image Registration Multi-GPU Implementation system memory listupdate Performance Evaluation Conclusion Outline

Mitglied der Helmholtz-Gemeinschaft Preparation of the brain

Mitglied der Helmholtz-Gemeinschaft BigBrain – first high-resolution brain model at microscopical scale 7404 histological sections stained for cell bodies scanned with a flad bed scanner original resolution 10 × 10 × 20 μm 3 ( × pixels) downscaling to 20 μm isotropic removal of artifacts 1 Terabyte in cooperation with Alan Evans, McGill, Montreal Amunts et al. (2013) Science Pushing the limits for a cellular brain model

Mitglied der Helmholtz-Gemeinschaft The process of aligning images is called registration Image Registration ITK Workflow

Mitglied der Helmholtz-Gemeinschaft i, j – pixel values ( ) successful for multi-modal registration Mutual Information Metric

Mitglied der Helmholtz-Gemeinschaft main computational kernel transform can be complex (1000+ parameters) GPU implementation: 1 pixel/thread, atomics Two Image Cross-Histogram for(int y = 0; y < fixed_sz_y; y++) for(int x = 0; x < fixed_sz_x; x++) { int i = bin(fixed[x, y]); float x1 = transform_x(x, y); float y1 = transform_y(x, y); int j = bin(interpolate(moving, x1, y1)); histogram[i, j]++; // atomic on GPU }

Mitglied der Helmholtz-Gemeinschaft Large Data Size size: × px pixel size: 60 × 60 μm file size: 30 MB Large-area Polarimeter size: × px pixel size: 1.6 x 1.6 μm file size: 40 GB Polarizing Microscope

Mitglied der Helmholtz-Gemeinschaft Domain decomposition distribute fixed and moving images histogram contributions summed up Moving image: how to handle? irregular access pattern Approaches System memory replication (sysmem) Listupdate (listupdate) Multi-GPU Mutual Information

Mitglied der Helmholtz-Gemeinschaft Replicate entire moving image in pinned host RAM accessible to GPU +easy to implement –system memory accesses are slower –cannot use texture interpolation Optimizations moving image halo in GPU RAM System Memory Replication

Mitglied der Helmholtz-Gemeinschaft Processing buffer remote accesses exchange buffers compute contributions remotely +computation-communication overlap –hard to implement –chunk processing (or wont fit into buffer) Optimizations buffers: AoS vs. SoA, atomics vs. grouping using multiple streams Listupdate typedef struct { float[2] movingCoords; short destRank; char fixedBin; } message_t;

Mitglied der Helmholtz-Gemeinschaft Chunk Processing and Overlap Process chunk Group Exchang e Handle messages Process chunkGroup Exchang e Process chunkGroup 1 2 Fixed Image y x(0,0)

Mitglied der Helmholtz-Gemeinschaft atomics each writing thread increments atomic counter +simpler –atomics can be a bottleneck –one buffer per receiver required grouping each thread writes to fixed location buffers grouped before sending +single buffer, less memory +optimized grouping (shared-memory atomics, prefix sum) –more complicated (separate kernel required) Buffer Writeout: Atomics vs. Group

Mitglied der Helmholtz-Gemeinschaft Benchmark setup Fixed Image y Moving Image x(0,0) Remote access Mask

Mitglied der Helmholtz-Gemeinschaft JUDGE 256-node GPU cluster Each M2070 node: 2x M2070 (Fermi) GPU, each 6 GB RAM 12-core X GHz, 96 GB RAM JuHydra single-node Kepler machine 2x K20X (Kepler) GPU, each 6 GB RAM 16-core E GHz, 64 GB RAM Test Hardware

Mitglied der Helmholtz-Gemeinschaft Baseline: Full Replication (M2070) ideal scalability

Mitglied der Helmholtz-Gemeinschaft Sysmem on Fermi

Mitglied der Helmholtz-Gemeinschaft Sysmem on Fermi: Explanation No sysmem Access Good Coalescing Few sysmem Access Bad Coalescing Many sysmem Access Bad Coalescing Most sysmem Access Good Coalescing

Mitglied der Helmholtz-Gemeinschaft Sysmem on Fermi: PCI-E Queries

Mitglied der Helmholtz-Gemeinschaft Sysmem: Halo Sizes mostly quantitative, not qualitative difference

Mitglied der Helmholtz-Gemeinschaft Listupdate: Multiple Streams 4 streams look the best

Mitglied der Helmholtz-Gemeinschaft Listupdate: AoS vs SoA, Atomics vs Group SoA + atomics looks best

Mitglied der Helmholtz-Gemeinschaft Sysmem vs. Listupdate: Fermi on Fermi, sysmem is better

Mitglied der Helmholtz-Gemeinschaft Sysmem vs. Listupdate: Kepler (Closeup) on Kepler, listupdate is better

Mitglied der Helmholtz-Gemeinschaft Fermi performance limited by atomics system memory replication is better Kepler order of magnitude faster than Fermi no longer dominated by atomics listupdate (atomic, SoA, 4 streams) is better Future work Compression Trials on real images Conclusions

Mitglied der Helmholtz-Gemeinschaft INM-1 at FZJ: 1/EN/Home/home_node.htmlhttp:// 1/EN/Home/home_node.html NVidia Application Lab at FZJ: juelich.de/ias/jsc/nvlabhttp:// juelich.de/ias/jsc/nvlab Andrew V. Adinetz: Jiri Kraus: Dirk Pleiter: Questions ?