Primary Data for Chemistry

DataCite Summer Meeting 2010: Making Datasets Visible and Accessible, June 7/8, 2010

Susanne Haak (1), Guido F. Herrmann (1), Irina Sens (2), Jan Brase (2)

(1) Georg Thieme Verlag KG, Ruedigerstrasse 14, 70469 Stuttgart, Germany, www.thieme-chemistry.com
(2) German National Library of Science and Technology (TIB), Welfengarten 1B, 30167 Hannover, Germany, http://www.tib-hannover.de; http://www.datacite.org

Outline: Partners, Background, Process, Results, Summary

PARTNERS

TIB (German National Library of Science and Technology)
- The world's largest specialized library for science and technology
- Subjects: architecture, chemistry, computer science, mathematics, physics, engineering and technology
- Financed by the Federal Government and all Federal States
- € 8 million annual acquisition budget
- 18,500 journal subscriptions
- 7.0 million items
- Global supplier of scientific and technical information of all types: text, numeric data, audio, video, etc.

DataCite
- Global consortium carried by local institutions
- Focused on improving the scholarly infrastructure around datasets and other non-textual information
- Focused on working with data centers and organizations that hold data
- Provides standards, workflows and best practice
- Initially, but not exclusively, based on the DOI system
- Founded December 1st, 2009 in London

Thieme Chemistry
- Part of the Thieme publishing group, based in Stuttgart (Germany)
- Has published highly evaluated information about synthetic and general chemistry for professional chemists and advanced students since 1909

This is one of the far too few close collaborations between libraries and publishers. Our journals have always been at the forefront of innovation, and we are proud that we can once again lead the way.

BACKGROUND

There is a gap in the scientific record between published research and the underlying data:
- Published work is held by publishers and libraries
- Datasets are held by data centers
- No effective way to link datasets and articles
- No widely used method to identify datasets
- No widely used method to cite datasets

As a result, datasets are:
- Difficult to discover
- Difficult to access

In chemistry, research data are created:
- Using the vast array of chromatographic methods (GC, HPLC, etc.)
- Employing spectroscopic methods (NMR, MS, UV/VIS, IR, X-ray, etc.)
- As a result of theoretical calculations (quantum mechanics, simulation of spectra, etc.)
- By using the various high-throughput technologies in medicinal chemistry

Typical research data also include crystallographic data.

Source: www.perkin-elmer.com

Primary Data in Organic Chemistry

Estimate: 500,000 to 1,000,000 datasets per year.

Notes:

CAS press release: Columbus, Ohio (September 8, 2009) - Chemical Abstracts Service (CAS), a division of the American Chemical Society, announced that on September 7 it recorded the 50 millionth substance in CAS REGISTRY(SM), the world's most comprehensive and high-quality compendium of publicly disclosed chemical information. The recently registered substance is a novel arylmethylidene heterocycle with analgesic properties. Reaching the 50 million mark so quickly is an indicator of the accelerating pace of scientific knowledge. CAS registered the 40 millionth substance just nine months ago; in contrast, it took 33 years for CAS to register the 10 millionth compound in 1990.

E-mail (translated from German):
From: Neudert, Reinhard - Weinheim [mailto:rneudert@wiley.com]
Sent: Wednesday, March 3, 2010, 16:31
To: Krimmer, Dr. Thomas
Subject: Re: Number of spectra in (organic) chemistry
Hello Mr. Krimmer, I have found the figures. They are based on four strong chemistry journals from Wiley-VCH. To cover all journals, one would have to extrapolate. Since Wiley-VCH is very strong in chemistry, the factor is probably around 3, i.e. 15 million spectra in the last 25 years. Regards, Neudert

Primary data in chemistry (translated from German): Chemical research generates large amounts of primary data every day, which in the academic environment ultimately lead to a scientific publication. In 2006, the largest chemistry journals of Wiley-VCH published the following numbers of articles:
- Angewandte Chemie: approx. 1,700 articles
- Chemistry - A European Journal: approx. 1,000 articles
- EurJIC: approx. 610 articles
- EurJOC: approx. 650 articles

In these approx. 4,590 articles in total, about 40 spectra are described per article on average. These spectra, which are based on primary data, are mostly listed in heavily reduced form in the experimental sections, but are also often shown as figures in the supplementary material ("Supporting Information"). In both cases the data are not electronically searchable, nor can the original measurement data be accessed in any form. Starting from the number of spectra given above, no fewer than 5 million spectra, i.e. 5 million primary datasets, have been generated in connection with the publication process over the last 25 years.

Number of spectra in the last year, assuming 50 organic chemistry journals with 500 articles each and 80 spectra per article: 2,000,000. To put this number into perspective: MedLine currently contains some 40 million abstracts in total and adds fewer than 1 million a year; CAS currently contains some 42 million substances, again in total, adding roughly 1 million per year. The 2 million here refer to organic chemistry and one single year only!

So far, this vast amount of data lies scattered on the computers of the scientists who produced it. As no central repository exists, no accessible archival storage is possible at the moment. Because working up such data currently earns little credit, primary data are often poorly documented, difficult to access and not saved for the long term.

Example: Researchers are retracting a highly cited 2004 Science paper describing a new way of adding sugars to proteins (a longstanding challenge in molecular biology), citing their inability to repeat the results and the absence of the original lab notebooks with the experiment details, they announced in Science last Thursday (November 26, 2009).

Source: www.perkin-elmer.com
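The back-of-the-envelope estimates in these notes can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it uses the figures as quoted (the stated total of about 4,590 articles, 40 spectra per article, 25 years, the rough factor of 3 from the e-mail, and the 50 x 500 x 80 single-year scenario); the variable and function names are illustrative.

```python
# Figures as quoted in the notes; names are illustrative only.
ARTICLES_PER_YEAR = 4590      # stated total for the four Wiley-VCH journals (2006)
SPECTRA_PER_ARTICLE = 40      # stated average number of spectra per article
YEARS = 25                    # period considered in the notes
EXTRAPOLATION_FACTOR = 3      # rough factor suggested in the e-mail


def spectra_over_period(articles_per_year, spectra_per_article, years):
    """Total spectra, assuming constant output over the whole period."""
    return articles_per_year * spectra_per_article * years


# Four journals over 25 years (the notes round this to "5 million").
four_journals = spectra_over_period(ARTICLES_PER_YEAR, SPECTRA_PER_ARTICLE, YEARS)

# Extrapolated to all journals with the rough factor of 3
# (the e-mail rounds 3 x 5 million to "15 million").
extrapolated = four_journals * EXTRAPOLATION_FACTOR

# Single-year scenario: 50 journals x 500 articles x 80 spectra.
single_year = 50 * 500 * 80

print(f"four journals, 25 years: {four_journals:,}")
print(f"extrapolated (factor 3): {extrapolated:,}")
print(f"single-year scenario:    {single_year:,}")
```

The unrounded products are somewhat below the rounded figures in the notes, which is expected for an order-of-magnitude estimate.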

PROCESS

What is needed:

Servers / data centers
- Creation of new data centers and strengthening of existing ones
- Responsible for quality assurance, storage and accessibility of the content, and creation of metadata

Metadata
- Global access to datasets and their metadata through existing catalogues
- TIB stores the metadata and keeps it searchable

DOI
- Use of persistent identifiers, also for data (DOI = Digital Object Identifier)
- TIB registers research data worldwide from a scientific, technical or medical background

Notes: The Digital Object Identifier (DOI®) System is for identifying content objects in the digital environment. Information about a digital object may change over time, including where to find it, but its DOI name will not change. The DOI System provides a framework for persistent identification. The system is managed by the International DOI Foundation. Over 40 million DOI names have been assigned by DOI System Registration Agencies in the US, Australia, and Europe. You might have come across DOIs when citing advance online articles.

Example DOI: http://dx.doi.org/10.1055/s-2008-1067226
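To illustrate how a DOI name relates to a resolvable link, here is a minimal Python sketch that strips a resolver prefix from the example above and rebuilds the link. The regular expression is a loose plausibility check chosen for this example, not the official DOI syntax, and the function names are hypothetical.

```python
import re
from urllib.parse import quote

# Loose plausibility check (an assumption for this sketch, not official
# DOI syntax): "10.", a numeric registrant code, "/", a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Resolver address as used in the slide.
RESOLVER = "http://dx.doi.org/"


def extract_doi(text):
    """Strip a leading resolver URL or 'doi:' label and return the bare DOI name."""
    candidate = text.strip()
    for prefix in ("http://dx.doi.org/", "https://doi.org/", "doi:"):
        if candidate.lower().startswith(prefix):
            candidate = candidate[len(prefix):]
            break
    if not DOI_PATTERN.match(candidate):
        raise ValueError(f"not a DOI name: {text!r}")
    return candidate


def resolver_url(doi):
    """Build a resolver link; '/' stays literal, other reserved characters are escaped."""
    return RESOLVER + quote(doi, safe="/")


doi = extract_doi("http://dx.doi.org/10.1055/s-2008-1067226")
print(doi)
print(resolver_url(doi))
```

The point of the split is the one made in the notes: the DOI name itself is the stable part, while the location it resolves to may change.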

Workflow:
1. Together with the article, the author submits the research data to Thieme.
2. Thieme hosts the research data in a data center (FIZ Karlsruhe).
3. TIB assigns a DOI to the data.
4. When the article is published, the primary data are published at the same time as an independent entity, but linked to the article.
5. The article cites the research data as a reference item with the assigned DOI.

RESULTS

An abstract with primary data as supplementary information. The primary data have their own DOI, different from that of the paper, so they can be cited independently. Clicking the link (or entering the DOI in a web browser) downloads a zip file.

The primary data come neatly organized in a zip file. The numbering of the folders corresponds to the numbering of the compounds in the corresponding article. The folder also contains a Read Me file.
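A reader could inspect such an archive programmatically. The sketch below uses Python's standard zipfile module to group the entries by top-level folder, so that each compound number maps to its files. The archive is built in memory, and every file and folder name in it is hypothetical; the actual layout of the published zip files is not specified here.

```python
import io
import zipfile
from collections import defaultdict


def group_by_top_folder(archive):
    """Map each top-level folder name to the files stored beneath it."""
    groups = defaultdict(list)
    for name in archive.namelist():
        if name.endswith("/"):
            continue  # skip explicit directory entries
        top, sep, rest = name.partition("/")
        # Files at the archive root (e.g. a read-me) are filed under "".
        groups[top if sep else ""].append(rest if sep else top)
    return dict(groups)


# Build a tiny in-memory archive. All names below are hypothetical
# placeholders, not the layout of any real dataset.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("ReadMe.pdf", b"placeholder")
    zf.writestr("1/proton/fid", b"placeholder")
    zf.writestr("1/carbon/fid", b"placeholder")
    zf.writestr("2/proton/fid", b"placeholder")

with zipfile.ZipFile(buffer) as zf:
    layout = group_by_top_folder(zf)

for folder, files in sorted(layout.items()):
    print(folder or "(root)", files)
```

Keying on the top-level folder mirrors the convention described above, where folder numbers follow the compound numbers in the article.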

The Read Me PDF in the zip file describes the content and which programs can be used to view it.

From the article: carbon (top left), proton (top right), COSY (bottom left). From Bruker: MA (bottom right). These are not PDFs or JPGs but actual raw, interactive data, which you can load into your system, zoom into, overlay with your own measurements, etc.

SUMMARY

Benefits
- Citability of research data
- High visibility of the data
- Easy re-use and verification of the datasets
- Avoiding duplication
- Motivation for new research

Benefits for authors
- More work
- Proof of quality
- Documentation of validity
- More exposure for results

Notes: At first, it looks like just another burden. But the original data not only show how clean the products really were, they also add great value and trust to the methods used. In addition, the work gets a good deal more exposure, as it will show up not only when someone looks for an article, but also when someone looks for a substance, a method, a spectrum, etc. Imagine what that can mean for an author's rating, such as the h-factor (if the methodology is good, of course).

Benefits for users
- Quick evaluation of papers
- Find structures by spectra
- Find similar patterns
- Understand individual peaks

Notes: But fast forward a few years with me:
- Most articles come with primary data
- This data is itself fully searchable
- Spectra are linked to the structures via InChIs
- Users will be able to search for patterns, or even single peaks!
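As a rough idea of what a single-peak search could look like, here is a small Python sketch. It is not a description of any existing service: the index is a plain dictionary, its keys are placeholders standing in for InChI strings, and the peak positions are invented numbers used only for illustration.

```python
def find_by_peak(index, position, tolerance=0.05):
    """Return identifiers whose peak list has a peak within `tolerance` of `position`."""
    return sorted(
        key
        for key, peaks in index.items()
        if any(abs(peak - position) <= tolerance for peak in peaks)
    )


# Hypothetical index: keys stand in for InChI strings, values are
# invented peak positions (ppm). None of this is measured data.
index = {
    "structure-A": [1.25, 3.70, 7.26],
    "structure-B": [2.10, 7.30],
    "structure-C": [0.90, 1.30],
}

hits = find_by_peak(index, 7.28)
print(hits)
```

A real system would need a proper index and a similarity measure for whole patterns; the linear scan here only shows the matching criterion.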

Open questions
- No specific regulations so far
- Copyright
- Centralized data hosting
- Data compatibility

Notes: To realize all this, there are some hurdles to be cleared.

Centralized data hosting: clear definitions of the requirements are needed. See PANGAEA (Publishing Network for Geoscientific & Environmental Data), hosted by the Alfred Wegener Institute for Polar and Marine Research (Bremerhaven) and the Center for Marine Environmental Sciences (Bremen), and supported by the European Commission (Research), the Federal Ministry of Education and Research (BMBF), the Deutsche Forschungsgemeinschaft (DFG) and the Integrated Ocean Drilling Program (IODP). The information system PANGAEA is operated as an Open Access library aimed at archiving, publishing and distributing georeferenced data from earth system research. The system guarantees long-term availability of its content through a commitment of the operating institutions.

Regulations: there are no regulations regarding format, copyright, use of data, definition of data as primary, etc. The project presented here is a start-up prototype and currently nothing more, which is also why the issues just mentioned have not yet been fully addressed and solved.

Data quality: this does not mean the scientific quality of the data but their technical characteristics, i.e. compatibility with different "hardware" (e.g. there are currently two main suppliers of NMR spectrometers, and their data are cross-readable in one direction only).

So, the only thing I have left to say: go and share your primary chemical data with your fellow researchers, now. Details on how to do it can be found in our instructions for authors. If you also publish in journals other than SYNLETT and SYNTHESIS, please talk to your editor about primary data; they might already be working on it.

Thank You!