Corpus-based Error Detection in a Multilingual Medical Thesaurus

Slides:

Advertisements

Ähnliche Präsentationen

Primary Data for Chemistry

Advertisements

Art der Arbeit (Projekt-/Studien-/Diplomarbeit/

You need to use your mouse to see this presentation © Heidi Behrens.

INTAKT- Interkulturelle Berufsfelderkundungen als ausbildungsbezogene Lerneinheiten in berufsqualifizierenden Auslandspraktika DE/10/LLP-LdV/TOI/

Welcome Instructor: Dominik Dwight Zethmeier

Deutsch Zwei Guten Tag! Heute ist FREITAG!!!!!! Die Sinnfrage: Wie fühlst du dich?? Die Ziele: You will discuss what you do/dont.

Universität StuttgartInstitut für Wasserbau, Lehrstuhl für Hydrologie und Geohydrologie Copulas (1) András Bárdossy IWS Universität Stuttgart.

Titelmasterformat durch Klicken bearbeiten Textmasterformate durch Klicken bearbeiten Zweite Ebene Dritte Ebene Vierte Ebene Fünfte Ebene 1 Rising energy.

Titelmasterformat durch Klicken bearbeiten Textmasterformate durch Klicken bearbeiten Zweite Ebene Dritte Ebene Vierte Ebene Fünfte Ebene 1 Titelmasterformat.

Hast du einen Nebenjob?.

KIT – die Kooperation von Forschungszentrum Karlsruhe GmbH und Universität Karlsruhe (TH) The dependence of convection-related parameters on surface and.

Institut für Angewandte Mikroelektronik und Datentechnik Phase 5 Architectural impact on ASIC and FPGA Nils Büscher Selected Topics in VLSI Design (Module.

Lust auf Lesen Treffpunkt Deutsch Sixth Edition. Relative Pronoun object of a preposition Recall from chapter 9 that relative clauses describe people,

1IWF/ÖAW GRAZ Data Combination David Fischer, Rumi Nakamura (IWF/OeAW)  Fluxgate: noise + distortion gets worse than the searchcoil at ~ 6 Hz.  Searchcoil:

Listening Comprehension Kapitel 8 (Lehrbuch KONTAKTE) Thema: Essen und Einkaufen Level: 2. Semester Deutsch at University Level.

Stephanie Müller, Rechtswissenschaftliches Institut, Universität Zürich, Rämistrasse 74/17, 8001 Zürich, Criminal liability.

E STUNDE Deutsch AP. Mittwoch, der 24. April 2013 Deutsch AP (E Stunde)Heute ist ein C Tag Goal: to understand authentic written text, audio material.

E STUNDE Deutsch AP. Freitag, der 19. April 2013 Deutsch AP (E Stunde)Heute ist ein G Tag Goal: to understand authentic written text, audio material and.

E STUNDE Deutsch AP. Donnerstag, der 9. Mai 2013 Deutsch AP (E Stunde)Heute ist ein G Tag Goal: to understand authentic written text, audio material and.

Pierre Auger Observatory. Pierre Auger( ) Was a nuclear physics and cosmic ray physics. Made cosmic ray experiments on the Jungfraujoch Discovery.

Titelmasterformat durch Klicken bearbeiten Textmasterformate durch Klicken bearbeiten Zweite Ebene Dritte Ebene Vierte Ebene Fünfte Ebene 1 Rising energy.

Physik multimedial Lehr- und Lernmodule für das Studium der Physik als Nebenfach Julika Mimkes: Links to e-learning content for.

E STUNDE Deutsch AP. Montag, der 8. April 2013 Deutsch AP (E Stunde)Heute ist ein E Tag Goal: to understand authentic written text, audio material and.

How does the Summer Party of the LMU work? - Organizations and Networks -

E STUNDE Deutsch AP. Dienstag, der 16. April 2013 Deutsch AP (E Stunde)Heute ist ein D Tag Goal: to understand authentic written text, audio material.

E STUNDE Deutsch AP. Dienstag, der 23. April 2013 Deutsch AP (E Stunde)Heute ist ein B Tag Goal: to understand authentic written text, audio material.

E STUNDE Deutsch AP. Donnerstag, der 25. April 2013 Deutsch AP (E Stunde)Heute ist ein D Tag Goal: to understand authentic written text, audio material.

E STUNDE Deutsch AP. Donnerstag, der 2. Mai 2013 Deutsch AP (E Stunde)Heute ist ein B Tag Goal: to understand authentic written text, audio material and.

E STUNDE Deutsch AP. Dienstag, der 28. Mai 2013 Deutsch AP (E Stunde)Heute ist ein E Tag Goal: to understand authentic written text, audio material and.

KLIMA SUCHT SCHUTZ EINE KAMPAGNE GEFÖRDERT VOM BUNDESUMWELTMINISTERIUM Co2 online.

Montag den 8. Juni Lernziel:- To launch a project and receive results.

Holiday destinations, language holidays and informed languages in the EU Lea Kern.

1/15 Thursday, 21 June 2007 Werner Sudendorf, Jürgen Keiper Deutsche Kinemathek – Museum für Film und Fernsehen Werner Sudendorf, Jürgen Keiper Reconstructing.

EUROPÄISCHE GEMEINSCHAFT Europäischer Sozialfonds EUROPÄISCHE GEMEINSCHAFT Europäischer Fonds für Regionale Entwicklung Workpackage 5 – guidelines Tasks.

Fakultät für Gesundheitswissenschaften Gesundheitsökonomie und Gesundheitsmanagement Universität Bielefeld WP 3.1 and WP 4.1: Macrocost.

Kapitel 2 Grammar INDEX 1.Subjects & Verbs 2.Conjugation of Verbs 3.Subject Verb Agreement 4.Person and Number 5.Present Tense 6.Word Order: Position of.

Memorisation techniques

Kapitel 8 Grammar INDEX 1.Command Forms: The Du-Command Form & Ihr- Command 2.Sentences & Clauses.

E STUNDE Deutsch AP. Donnerstag, der 30. Mai 2013 Deutsch AP (E Stunde)Heute ist ein G Tag Goal: to understand authentic written text, audio material.

EUROPÄISCHE GEMEINSCHAFT Europäischer Sozialfonds EUROPÄISCHE GEMEINSCHAFT Europäischer Fonds für Regionale Entwicklung Workpackage 5 – guidelines Tasks.

E STUNDE Deutsch AP. Donnerstag, der 11. April 2013 Deutsch AP (E Stunde)Heute ist ein A Tag Goal: to understand authentic written text, audio material.

German University Application Basics Refugee Coaching (RC) 1RC Next steps Next steps ………

G Stunde DEUTSCH 1.  Unit: Family & homeFamilie & Zuhause  Objectives:  Phrases about date, weather and time-telling  Family and family relations.

Essay structure Example: Die fetten Jahre sind vorbei: Was passiert auf der Almhütte? Welche Bedeutung hat sie für jede der vier Personen? Intro: One or.

What’s the weather like?. Look at the question above Turn it around and you have Das Wetter ist.... The phrase Das Wetter ist.... or Es ist.... can be.

Custom error page for timeout Gergely Andó / Application Innovation July 10, 2013 Customer.

Kanton Basel-Stadt Howto crash a sequencer …and a path to get a difficult package to work APP-V Swissgroup / Daniel Müller.

Interrogatives and Verbs

Sentence Structure Questions

Freizeit Thema 5 Kapitel 1 (1)

Scientific Reasoning in Medical Education

Sentence Structure Connectives

Jetzt machen Venues aufmachen!!! Geh zu

The dynamic ultrasound

Jetzt machen Venues aufmachen!!! Geh zu

Process and Impact of Re-Inspection in NRW

Synonyms are two or more words belonging to the same part of speech and possessing one or more identical or nearly identical denotational meanings, interchangeable.

Students have revised SEIN and HABEN for homework

IT QM Part2 Lecture 7 PSE GSC

Wohin bist du gegangen? Where did you go?

THE PERFECT TENSE IN GERMAN

Hallo! Wie geht’s? Hallo! Mir geht’s gut, danke! Guten Tag!

Integrating Knowledge Discovery into Knowledge Management

Calorimetry as an efficiency factor for biogas plants?

Niedersächsisches Ministerium

School supplies.

- moodle – a internet based learning platform

Zhunussova G., AA 81. Linguistic communication, i.e. the use of language, is characteristically vocal and verbal behaviour, involving the use of discrete.

Präsentation transkript:

Corpus-based Error Detection in a Multilingual Medical Thesaurus Roosewelt L. Andradea,b, Edson Pachecoa,b, Pindaro S. Canciana,b, Percy Nohamaa,b, Stefan Schulzb,c a Paraná University of Technology (UTFPR), Curitiba, Brazil b Pontificial Catholic University of Paraná, Master Program of Health Technology, Curitiba, Brazil c University Medical Center Freiburg, Medical Informatics, Freiburg, Germany Dieser Vortrag behandelt unterschiedliche Methoden zur sprachübergreifenden Dokumentenrecherche in der Welt der Medizin. Medizinische Texte unterscheiden sich von anderen Textsammlungen. Sie sind in der Regel extrem groß und dynamisch – man denke beispielsweise an Dokumente, die im täglichen Klinikbetrieb neu entstehen. Darüberhinaus sind medizinische Textkollektionen heterogen und mehrsprachig: Während klinische Texte wie Pathologieberichte oder Arztbriefe in der Regel in einer jeweiligen Muttersprache verfasst werden, sind wissenschaftliche Beiträge fast ausschliesslich in Englisch anzutreffen. Ebenso wie die Texte, íst auch die Nutzergemeinde medizinischer Informationen höchst unterschiedlich: KLICK

Introduction Methods Results Discussion Conclusion

Thesaurus Controlled Vocabulary for document indexing and retrieval Introduction Methods Results Discussion Conclusion Thesaurus Controlled Vocabulary for document indexing and retrieval Assigns semantic descriptors (concepts) to (quasi-)synonymous terms Contains additional semantic relations (e.g. hyperonym / hyponym) Examples: MeSH, UMLS, WordNet Multilingual thesaurus: contains translations (cross-language synonymy links)

Multilingual Thesaurus Management Introduction Methods Results Discussion Conclusion Multilingual Thesaurus Management International team of curators React to new terms and senses Decide which terms are synonymous / translations Decide which senses of a term have to be accounted for in the domain Requires quality assurance measures

Case study: Morphosaurus Introduction Methods Results Discussion Conclusion Case study: Morphosaurus Medical subword thesaurus Organizes subwords (meaningful word fragments) in multilingual equivalence classes: #derma = { derm, cutis, skin, haut, kutis, pele, cutis, piel, … } #inflamm = { inflamm, -itic, -itis, phlog, entzuend, -itis, -itisch, inflam, flog, inflam, flog, ... } Maintained at two locations: Freiburg (Germany), Curitiba (Brazil) Lexicon curators: frequently changing team of medical students

Morphosaurus Structure Introduction Methods Results Discussion Conclusion Morphosaurus Structure Thesaurus: ~21.000 equivalence classes (MIDs) Lexicon entries: English: ~23.000 German: ~24.000 Portuguese: ~15.000 Spanish : ~11.000 French: ~ 8.000 Swedish: ~10.000 muscle myo muskel muscul inflamm - itis inflam entzünd Eq Class subword herz heart card corazon INFLAMM MUSCLE HEART Segmentation: Myo|kard|itis Herz|muskel|entzünd|ung Inflamm|ation of the heart muscle Indexation: #muscle #heart #inflamm #heart #muscle #inflamm #inflamm #heart #muscle

Morphosemantic Normalization Introduction Methods Results Discussion Conclusion Morphosemantic Normalization

Additional Challenges for Morphosaurus Introduction Methods Results Discussion Conclusion Additional Challenges for Morphosaurus Properly delimit subword entries so that they are correctly extracted from complex words Create consensus about the scope of synonymy classes, especially with regard to highly ambiguous lexicon entries

Introduction Methods Results Discussion Conclusion

Morphosaurus Quality Assurance Introduction Methods Results Discussion Conclusion Morphosaurus Quality Assurance Content quality: Identify content errors in the thesaurus content Implement a quality process to fix these errors Show positive impact Process quality: Detect and prevent user action anomalies actions that consume effort without any positive impact : uncoordinated edit / update / delete “do undo” transactions done by different people) see Paper Bitencourt et al., Session 089

Testbed: Parallel Medical Corpora Introduction Methods Results Discussion Conclusion Testbed: Parallel Medical Corpora Apply morphosemantic indexing to the Merck manual in English, Spanish, Portuguese, German Hypothesis: nearly identical frequency distribution of Morphosaurus identifiers (MIDs) for each language Problematic MIDs can be spotted by comparing MID frequencies for each language pair

Testbed: Parallel Medical Corpora Introduction Methods Results Discussion Conclusion Testbed: Parallel Medical Corpora

Scoring of MID Imbalance Introduction Methods Results Discussion Conclusion Scoring of MID Imbalance frequencies of MIDs in language 1 and 2 degree of imbalance mean relative frequency Score used for MID ranking per language pair

Introduction Methods Results Discussion Conclusion

For each l1,l2: MIDs ranked by Score Introduction Methods Results Discussion Conclusion For each l1,l2: MIDs ranked by Score

Experimental Modification of Workflow Introduction Methods Results Discussion Conclusion Experimental Modification of Workflow Discussion of 100 most highly ranked MID imbalances for each language pair Classification of Problems Documentation Correction in consensus

Reason for MID high score Introduction Methods Results Discussion Conclusion Problems identified Reason for MID high score Portuguese / English German / English Spanish / English Ambiguous lexemes 0.23 0.38 0.14 Missing or dispensable MID 0.49 0.18 0.53 Same Sense in Different MIDs 0.06 0.12 0.19 One MID with Different Senses 0.04 0.05 No problem 0.11 0.10 Unclassified problem 0.07 0.17

Modified Workflow Discussion of highly ranked MID imbalances Introduction Methods Results Discussion Conclusion Modified Workflow Discussion of highly ranked MID imbalances

Introduction Methods Results Discussion Conclusion Summative Evaluation Did the targeted error spotting and resolution have an impact on the “in vivo” performance of the thesaurus ? Benchmark: OHSUMED document collection (user queries with relevant MEDLINE abstract assigned), queries translated to German, Spanish, Portuguese Target: monitoring of Timelines of Eleven Point Average measure (precision values at recall 0, 0.1, 0.2,…0.9, 1.0)

Summative Evaluation Results Introduction Methods Results Discussion Conclusion Summative Evaluation Results Eleven- Point- Average Precision English Portuguese German Spanish OHSUMED benchmark Weeks

Introduction Methods Results Discussion Conclusion

Introduction Methods Results Discussion Conclusion Discussion of Results Most problems detected corresponded to real errors (approx. 90%) Consensus could be found in most cases “In vivo” evaluation showed a clear increase in IR performance for Spanish, at the time the less consolidated language in Morphosaurus

Introduction Methods Results Discussion Conclusion

Conclusions / Recommendations Introduction Methods Results Discussion Conclusion Conclusions / Recommendations Error detection by MID imbalance score proved useful New workflow productive for error elimination Recommendations Record MID score over time avg for each language pair avg for each language for each MID Generate alerts for every MIDs which exhibits an increase in imbalance above a tolerance interval Continue monitoring AvgP11 values Include into Morphosaurus editing environment

Medical Thesaurus Anomaly Detection by User Action Monitoring Session 089 Jeferson L. Bitencourt a,b, Pindaro S. Canciana,b, Edson Pachecoa,b, Percy Nohamaa,b, Stefan Schulzb,c a Paraná University of Technology (UTFPR), Curitiba, Brazil b Pontificial Catholic University of Paraná, Master Program of Health Technology, Curitiba, Brazil c University Medical Center Freiburg, Medical Informatics, Freiburg, Germany Dieser Vortrag behandelt unterschiedliche Methoden zur sprachübergreifenden Dokumentenrecherche in der Welt der Medizin. Medizinische Texte unterscheiden sich von anderen Textsammlungen. Sie sind in der Regel extrem groß und dynamisch – man denke beispielsweise an Dokumente, die im täglichen Klinikbetrieb neu entstehen. Darüberhinaus sind medizinische Textkollektionen heterogen und mehrsprachig: Während klinische Texte wie Pathologieberichte oder Arztbriefe in der Regel in einer jeweiligen Muttersprache verfasst werden, sind wissenschaftliche Beiträge fast ausschliesslich in Englisch anzutreffen. Ebenso wie die Texte, íst auch die Nutzergemeinde medizinischer Informationen höchst unterschiedlich: KLICK