Corpus based Creation and Extension of Domain-Specific Resources
Manuela Kunze, Dietmar Rösner
University of Magdeburg

Slide 2: Overview
- Background: Corpus Characteristics
- Experiment 1: Context-related Derivation of Concepts
- Experiment 2: Clustering of Values

Slide 3: Corpus: Forensic Autopsy Protocols
- different document parts:
  - findings
  - histological findings
  - background
  - discussion
  - …

Slide 4: Autopsy Protocols: Findings
- short linguistic structures
- typical attribute-value structures expressed by:
  - noun phrases:
    Unterblutung des Gewebes. / Bleeding of tissue.
    Oberlippenbart. / Moustache.
  - noun phrases + verb/adjective/noun phrase:
    Mund geschlossen. / Mouth closed.
    Nebennieren ohne Besonderheiten. / Adrenal glands without anomalies.
- Usable for the extension of the resources in combination with GermaNet?

Slide 5: Corpus
- 400 protocols
- parsed with a context-free grammar (ca. 40 rules)
- focus of the analyses:
  - complex noun phrases: derivation of concepts
  - attribute-value structures: clustering of values

Slide 6: Overview
- Corpus Characteristics
- Experiment 1: Context-related Derivation of Concepts
- Experiment 2: Clustering of Values

Slide 7: Approach
- analysis of high-frequency complex noun phrases
- example: Bruch des/der … (fracture of …)
  - occurrences: 749
  - types: 93
  - known (31): Rippe/rib (254), Brustbein/sternum (65), Wirbelsäule/spine (58), Schambein/pubic bone (30), Schulterblatt/shoulder blade (23), …
  - unknown (62): Schädeldach/calvarium (43), Oberschenkelknochen/femur (37), Schädelbasis/base of the skull (34), Schlüsselbein/clavicle (33), Brustwirbelsäule/thoracic spine (28), Halswirbelsäule/cervical spine (26), …

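Slide 8: Idea: Analysis of Complex Noun Phrases
- structure: keyword + complement, e.g. "fracture of" (keyword) + <complement>
- in corpus: "fracture of <known complement>" and "fracture of <unknown complement>"
- in GermaNet: class of <known complement>
- deduce: class of <unknown complement> == class of <known complement>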

Slide 9: Approach
1. take the known complement types of a keyword (31 complement types):
   Rippe/rib (254), Brustbein/sternum (65), Wirbelsäule/spine (58), Schambein/pubic bone (30), Schulterblatt/shoulder blade (23), …
2. collect all (GermaNet) senses (36 senses), e.g.
   Finger => Gliedmaße, Extremität
   Finger => Computerprogramm, Programm
   Rippe => Knochen, Gebein
   …
3. determine the most frequent top-level category T (here: noun.body)
4. remove senses which are not assigned to the preferred top-level category (27 senses remain)
5. collect all semantic classes from the hypernym graph for each sense (22 different semantic classes), e.g.
   Rippe => Knochen, Gebein => Hornsubstanz => Körpersubstanz => Stoff1, Substanz, Materie => Objekt
   …
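
Steps 2 to 4 of this procedure (collect senses, pick the most frequent top-level category, drop senses outside it) can be sketched roughly as follows. This is only an illustration: the `SENSES` table is a hypothetical toy stand-in for GermaNet lookups, and the category label `noun.artifact` for the "computer program" sense of Finger is an assumption, not taken from the slides.

```python
from collections import Counter

# Hypothetical toy sense inventory: word -> list of (top-level category, hypernyms).
# Real entries would come from GermaNet; the category labels here are illustrative.
SENSES = {
    "Rippe": [("noun.body", ["Knochen", "Gebein"])],
    "Brustbein": [("noun.body", ["Knochen", "Gebein"])],
    "Finger": [
        ("noun.body", ["Gliedmaße", "Extremität"]),
        ("noun.artifact", ["Computerprogramm", "Programm"]),  # assumed label
    ],
}


def preferred_category(complements, senses=SENSES):
    """Return the top-level category that occurs most often over all senses."""
    counts = Counter(cat for word in complements for cat, _ in senses.get(word, []))
    return counts.most_common(1)[0][0]


def filter_senses(complements, senses=SENSES):
    """Keep only the senses assigned to the preferred top-level category."""
    top = preferred_category(complements, senses)
    kept = {
        word: [s for s in senses.get(word, []) if s[0] == top]
        for word in complements
    }
    return top, kept


top, kept = filter_senses(["Rippe", "Brustbein", "Finger"])
print(top)             # preferred top-level category
print(kept["Finger"])  # only the body-part sense of Finger survives
```

The sketch assumes that at least one complement has a sense in the table; an empty input would need separate handling.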

Slide 10: Approach
- collect all semantic classes from the hypernym graph (22 different semantic classes), e.g.
  Rippe => Knochen, Gebein => Hornsubstanz => Körpersubstanz => Stoff1, Substanz, Materie => Objekt
  …
- for each semantic class sc:
  - determine the level in the hypernym tree (f_sc)
  - count occurrences (n_sc)
- select the maximum of (f_sc * n_sc) / N, where N is the number of all semantic classes
- most specific semantic class: Knochen
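
The selection rule (f_sc * n_sc) / N can be illustrated with a small sketch. Several details are assumptions, since the slide does not spell them out: the level f_sc is taken as the depth counted from the root (root = 1, so deeper means more specific), n_sc is the number of hypernym paths containing the class, and N is the number of distinct classes. The paths in `EXAMPLE_PATHS` are simplified toy data, and the path for Finger is hypothetical.

```python
from collections import Counter

# Toy hypernym paths, ordered root-first (simplified; the last path is hypothetical).
EXAMPLE_PATHS = [
    ["Objekt", "Stoff", "Körpersubstanz", "Hornsubstanz", "Knochen"],  # Rippe
    ["Objekt", "Stoff", "Körpersubstanz", "Hornsubstanz", "Knochen"],  # Brustbein
    ["Objekt", "Stoff", "Körpersubstanz", "Hornsubstanz", "Knochen"],  # Wirbelsäule
    ["Objekt", "Körperteil", "Gliedmaße"],                             # Finger
]


def most_specific_class(paths):
    """Score every class by (level * occurrences) / N and return the best one."""
    level = {}         # f_sc: depth of the class, root = 1 (assumption)
    count = Counter()  # n_sc: number of paths that contain the class
    for path in paths:
        for depth, sc in enumerate(path, start=1):
            level[sc] = max(level.get(sc, 0), depth)
            count[sc] += 1
    n_classes = len(level)  # N: number of distinct semantic classes (assumption)
    scores = {sc: level[sc] * count[sc] / n_classes for sc in level}
    best = max(scores, key=scores.get)
    return best, scores


best, scores = most_specific_class(EXAMPLE_PATHS)
print(best)
```

Because N is constant for a given keyword, it does not change which class wins; it only normalizes the score. Multiplying by the level is what keeps a very general class such as Objekt, which occurs in every path, from being selected.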

Slide 11: Results
- 85 % correct assignments (types)
- 94 % correct assignments (tokens)
- erroneous cases:
  - correct assignments to wrong complements
  - wrong assignments to correct complements

Slide 12: Results: Erroneous Cases
- correct assignments to wrong complements:
  - misspelling of tokens: Oberschenkelknorren
  - erroneous fragments from the treatment of German truncations: Bruch des Ober- und Unterarmes
  - erroneous syntactic analysis of the second NP: Bruch der Wandung der …
- wrong assignments to correct complements:
  - (complex) systems of bones, cartilages, connective tissues: elbow joint

Slide 13: Overview
- Corpus Characteristics
- Experiment 1: Context-related Derivation of Concepts
- Experiment 2: Clustering of Values

Slide 14: Clustering of Values
- conceptual analysis of linguistic structures:
  - Mund geschlossen. / Mouth closed.
    concept: Mund / mouth; value: geschlossen / closed
  - Rachenschleimhaut duesterrot. / Mucosa of fauces dark red.
    concept: Rachenschleimhaut / mucosa of fauces; value: duesterrot / dark red
  - Beckengeruest festgefuegt und unversehrt. / Pelvis closely joined and entire.
    concept: Beckengeruest / pelvis; values: festgefuegt, unversehrt / closely joined, entire
  - Herzohren frei, ovales Vorhoffenster geschlossen. / Auricles of heart clear, oval atrium closed.
    concepts: Herzohren, Vorhoffenster / auricles of heart, atrium; values: frei, geschlossen / clear, closed
  - Brustbein, Rippen und Wirbelsaeule intakt. / Sternum, ribs and spine intact.
    concepts: Brustbein, Rippen, Wirbelsaeule / sternum, ribs, spine; value: intakt / intact
  - Brustkorb sehr schmal und leicht eindrueckbar. / Thorax very narrow and easily compressible.
    concept: Brustkorb / thorax; values: sehr schmal, leicht eindrueckbar / very narrow, easily compressible
  - Nebennieren ohne Besonderheiten. / Adrenal glands without anomalies.
    concept: Nebennieren / adrenal glands; value: ohne Besonderheiten / without anomalies
  - …
- 1908 concepts
- 2098 different (linguistic) values
- Do similar concepts have the same attributes?
- What are the values for an attribute?

Slide 15: Relations Between Values
- Do the values describe different attributes (color, shape, etc.)?
- If not, are the values:
  - paraphrases/synonyms?
  - antonyms?
  - values of an open range?
- Which lexical or conceptual relations exist between the values, e.g. synonyms, antonyms, etc.?
- -> clustering of values

Slide 16: Examples
- Mund/mouth:
  - deutlich geoeffnet
  - fischmaulartig geoeffnet
  - schlotartig geoeffnet
  - ruesselartig geoeffnet
  - froschmaulartig geoeffnet
  - ovalaer geoeffnet
  - geoeffnet
  - spaltfoermig geoeffnet
  - geschlossen
- different kinds of 'opened' vs. closed

Slide 17: Examples
- Milzgewebe/spleen tissue:
  - nicht sehr blutreich
  - fest
  - deutlich gelockert
  - stark gelockert
  - relativ gelockert
  - verhaertet
  - gelockert
  - leicht gelockert
  - blutreich
  - sehr blutarm
  - faeulnisbedingt gelockert
  - etwas faeulnisbedingt aufgelockert
  - sehr blutreich
- attributes described:
  - concentration of blood
  - consistency, form of tissue

Slide 18: Examples
- Wirbelsaeule/spine:
  - ebenfalls unversehrt
  - ebenfalls intakt
  - intakt
  - unversehrt
  - ohne Besonderheiten
  - ohne Verletzungen
- same findings

Slide 19: Approach
- comparison of the values of a concept
- comparison in several steps:
  1. character-based: via bigrams
  2. lexical-conceptual relations: available information in GermaNet
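
A character-based comparison via bigrams could look like the sketch below. The slides do not name the similarity measure, so the Dice coefficient over sets of character bigrams is an assumption chosen for illustration; the function names are hypothetical.

```python
def bigrams(text):
    """Return the set of character bigrams of a lower-cased string."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a, b):
    """Dice coefficient over character bigrams (assumed measure, 0.0 to 1.0)."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


for left, right in [("intakt", "intakt"),
                    ("blutarm", "blutreich"),
                    ("geoeffnet", "geschlossen")]:
    print(left, right, round(dice(left, right), 2))
```

Pairs that share a long stem, such as blutarm and blutreich, receive a noticeably higher score than unrelated strings, even though they are antonyms. This is why a purely character-based step needs a second, lexical-conceptual step.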

Slide 20: Approach
- input: values of a concept
- removing negations: 'kein', 'nicht', …
- removing modifiers:
  - particles: sehr, sonst, ebenfalls
  - adjectives with suffixes: -artig, -lich, -ig
  - example: 'sonst unauffällig' -> 'unauffällig'
- result: 'corrected' values
- bigrams of values
- compound?
- lexical/conceptual relations in GermaNet?
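
The normalization into 'corrected' values might be sketched as follows. The word lists are taken from the slide where given; adding 'keine' to the negations and always keeping the last token as the head of the value are assumptions made for this illustration, and the function name is hypothetical.

```python
NEGATIONS = {"kein", "keine", "nicht"}     # 'keine' added as an assumption
PARTICLES = {"sehr", "sonst", "ebenfalls"}
MODIFIER_SUFFIXES = ("artig", "lich", "ig")


def normalize(value):
    """Strip negations and modifiers; return (corrected value, was_negated)."""
    tokens = value.lower().split()
    negated = any(t in NEGATIONS for t in tokens)
    kept = [t for t in tokens if t not in NEGATIONS and t not in PARTICLES]
    # Drop adjectival modifiers by suffix, but always keep the final token,
    # which is treated here as the head of the value (assumption).
    if len(kept) > 1:
        head = kept[-1]
        kept = [t for t in kept[:-1] if not t.endswith(MODIFIER_SUFFIXES)] + [head]
    return " ".join(kept), negated


for raw in ["sonst unauffaellig", "nicht sehr blutreich", "froschmaulartig geoeffnet"]:
    print(raw, "->", normalize(raw))
```

Returning the negation flag separately, rather than discarding it, keeps 'glaenzend' and 'nicht glaenzend' distinguishable after both have been reduced to the same corrected value.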

Slide 21: Results: Character-based Analysis
- similar values with modifications (particles) and negations:
  - selbst unauffaellig, sonst unauffaellig, unauffaellig
  - glaenzend, nicht glaenzend
  - geoeffnet, leicht geoeffnet, rundlich geoeffnet, spaltfoermig geoeffnet, spaltweit geoeffnet, froschmaulartig geoeffnet, … geoeffnet
  - sehr muskelkraeftig, nicht sehr muskelstark, muskelkraeftig, nicht sehr muskelkraeftig, nicht muskelkraeftig
  - blutreich, nicht-sehr-blutreich, sehr-blutreich
  - blutarm, relativ-blutarm
  - muskelschwach, sehr-muskelschwach
  - geschlossen, spaltfoermig-geschlossen

Slide 22: Integration of GermaNet
- search for relations between:
  - two tokens
  - parts of tokens
- queries about:
  - coordinate terms
  - synonyms, hypernyms, hyponyms
  - antonyms

Slide 23: Results with GermaNet
- sehr muskelkraeftig/very strong muscle vs. sehr muskelschwach/very weak muscle
  bigrams: [value missing]; GermaNet: antonym: kraeftig vs. schwach
- blutarm/bloodless vs. blutreich/blood-rich
  bigrams: [value missing]; GermaNet: antonym: arm vs. reich
- feucht/wet vs. sehr trocken/very dry
  bigrams: [value missing]; GermaNet: coordinate terms, antonym
- sehr gross/very great vs. sehr weit/very broad
  bigrams: [value missing]; GermaNet: hypernym
- frei/free vs. größtenteils vorhanden/mostly existent
  bigrams: [value missing]; GermaNet: coordinate terms
- keine Schwellung/no swelling vs. keine Verletzung/no trauma
  bigrams: 0.42, 0.4; GermaNet: hypernym

Slide 24: Results: Character-based + GermaNet
- selbst unauffaellig, sonst unauffaellig, unauffaellig
- glaenzend, nicht glaenzend
- blutreich, nicht-sehr-blutreich, sehr-blutreich | blutarm, relativ-blutarm
- sehr muskelkraeftig, nicht sehr muskelstark, muskelkraeftig, nicht sehr muskelkraeftig, nicht muskelkraeftig | muskelschwach, sehr-muskelschwach
- geoeffnet, leicht geoeffnet, rundlich geoeffnet, spaltfoermig geoeffnet, spaltweit geoeffnet, froschmaulartig geoeffnet, … geoeffnet | geschlossen, spaltfoermig-geschlossen

Slide 25: Problem: Paraphrases
- Wirbelsaeule/spine:
  - intakt
  - unversehrt
  - ohne Besonderheiten
  - ohne Verletzungen
- same findings
- future work

Slide 26: Idea: Detection of Paraphrases/Synonyms
- document information + corpus information:
  - analyse the value sets of a document
  - compare the value sets of a concept described in different documents
- values which are synonyms or antonyms don't occur in the same document
- example: Spine closely joined and entire.
  closely joined, entire: different attributes

Slide 27: Idea: Detection of Paraphrases/Synonyms
- collect all values for a concept: candidates
- values for the concept 'spine' in the autopsy protocols AP#1, AP#2, AP#3, AP#4, AP#5, …, AP#n:
  - entire, closely joined
  - entire, closely joined
  - broken
  - intact
  - entire
  - …
- candidates: intact == broken == entire/closely joined == entire ?

Slide 28: Idea: Detection of Paraphrases/Synonyms
- removing of candidates:
  - only one paraphrase
  - antonyms: bleedings or without bleedings
  - closely joined vs. entire:
    - occur in the same document (for a concept)
    - prefer: entire (number of occurrences)
    - assumption: closely joined is an 'additional' attribute
- selection of candidates (restrictions):
  - only frequent values
  - similar number of occurrences?
- verification of results:
  - obtain value sets of other concepts which have similar values
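
The candidate selection outlined here (frequent values, similar number of occurrences, never together in one document) can be sketched as below. The thresholds `min_count` and `max_ratio`, the function name, and the toy value sets are all assumptions for illustration; the slides pose the frequency criterion only as a question and give no concrete thresholds.

```python
from collections import Counter
from itertools import combinations


def paraphrase_candidates(value_sets, min_count=2, max_ratio=2.0):
    """Pairs of frequent values that never co-occur in one document."""
    counts = Counter(v for vs in value_sets for v in set(vs))
    cooccur = set()
    for vs in value_sets:
        cooccur.update(combinations(sorted(set(vs)), 2))
    frequent = sorted(v for v, c in counts.items() if c >= min_count)
    candidates = []
    for a, b in combinations(frequent, 2):
        if (a, b) in cooccur:
            continue  # values in the same document describe different attributes
        high, low = max(counts[a], counts[b]), min(counts[a], counts[b])
        if high / low <= max_ratio:  # similar number of occurrences (assumed test)
            candidates.append((a, b))
    return candidates


# Toy value sets for one concept, one set per document (illustrative data).
DOCS = [
    {"festgefuegt", "unversehrt"},
    {"intakt"},
    {"festgefuegt", "unversehrt"},
    {"intakt"},
    {"gebrochen"},
    {"unversehrt"},
]
print(paraphrase_candidates(DOCS))
```

On this toy input the pair (intakt, unversehrt) is proposed, but so is (festgefuegt, intakt), and a frequent antonym such as gebrochen would pass the same filter. Non-co-occurrence alone therefore yields candidates that still need verification.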

Slide 29: Problems: Detection of Paraphrases
- a value can be expressed by more than one value:
  'value 1' == 'value 2' + 'value 3'
- result (set of paraphrases for a value) can contain antonyms

Slide 30: Detection of Paraphrases/Synonyms
- solutions?
  - integration of other resources: UMLS
  - extension of GermaNet
- GermaNet entries:
  - 1 sense of unversehrt
    Sense 1: unverletzt, unversehrt => heil => gesund => ?krankheitsspezifisch => ?körperzustandsspezifisch => ?körperspezifisch
  - 1 sense of intakt
    Sense 1: intakt, ganz1, funktionstüchtig, funktionsfähig => ?funktionalitätsspezifisch => ?relationsspezifisch
- same meaning?

Slide 31: Conclusion
- experiments on the corpus-based, semiautomatic extension of GermaNet
- analysis of complex noun phrases:
  - detection and transfer of GermaNet classes
- clustering of values:
  - bigrams
  - using GermaNet information

Slide 32: Improvement
- wrong splitting based on wrong parsing results
- GermaNet interface:
  - treatment of umlauts
  - inflectional suffixes
- selection of the relevant tokens
- selection of the correct sense

Slide 33: Example
- Brustwirbelsaeule festgefuegt und unversehrt: {festgefuegt, unversehrt/entire}
- Brustwirbelsaeule intakt: {intakt/intact}
- Die Brustwirbelsaeule ist zweifach gebrochen: {gebrochen/broken}
- Brustwirbelsaeule unversehrt: {unversehrt/entire}
- …

Slide 34: Idea: Detection of Paraphrases/Synonyms
- restriction: candidates are frequent phrases in a value set
- assumptions:
  - paraphrases don't occur in one set (document)
  - paraphrases have a similar number of occurrences
- verification: obtain other value sets

Slide 35: Idea: Analysis of Complex Noun Phrases
- high-frequency complex noun phrases:
  - NP NP(genitive)
  - NP NP(genitive) PP+
  - NP PP+