Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures.


We want to
- assess the practical feasibility of a given association measure (AM) for identifying collocations
  - for which types of collocation?
  - in which corpora (domain, size)?
  - for high-frequency versus low-frequency data?
- compare the outcomes of different association measures

- We have: differently ranked collocation candidates
- We need: true collocation data for comparison, e.g.
  - collocation lexica
  - a list of true collocations occurring in the extraction corpus

Problems & Inconveniences
Using collocation lexica for evaluation
- will not tell us how well an AM worked on a particular corpus
- only tells us that some of the reference collocations also occur in our base data and that the AM has found them

Problems & Inconveniences
Using a list of true collocations occurring in the extraction corpus
- requires a good deal of hand-annotation
- requires objective criteria for distinguishing collocational from non-collocational word combinations in our candidate list

Our Approach
- Evaluation of lexical association measures (AMs) against a manually identified reference corpus of true collocations (true positives, TPs)
- Evaluation based on the full reference set
- Precise, linguistically motivated definition of TPs
- Evaluation of results based on recall and precision graphs

For Further Discussion
- Testing AMs for significance is an important but still open question
- There is potential for fine-tuning AMs to a specific data set and a particular type of collocation to be extracted (Krenn, Evert 2001)

Evaluation Experiments

Data
Extraction corpora
- newspaper: 8 million words, Frankfurter Rundschau Corpus (ECI Multilingual Corpus 1)
- newsgroup: 10 million words, FLAG corpus (LT-DFKI)

Data
- Base data: list of PP-verb pairs, i.e. (PN,V)-combinations
- Collocation types:
  - support verb constructions (FVG)
  - figurative expressions (figur)

Examples

Support Verb Constructions (FVG)
- verb-object collocation
  - functions as a predicate
  - can be paraphrased by a main verb
  - NP-verb or PP-verb
- verbal collocate (function verb / light verb / support verb)
  - main verb
  - conveys Aktionsart and causativity

Support Verb Constructions (FVG)
- nominal collocate
  - abstract noun
  - often de-verbal or de-adjectival
  - contributes the core meaning
- (prepositional collocate)
- verbal and nominal collocate together determine the argument structure of the collocation

FVG Examples

pred. phrase   verb      Aktionsart   caus   translation
in Betrieb     gehen     incho        -      go into operation
               nehmen    incho        +      put into operation
               setzen    incho        +      start up
               sein      neutral      -      be running
               bleiben   contin       -      keep on running
               lassen    contin       +      keep (sth) running

FVG Examples

pred. phrase     verb      Aktionsart   caus   translation
ausser Betrieb   gehen     termin       -      go out of service
                 nehmen    termin       +      take out of service
                 setzen    termin       +      stop
                 sein      neutral      -      be out of order
                 bleiben   contin       -      stay out of order
                 lassen    contin       +      keep out of order

Figurative Expressions (figur)
- not restricted to NP/PP-verb
- figurative reinterpretation of the literal meaning required, e.g. unter die Haut gehen (get under one's skin)
- nouns: concrete
- verbs: often causative-noncausative alternation, e.g. auf Eis legen (put on ice), auf Eis liegen (be on ice)

Decision Tree: FVG versus figur

Frequency Distributions

Combination of Properties in the Candidate Lists
- newspaper, f >= 3: FVG
- newspaper, f >= 3: figur
- newsgroup, f >= 5: FVG, figur
- newspaper, f >= 3: FVG, figur

Evaluation Procedure
Candidate list extracted from the source corpus (columns: t-score, candidate pair):
  ab Dienstag bietet
  ab Donnerstag bietet
  ab Freitag bietet
  ab Jahren beginnt
  ab Jahren bietet
  ab Jahren eingeladen
  ab Jahren geeignet
  ab Jahren heißt
  ab Jahren käthi
  ab Jahren tanzen
  ab Jahren treffen
  ab Juni restauriert
  ab Mark finden
  ab Mark kostet
  ab Mark zu_finden
  ab März bietet+an
  ab Mittwoch bietet
  ab Notierungen nutzen
  ab Notierungen zu_nutzen
  ab November einladen
  ...

Evaluation Procedure
Significance list (columns: rank, t-score, candidate pair), candidates in rank order:
   1  um Uhr beginnt
   2  bis Uhr geöffnet
   3  zur Verfügung stehen
   4  zur Verfügung gestellt
   5  zur Verfügung stellen
   6  ums Leben gekommen
   7  zur Verfügung steht
   8  auf Programm stehen
   9  in Anspruch genommen
  10  auf Tagesordnung stehen
  11  am Dienstag sagte
  12  am Montag sagte
  13  auf Seite lesen
  14  auf Kürzungen behält vor
  15  auf Programm steht
  16  im Mittelpunkt steht
  17  in Regionalausgabe erscheint
  18  an Stelle melden
  19  auf Seite zeigen
  20  zur Verfügung zu_stellen
  ...
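The ranking step on this slide can be sketched as follows. This is only an illustration: it assumes the common t-score formulation t = (O - E) / sqrt(O), with the expected frequency E = f_PN * f_V / N under independence, and the counts are invented because the scores did not survive in the transcript. The names `t_score`, `significance_list`, `counts` and `N` are hypothetical.

```python
import math

def t_score(joint, f_pn, f_v, n):
    """t-score = (O - E) / sqrt(O), with E = f_pn * f_v / n under independence."""
    expected = f_pn * f_v / n
    return (joint - expected) / math.sqrt(joint)

def significance_list(counts, n):
    """Return (score, pair) tuples sorted from highest to lowest t-score."""
    scored = [(t_score(o, f1, f2, n), pair) for pair, (o, f1, f2) in counts.items()]
    return sorted(scored, reverse=True)

# Hypothetical counts, NOT taken from the slides:
# (PN, V) -> (joint frequency, frequency of PN, frequency of V)
N = 100_000
counts = {
    ("ab Mark", "kostet"): (4, 50, 300),
    ("zur Verfügung", "stehen"): (100, 400, 500),
    ("am Dienstag", "sagte"): (30, 600, 2000),
}

ranked = significance_list(counts, N)
for rank, (score, pair) in enumerate(ranked, start=1):
    print(rank, round(score, 2), " ".join(pair))
```

Sorting by score alone is what turns an unordered candidate list into a significance list; any other association measure could be swapped in for `t_score` without changing the rest.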

Evaluation Procedure: N-best Lists
20-best list (columns: rank, t-score, candidate pair), candidates in rank order:
   1  um Uhr beginnt
   2  bis Uhr geöffnet
   3  zur Verfügung stehen
   4  zur Verfügung gestellt
   5  zur Verfügung stellen
   6  ums Leben gekommen
   7  zur Verfügung steht
   8  auf Programm stehen
   9  in Anspruch genommen
  10  auf Tagesordnung stehen
  11  am Dienstag sagte
  12  am Montag sagte
  13  auf Seite lesen
  14  auf Kürzungen behält vor
  15  auf Programm steht
  16  im Mittelpunkt steht
  17  in Regionalausgabe erscheint
  18  an Stelle melden
  19  auf Seite zeigen
  20  zur Verfügung zu_stellen
  ...
- 11 true positives, 9 false positives
- precision: 11/20 = 55%
- total: 1280 TPs
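The arithmetic on this slide can be reproduced with a few lines of code. The sketch below assumes the annotation is available as one boolean per ranked candidate; the slide does not say which of the 20 candidates are the true positives, so the flags used here are placeholders chosen only to give 11 hits. The function name `precision_recall_at_n` is hypothetical.

```python
def precision_recall_at_n(is_tp, n, total_tps):
    """Precision and recall of the n highest-ranked candidates.

    is_tp     -- booleans in rank order, True for a true positive
    n         -- size of the n-best list
    total_tps -- number of true positives in the full reference set
    """
    if not 0 < n <= len(is_tp):
        raise ValueError("n must be between 1 and the number of candidates")
    hits = sum(is_tp[:n])
    return hits / n, hits / total_tps

# Placeholder annotation: 11 TPs and 9 false positives, positions arbitrary.
top20 = [True] * 11 + [False] * 9

precision, recall = precision_recall_at_n(top20, n=20, total_tps=1280)
print(f"precision = {precision:.0%}, recall = {recall:.2%}")
```

With 1280 TPs in the reference set, the same 20-best list has a recall of 11/1280, which is why precision and recall are reported together in the later graphs.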

Precision Graph: PNV full forms

Baseline: Random Selection
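A random-selection baseline can be illustrated as follows. The expected precision of a randomly drawn n-best list equals the proportion of TPs among all candidates, whatever the value of n. The sketch uses the 1280 TPs reported on the n-best slide; the total number of candidates is not given in the transcript, so `TOTAL_CANDIDATES` is a made-up value, and all function names are hypothetical.

```python
import random

def baseline_precision(total_tps, total_candidates):
    """Expected precision when candidates are selected at random."""
    return total_tps / total_candidates

def simulated_random_precision(labels, n, trials, seed=0):
    """Average precision of `trials` randomly drawn n-best lists."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(labels, n)
        total += sum(sample) / n
    return total / trials

TOTAL_TPS = 1280            # from the slides
TOTAL_CANDIDATES = 10_000   # hypothetical, not stated in the slides

labels = [True] * TOTAL_TPS + [False] * (TOTAL_CANDIDATES - TOTAL_TPS)
expected = baseline_precision(TOTAL_TPS, TOTAL_CANDIDATES)
simulated = simulated_random_precision(labels, n=500, trials=200)
print(f"expected {expected:.3f}, simulated {simulated:.3f}")
```

An association measure is only useful for extraction if its precision curve stays above this horizontal line.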

Precision Graphs

Recall Graphs

Precision/Recall
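The data behind precision and recall graphs of this kind can be computed in one pass over an annotated significance list. The following is a minimal sketch, assuming one boolean per ranked candidate and, by default, that the reference set consists of exactly the TPs in the list; the function name `precision_recall_curve` and the toy annotation are hypothetical.

```python
def precision_recall_curve(is_tp, total_tps=None):
    """Return (n, precision, recall) for every n-best list, n = 1..len(is_tp)."""
    if total_tps is None:
        total_tps = sum(is_tp)  # assume the list contains the full reference set
    if total_tps == 0:
        raise ValueError("reference set contains no true positives")
    curve = []
    hits = 0
    for n, flag in enumerate(is_tp, start=1):
        hits += flag
        curve.append((n, hits / n, hits / total_tps))
    return curve

# Toy annotation in rank order (invented for illustration).
annotation = [True, True, False, True, False, False]
curve = precision_recall_curve(annotation)
for n, precision, recall in curve:
    print(n, round(precision, 2), round(recall, 2))
```

Plotting precision against n gives a precision graph, recall against n a recall graph, and precision against recall the combined view.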

Precision Graphs: Newspaper, FVG + figur

Precision Graphs: Newspaper (FVG; figur)

Precision Graphs: AdjN

Precision/Recall: AdjN

Frequency Layers: AdjN Data
- f >= 5
- 2 <= f < 5

Frequency Layers: PNV Data
- f >= 10
- 3 <= f < 5

Lemmas vs. Word Forms (PNV)
- lemmas, f >= 3
- word forms, f >= 3

Text Type and Domain (PNV)
- newsgroup discussions vs. newspaper
- comparison for non-lemmatised candidates

The MI Mystery (FVG)
- region of high "local precision" for 4.0 < MI < 7.5

Further particularities of the newspaper data
- candidates with MI > 7.5 are more frequent than expected under the independence assumption, but there are very few FVG among them
- the data do not support the counter-MI argument that MI overestimates data with low-frequency joint and marginal distributions
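To make the MI ranges concrete, here is a small sketch. It assumes the usual pointwise formulation MI = log2(O / E) with E = f_PN * f_V / N; the slides do not state the exact definition or the logarithm base, so both are assumptions, and the counts are invented. The names `mutual_information` and `in_band` are hypothetical.

```python
import math

def mutual_information(joint, f_pn, f_v, n):
    """Pointwise MI = log2(O / E), with E = f_pn * f_v / n (assumed definition)."""
    return math.log2(joint * n / (f_pn * f_v))

def in_band(mi, low=4.0, high=7.5):
    """True if the score lies in the region 4.0 < MI < 7.5 named on the slides."""
    return low < mi < high

# Hypothetical counts: (joint, f_PN, f_V), with a sample size of 1024.
N = 1024
examples = {
    "mid-range pair": (8, 16, 16),       # O/E = 32  -> MI = 5
    "rare pair": (2, 2, 2),              # O/E = 512 -> MI = 9
    "frequent pair": (64, 256, 512),     # O/E = 0.5 -> MI = -1
}

scores = {name: mutual_information(*c, N) for name, c in examples.items()}
for name, mi in scores.items():
    print(name, mi, "in band" if in_band(mi) else "outside band")
```

The rare pair shows how very low marginal frequencies push MI above 7.5 even when the joint frequency is only 2.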

Optimized MI
| MI |
- accounts for the FVG concentration in the range 4.0 < MI < 7.5 in the newspaper test data

Summary of Results
Best measures:
- t-score / frequency: best for identifying PP-verb collocations (FVG, figur)
- log-likelihood, t-score, Fisher, binomial and multinomial p-value: work well for AdjN

Summary of Results
Reproducibility of results for different text types:
- Precision results from newsgroup data are comparable to those from newspaper data
- Strong evidence that identical classes of collocations are similarly distributed in different types of corpora

Summary of Results
Differences in the suitability of AMs for identifying particular collocation types:
- (PN,V)-candidates with a high MI score are less likely to be FVG
- Log-likelihood is not well suited for identifying FVG but better suited for identifying figur

Summary of Results
- Experimental results based either on a small number of best-scoring candidates or on more than the first 50% of the significance lists (SLs) are unreliable

Conclusion on AMs
- Optimal results do not necessarily come from a statistical discussion but from tuning on a particular data set

Vast Land: Lowest-frequency Data
- lowest-frequency data (hapax legomena, dis legomena, ...) are a serious challenge for all statistical approaches
- typical solution: cut-off thresholds
- Evert/Krenn used cut-off thresholds in the evaluation to reduce manual annotation work
- need to estimate the number of TPs among the excluded lowest-frequency candidates
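A cut-off threshold of the kind mentioned above can be sketched as a simple frequency filter. The threshold f >= 3 matches the one named for the newspaper candidate lists; the observed pairs and the helper name `apply_cutoff` are invented for illustration.

```python
from collections import Counter

def apply_cutoff(pair_freqs, min_freq):
    """Split candidates into those kept (f >= min_freq) and those excluded."""
    kept = {pair: f for pair, f in pair_freqs.items() if f >= min_freq}
    excluded = {pair: f for pair, f in pair_freqs.items() if f < min_freq}
    return kept, excluded

# Invented observations of (PN, V) pairs, one entry per corpus occurrence.
observed = (
    [("zur Verfügung", "stehen")] * 5
    + [("in Anspruch", "nehmen")] * 3
    + [("ab Mark", "kostet")] * 2
    + [("ab Jahren", "tanzen")]
)

pair_freqs = Counter(observed)
kept, excluded = apply_cutoff(pair_freqs, min_freq=3)
hapaxes = [pair for pair, f in pair_freqs.items() if f == 1]

print("kept:", sorted(kept))
print("excluded:", sorted(excluded))
print("hapax legomena:", hapaxes)
```

Keeping the excluded candidates in a separate dictionary, rather than discarding them, leaves them available for estimating how many TPs the threshold removes.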