March 10, 2005, 29th Annual Conference of Gfkl


Automatic Extension of Feature-based Semantic Lexicons via Contextual Features
Chris Biemann, University of Leipzig, Germany
Rainer Osswald, FernUniversität Hagen, Germany
March 10, 2005, 29th Annual Conference of Gfkl

Outline
  Motivation: lexicon extension for semantic parsing
  From co-occurrences to adjective profiles of nouns
  Inheritance mechanism for semantic features
  Results for complex classes
  Results for binary classes and their combination
  Discussion

Motivation
  Semantic parsing aims at finding a semantic representation for a sentence.
  As a prerequisite, semantic parsing needs semantic features of words.
  Semantic features are obtained by manually creating lexicon entries, which is expensive in terms of time and money.
  Given a certain number of manually created lexicon entries, it might be possible to train a classifier in order to find more entries.

HaGenLex: Semantic Lexicon for German
Size: 22,700 entries, of which 11,300 are nouns and 6,700 are verbs.
  Word              Semantic class
  Aggressivität     nonment-dyn-abs-situation
  Agonie            nonment-stat-abs-situation
  Agrarprodukt      nat-discrete
  Ägypter           human-object
  Ahn               human-object
  Ahndung           nonment-dyn-abs-situation
  Ähnlichkeit       relation
  Airbag            nonax-mov-art-discrete
  Airbus            mov-nonanimate-con-potag
  Airport           art-con-geogr
  Ajatollah         human-object
  Akademiker        human-object
  Akademisierung    nonment-dyn-abs-situation
  Akkordeon         nonax-mov-art-discrete
  Akkreditierung    nonment-dyn-abs-situation
  Akku              ax-mov-art-discrete
  Akquisition       nonment-dyn-abs-situation
  Akrobat           human-object
  ...               ...

Characteristics of semantic classes in HaGenLex
In total, 50 semantic classes for nouns are constructed from allowed combinations of:
  16 binary semantic features, e.g. HUMAN+, ARTIFICIAL-
  17 ontological sorts (organized in a hierarchy), e.g. concrete, abstract-situation, ...

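Application: WOCADI parser
Example input: „Welche Bücher von Peter Jackson über Expertensysteme wurden bei Addison-Wesley seit 1985 veröffentlicht?“ ("Which books by Peter Jackson on expert systems have been published by Addison-Wesley since 1985?")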

Underlying Assumptions
  Distributional Hypothesis (Harris 1968): semantic similarity is a function over the global contexts of words. The more similar the contexts, the more similar the words.
  Projected onto nouns and adjectives: nouns of similar semantic classes are modified by similar adjectives.
  The neighbouring co-occurrence relation between adjectives as left neighbours and nouns as right neighbours approximates typical head-modifier structures.

Neighbouring Co-occurrences and Profiles
  Significant co-occurrences reflect relations between words. To determine which co-occurrences are significant, a significance measure is used (here: log-likelihood).
  In the following, we look at adjectives that appear significantly (i.e. typically) to the left of nouns, and at nouns that appear significantly to the right of adjectives.
  The set of adjectives that co-occur significantly often to the left of a noun is called its adjective profile (the noun profile of an adjective is defined analogously).
  For the experiments, we use the most recent German corpus of "Projekt Deutscher Wortschatz" (500 million tokens).
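The slide names log-likelihood as the significance measure but gives no formula. The following is a minimal sketch, assuming the common log-likelihood ratio statistic over a 2x2 contingency table of left-neighbour counts. The function names, the toy counts, the threshold of 6.63 and the extra check that the pair occurs more often than expected are all assumptions made for illustration, not details taken from the talk.

```python
import math
from collections import defaultdict


def _xlogx(x):
    """Return x * ln(x), with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def log_likelihood(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: adjective directly left of noun
    k12: adjective left of some other noun
    k21: some other adjective left of the noun
    k22: all remaining adjective-noun pairs
    """
    n = k11 + k12 + k21 + k22
    cells = _xlogx(k11) + _xlogx(k12) + _xlogx(k21) + _xlogx(k22)
    rows = _xlogx(k11 + k12) + _xlogx(k21 + k22)
    cols = _xlogx(k11 + k21) + _xlogx(k12 + k22)
    return 2.0 * (cells + _xlogx(n) - rows - cols)


def adjective_profiles(pair_counts, threshold=6.63):
    """Build noun -> set of significant left-neighbour adjectives.

    pair_counts maps (adjective, noun) to the number of times the
    adjective occurs directly left of the noun. The threshold is an
    assumed cut-off, not a value given in the slides.
    """
    adj_total = defaultdict(int)
    noun_total = defaultdict(int)
    n = 0
    for (adj, noun), count in pair_counts.items():
        adj_total[adj] += count
        noun_total[noun] += count
        n += count

    profiles = defaultdict(set)
    for (adj, noun), k11 in pair_counts.items():
        k12 = adj_total[adj] - k11
        k21 = noun_total[noun] - k11
        k22 = n - k11 - k12 - k21
        expected = adj_total[adj] * noun_total[noun] / n
        # Keep only pairs that occur more often than expected by chance.
        if k11 > expected and log_likelihood(k11, k12, k21, k22) >= threshold:
            profiles[noun].add(adj)
    return dict(profiles)


# Toy counts, invented for illustration.
toy_counts = {
    ("gerieben", "Käse"): 30,
    ("neu", "Käse"): 2,
    ("neu", "Buch"): 40,
    ("gerieben", "Buch"): 1,
}
toy_profiles = adjective_profiles(toy_counts)
print(toy_profiles)
```

Noun profiles of adjectives can be obtained the same way by collecting nouns per adjective instead of adjectives per noun.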

Example: neighbouring profiles
Word: adjective / noun profile
  Buch: neu, erschienen, erst, neuest, jüngst, gut, geschrieben, letzt, zweit, vorliegend, gleichnamig, herausgegeben, nächst, dick, veröffentlicht, ...
  Käse: gerieben, überbacken, kleinkariert, fett, französisch, fettarm, löchrig, holländisch, handgemacht, grün, würzig, selbstgemacht, produziert, schimmelig
  Camembert: gebacken, fettarm, reif
  überbacken: Schweinesteak, Aubergine, Blumenkohl, Käse
  erlegt: Tier, Wild, Reh, Stück, Beute, Großwild, Wildkatzen, Büffel, Rehbock, Beutetier, Wal, Hirsch, Hase, Grizzly, Wildschwein, Thier, Eber, Bär, Mücke, ...
  ganz: Leben, Bündel, Stück, Volk, Wesen, Vermögen, Herz, Heer, Arsenal, Dorf, Land, Können, Berufsleben, Paket, Kapitel, Stadtviertel, Rudel, Jahrzehnt, ...
Word (translated): adjective / noun profile (translated)
  book: new, published, first, newest, most recent, recently, good, written, last, second, on hand, eponymous, next, thick, ...
  cheese: grated, baked over, small-minded, fat, French, low-fat, holey, Dutch, hand-made, green, spicy, self-made, produced, moldy
  camembert: baked, low-fat, ripe
  baked over: steak, aubergine, cauliflower, cheese
  brought down: animal, game, deer, piece, prey, big game, wild cat, buffalo, roebuck, prey animal, whale, hart, bunny, grizzly, wild pig, boar, bear, ...
  whole: life, bundle, piece, population, kind, fortune, heart, army, arsenal, village, country, ability, career, packet, chapter, quarter, pack, decade, ...
Amount: 125,000 nouns, 25,000 adjectives

Mechanism of Inheritance
Which class is assigned to N4 in the next step?
Algorithm:
  Initialize adjective and noun profiles;
  Initialize the start set;
  As long as new nouns get classified {
    calculate class probabilities for each adjective;
    for all yet unclassified nouns n {
      multiply class probabilities per class of modifying adjectives;
      assign the class with the highest probability to n;
    }
  }
Class probabilities per adjective:
  count the classes of the classified nouns that the adjective modifies;
  normalize each count by the total number of nouns in that class;
  normalize the result to sum to 1.
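The inheritance loop can be sketched in Python as follows. This is one illustrative reading of the slide's pseudocode, not the authors' implementation. The function names, the toy data and the small smoothing constant are assumptions. The smoothing keeps a class from being ruled out by a single zero count; the slides do not say how zero counts were handled. The product of probabilities is computed as a sum of logarithms to avoid numerical underflow.

```python
import math
from collections import Counter, defaultdict


def adjective_class_probs(profiles, labels, smoothing=1e-6):
    """Per adjective: count classes of labelled nouns it modifies,
    divide by class size, then normalise to sum to 1.
    The smoothing constant is an assumption, not from the slides."""
    class_size = Counter(labels.values())
    counts = defaultdict(Counter)
    for noun, cls in labels.items():
        for adj in profiles.get(noun, ()):
            counts[adj][cls] += 1
    probs = {}
    for adj, per_class in counts.items():
        weighted = {cls: per_class[cls] / size + smoothing
                    for cls, size in class_size.items()}
        total = sum(weighted.values())
        probs[adj] = {cls: w / total for cls, w in weighted.items()}
    return probs


def extend_lexicon(profiles, seed_labels):
    """Repeat classification rounds until no new noun gets a class."""
    labels = dict(seed_labels)
    while True:
        probs = adjective_class_probs(profiles, labels)
        classes = sorted(set(labels.values()))
        newly_classified = {}
        for noun, adjectives in profiles.items():
            if noun in labels:
                continue
            known = [a for a in adjectives if a in probs]
            if not known:
                continue  # no classifying adjective yet
            # Sum of logs equals the log of the product of probabilities.
            scores = {cls: sum(math.log(probs[a][cls]) for a in known)
                      for cls in classes}
            newly_classified[noun] = max(classes, key=lambda c: scores[c])
        if not newly_classified:
            return labels
        labels.update(newly_classified)


# Toy data, invented for illustration (class names are placeholders).
toy_profiles = {
    "Teller": {"irden", "zerbeult"},
    "Krug": {"irden"},
    "Hund": {"hungrig", "bissig"},
    "Katze": {"hungrig"},
    "Topf": {"irden", "zerbeult"},
    "Wolf": {"hungrig", "bissig", "scheu"},
    "Fuchs": {"scheu"},
}
toy_seed = {"Teller": "artifact", "Krug": "artifact",
            "Hund": "animal", "Katze": "animal"}
toy_result = extend_lexicon(toy_profiles, toy_seed)
print(toy_result)
```

In the toy data, "Fuchs" can only be classified in the second round, after "Wolf" has been labelled and its adjective "scheu" has thereby acquired class probabilities. This is the inheritance effect the slide title refers to.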

Example: Topf (pot)
Adjective profile of Topf (pot), class ax-mov-art-discrete:
  angebrannt(X) heiß(-) ehern(-) fremd(-) divers(-) zerbeult(X) brodelnd(-) staatlich(-) gußeisern(-) tönern(X) gemeinsam(-) groß(-) irden(X) verschieden(-) verschlossen(-) anonym(-) rund(-) flach(-) Bremer(-) geschlossen(-) passend(-) gesondert(-) andere(-) riesig(-) Golden(-) eisern(-) europäisch(-) viel(-) öffentlich(-) mehr(-) golden(-) leer(-) klein(-) getrennt(-) möglich(-) speziell(-) übervoll(X) dampfend(-) gleich(-) gefüllt(-)
Number of classes per adjective:
  angebrannt (burnt): {nat-substance=1, art-substance=1, ax-mov-art-discrete=1}
    Suppe (soup): art-substance
    Zigarette (cigarette): ax-mov-art-discrete
    Milch (milk): nat-substance
  zerbeult (dented): {nonmov-art-discrete=1, mov-nonanimate-con-potag=2, nonax-mov-art-discrete=1, ax-mov-art-discrete=3}
    Wagen, Auto (wagon, car): mov-nonanimate-con-potag
    Fahrzeug, Mountainbike, Posaune (vehicle, mountainbike, trombone): ax-mov-art-discrete
    Mantel (coat): nonax-mov-art-discrete
    Dach (roof): nonmov-art-discrete
  irden (earthen): {art-con-geogr=1, nonax-mov-art-discrete=1, ax-mov-art-discrete=9}
    Schal (shawl): nonax-mov-art-discrete
    Hafen (port): art-con-geogr
    Teller, Flasche, Schüssel, Becher, Geschirr, Vase, Krug, Gefäß, Napf (plate, bottle, bowl, cup, dishes, vase, mug, jar): ax-mov-art-discrete
  tönern (clay-made): {ax-mov-art-discrete=1, prot-discrete=1}
    Fuß (foot): prot-discrete
    Gefäß (vessel): ax-mov-art-discrete
  übervoll (over-filled): {nonmov-art-discrete=3, art-con-geogr=1, nonment-dyn-abs-situation=1, nonax-mov-art-discrete=1}
    Zimmer, Saal, Lager (room, hall, encampment): nonmov-art-discrete
    Stall (stable): art-con-geogr
    Vorlesung (lecture): nonment-dyn-abs-situation
    Tablett (tray): nonax-mov-art-discrete
Class probabilities:
  mov-nonanimate-con-potag=2.8E-25
  ax-mov-art-discrete=5.8E-8
  art-con-geogr=1.5E-20
  nonax-mov-art-discrete=2.1E-15
  nat-substance=3.3E-25
  nonment-dyn-abs-situation=1.6E-25
  prot-discrete=5.0E-25
  art-substance=3.3E-25
  nonmov-art-discrete=7.1E-20

Parameters
Minimal number of adjectives: minAdj
  A noun needs at least minAdj classifying adjectives to be classified. This avoids statistical noise and implies a frequency threshold.
Maximal number of classes per adjective: maxClass
  An adjective is only used for classification if it favours at most maxClass different classes, so that unspecific adjectives do not distort the results.
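A small sketch of how the two parameters could act as filters. It makes two assumptions: an adjective "favours" a class when it modifies at least one classified noun of that class, and minAdj is checked against the adjectives that survive the maxClass filter. The slides do not spell out either detail, and the function names are hypothetical. The counts are the per-adjective class counts from the Topf example.

```python
def usable_adjectives(adj_class_counts, max_class):
    """Adjectives seen with at least one and at most max_class classes."""
    usable = set()
    for adj, counts in adj_class_counts.items():
        n_classes = sum(1 for c in counts.values() if c > 0)
        if 0 < n_classes <= max_class:
            usable.add(adj)
    return usable


def is_classifiable(noun_adjectives, usable, min_adj):
    """A noun qualifies if enough of its adjectives are usable."""
    return len(set(noun_adjectives) & usable) >= min_adj


# Per-adjective class counts taken from the Topf (pot) example.
topf_counts = {
    "angebrannt": {"nat-substance": 1, "art-substance": 1,
                   "ax-mov-art-discrete": 1},
    "zerbeult": {"nonmov-art-discrete": 1, "mov-nonanimate-con-potag": 2,
                 "nonax-mov-art-discrete": 1, "ax-mov-art-discrete": 3},
    "irden": {"art-con-geogr": 1, "nonax-mov-art-discrete": 1,
              "ax-mov-art-discrete": 9},
    "tönern": {"ax-mov-art-discrete": 1, "prot-discrete": 1},
    "übervoll": {"nonmov-art-discrete": 3, "art-con-geogr": 1,
                 "nonment-dyn-abs-situation": 1,
                 "nonax-mov-art-discrete": 1},
}

for max_class in (2, 3, 50):
    usable = usable_adjectives(topf_counts, max_class)
    print(max_class, sorted(usable),
          is_classifiable(topf_counts.keys(), usable, min_adj=5))
```

Under this reading, a strict maxClass reduces the number of nouns that still reach minAdj, so the two parameters trade precision against coverage.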

Experimental Data
  4726 nouns satisfy minAdj=5, which means a maximal recall of 78.2%.
  In all experiments, 10-fold cross-validation was used.

Results: global classification
  Classification was carried out directly on the 50 semantic classes.
  The different measuring points correspond to the parameters minAdj in {5, 10, 15, 20} and maxClass in {2, 5, 50}.
  The results are too poor for lexicon extension.

Combining single classifiers
Architecture: binary classifiers for single features and sorts, then combining the outcomes.
Parameters: minAdj=5, maxClass=2
Binary classifiers:
  ANIMAL +/-, ANIMATE +/-, ARTIF +/-, AXIAL +/-, ... (16 features)
  ab +/-, abs +/-, ad +/-, as +/-, ... (17 sorts)
Selection: compatible semantic classes that are minimal w.r.t. the hierarchy and unambiguous.
Output: result class or reject.
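The combination step can be illustrated with a simplified sketch. It keeps only the "compatible and unambiguous" part of the selection and omits the hierarchy-minimality criterion, whose details the slide does not give. The function name is hypothetical, and the feature values attached to the class names below are invented for illustration; they are not the HaGenLex definitions.

```python
def select_class(predictions, class_definitions):
    """Return the single class compatible with all binary decisions,
    or None (reject) if no class or more than one class is compatible.

    predictions: feature name -> True/False; features on which a
    binary classifier made no decision are simply absent.
    class_definitions: class name -> required feature values.
    """
    compatible = [
        name
        for name, required in class_definitions.items()
        if all(feature not in predictions or predictions[feature] == value
               for feature, value in required.items())
    ]
    return compatible[0] if len(compatible) == 1 else None


# Hypothetical feature definitions, for illustration only.
toy_definitions = {
    "human-object": {"HUMAN": True, "ANIMATE": True, "ARTIF": False},
    "animal-object": {"HUMAN": False, "ANIMATE": True, "ARTIF": False},
    "art-substance": {"HUMAN": False, "ANIMATE": False, "ARTIF": True},
}

print(select_class({"HUMAN": True, "ANIMATE": True, "ARTIF": False},
                   toy_definitions))
print(select_class({"ANIMATE": True}, toy_definitions))
print(select_class({"HUMAN": True, "ARTIF": True}, toy_definitions))
```

Rejecting ambiguous or contradictory outcomes is what lets a combined classifier trade recall for precision.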

Results: single semantic features
  Name      Count   +      -      Bias
  method    6004    12     5992   0.0020
  instit    6032    39     5993   0.0065
  mental    9008    162    8846   0.0180
  info      6015    119    5896   0.0198
  animal    5995    143    5852   0.0239
  geogr             188    5827   0.0313
  thconc    6028    518    5510   0.0859
  instru    5932    969    4963   0.1634
  human             1313   4682   0.2190
  legper    6009    1352   4657   0.2250
  animate   6010    1505   4505   0.2504
  potag             1664   4351   0.2766
  artif     5864    2204   3660   0.3759
  axial     5892    2260   3632   0.3836
  movable           2345   3482   0.4024
  spatial   6033    2910   3123   0.4823
For bias > 0.05: good to excellent precision.
Total precision: 93.8% (86.8% for feature +)
Total recall: 70.7% (69.2% for feature +)

Results: ontological sorts
  Name   Count   +      -      Bias
  re     6033    7      6026   0.0012
  mo             8      6025   0.0013
  o-             5994   39     0.0065
  oa     6045    41     6004   0.0068
  me
  qn
  ta             107    5926   0.0177
  s      6010    224    5786   0.0373
  as     6031    363    5668   0.0602
  na             411    5622   0.0681
  at             450    5583   0.0746
  io             664    5369   0.1101
  ad             1481   4550   0.2456
  abs            1846   4187   0.3060
  d              2663   3347   0.4431
  co             2910   3123   0.4823
  ab-            3082   2951   0.4891
For bias > 0.10: good to excellent precision.
Total precision: 94.1% (89.5% for sort +)
Total recall: 73.6% (69.6% for sort +)
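The slides do not define the Bias column. The complete rows of the feature and sort tables are consistent with bias being the share of the smaller of the two groups (+ or -) among all nouns counted for that feature or sort. The sketch below checks this inferred reading against a few rows; the function name is hypothetical and the formula is an assumption derived from the numbers, not a definition given by the authors.

```python
def bias(positive, negative):
    """Share of the minority label among all labelled nouns
    (inferred from the tables, not stated in the slides)."""
    return min(positive, negative) / (positive + negative)


# (name, +, -) rows copied from the result tables.
rows = [
    ("method", 12, 5992),
    ("mental", 162, 8846),
    ("spatial", 2910, 3123),
    ("ab-", 3082, 2951),
]
for name, pos, neg in rows:
    print(f"{name}: {bias(pos, neg):.4f}")
```

A bias close to 0 means one label is very rare, and, as the slides note, precision is good to excellent only above a certain bias.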

Results: combined semantic classes
Columns: class, count, precision (%), recall (%)
  nonment-dyn-abs-situation 1421 89.19 34.27
  human-object 1313 96.82 69.54
  prot-theor-concept 516 53.71 18.22
  nonoper-attribute 411 0.00
  ax-mov-art-discrete 362 55.64 40.88
  nonment-stat-abs-situation 226 36.84 6.19
  animal-object 143 100.0 26.57
  nonmov-art-discrete 133 57.41 23.31
  ment-stat-abs-situation 126 51.28 15.87
  nonax-mov-art-discrete 108 31.48 15.74
  tem-abstractum 107 96.77 28.04
  mov-nonanimate-con-potag 98 70.45 31.63
  art-con-geogr 96 58.70 28.12
  abs-info 94 42.31 11.70
  art-substance 88 60.47 29.55
  nat-discrete 31.82
  nat-substance 86 57.14 9.30
  prot-discrete 73 57.53
  nat-con-geogr 63 65.00 20.63
  prot-substance 50 40.00
  mov-art-discrete 45 37.78
  meas-unit 41 90.91 24.39
  oper-attribute 39
  Institution
  ment-dyn-abs-situation 36
  plant-object 34 8.82
  mov-nat-discrete 27 22.22
  con-info 25 8.00
  Rest 157 39.24 19.75
No connection between class size and results is visible.
Total precision: 80.2%
Total recall: 34.2%
Number of newly classified nouns: 6649

Typical mistakes
Pflanze (plant): animal-object instead of plant-object
  zart, fleischfressend, fressend, verändert, genmanipuliert, transgen, exotisch, selten, giftig, stinkend, wachsend, ...
Nachwuchs (offspring): human-object instead of animal-object
  wissenschaftlich, qualifiziert, akademisch, eigen, talentiert, weiblich, hoffnungsvoll, geeignet, begabt, journalistisch, ...
Café (café): art-con-geogr instead of nonmov-art-discrete (cf. Restaurant)
  Wiener, klein, türkisch, kurdisch, romanisch, cyber, philosophisch, besucht, traditionsreich, schnieke, gutbesucht, ...
Neger (negro): animal-object instead of human-object
  weiß, dreckig, gefangen, faul, alt, schwarz, nackt, lieb, gut, brav
but: Skinhead (skinhead): human-object (ok)
  {16,17,18,19,20,21,22,23,30}jährig, gleichaltrig, zusammengeprügelt, rechtsradikal, brutal
In most cases the wrong class is semantically close. The evaluation metrics did not account for that.

Any Questions? Thank you very much!