1
Automatic Extension of Feature-based Semantic Lexicons via Contextual Features
Chris Biemann, University of Leipzig, Germany
Rainer Osswald, FernUniversität Hagen, Germany
March 10, 2005, 29th Annual Conference of the GfKl
2
Outline
Motivation: lexicon extension for semantic parsing
From co-occurrences to adjective profiles of nouns
Inheritance mechanism for semantic features
Results for complex classes
Results for binary classes and their combination
Discussion
3
Motivation
Semantic parsing aims at finding a semantic representation for a sentence.
As a prerequisite, semantic parsing needs semantic features of words.
Semantic features are obtained by manually creating lexicon entries, which is expensive in terms of time and money.
Given a certain number of manually created lexicon entries, it might be possible to train a classifier to find more entries.
4
HaGenLex: Semantic Lexicon for German
Size: 22,700 entries, of which 11,300 are nouns and 6,700 are verbs

WORD            SEMANTIC CLASS
Aggressivität   nonment-dyn-abs-situation
Agonie          nonment-stat-abs-situation
Agrarprodukt    nat-discrete
Ägypter         human-object
Ahn             human-object
Ahndung         nonment-dyn-abs-situation
Ähnlichkeit     relation
Airbag          nonax-mov-art-discrete
Airbus          mov-nonanimate-con-potag
Airport         art-con-geogr
Ajatollah       human-object
Akademiker      human-object
Akademisierung  nonment-dyn-abs-situation
Akkordeon       nonax-mov-art-discrete
Akkreditierung  nonment-dyn-abs-situation
Akku            ax-mov-art-discrete
Akquisition     nonment-dyn-abs-situation
Akrobat         human-object
5
Characteristics of semantic classes in HaGenLex
In total, 50 semantic classes for nouns are constructed from allowed combinations of:
16 semantic features (binary), e.g. HUMAN+, ARTIFICIAL-
17 ontological sorts, e.g. concrete, abstract-situation, ...
[Diagram: sort (hierarchy) and semantic features combine into semantic classes]
6
Application: WOCADI-Parser
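Example sentence: „Welche Bücher von Peter Jackson über Expertensysteme wurden bei Addison-Wesley seit 1985 veröffentlicht?“ (Which books by Peter Jackson about expert systems have been published by Addison-Wesley since 1985?)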
7
Underlying Assumptions
Harris 1968: Distributional Hypothesis
Semantic similarity is a function of the global contexts of words: the more similar the contexts, the more similar the words.
Projected onto nouns and adjectives: nouns of similar semantic classes are modified by similar adjectives.
The neighbouring co-occurrence relation between adjectives as left neighbours and nouns as right neighbours approximates typical head-modifier structures.
8
Neighbouring Co-occurrences and Profiles
Significant co-occurrences reflect relations between words.
To determine which co-occurrences are significant, a significance measure is used (here: log-likelihood).
In the following, we look at adjectives that appear significantly (i.e., typically) to the left of nouns, and at nouns that appear significantly to the right of adjectives.
The set of adjectives that co-occur significantly often to the left of a noun is called its adjective profile (the noun profile of an adjective is defined analogously).
For the experiments, we use the most recent German corpus of „Projekt Deutscher Wortschatz“ (500 million tokens).
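The slides name log-likelihood as the significance measure but do not give the exact formula. As a rough sketch, assuming the common G² log-likelihood ratio over a 2x2 contingency table of neighbour pairs (the function name, argument names and counts below are illustrative, not taken from the slides):

```python
from math import log

def log_likelihood(n_ab: int, n_a: int, n_b: int, n: int) -> float:
    """G^2 score for adjective a standing directly left of noun b.

    n_ab: count of the neighbour pair "a b"
    n_a:  count of pairs with a as left neighbour
    n_b:  count of pairs with b as right neighbour
    n:    total number of neighbour pairs in the corpus
    """
    observed = [
        [n_ab, n_a - n_ab],
        [n_b - n_ab, n - n_a - n_b + n_ab],
    ]
    row_totals = [n_a, n - n_a]
    col_totals = [n_b, n - n_b]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a cell with count 0 contributes nothing
                g2 += o * log(o * n / (row_totals[i] * col_totals[j]))
    return 2.0 * g2

# Hypothetical counts: the pair occurs exactly as often as chance predicts ...
independent = log_likelihood(n_ab=10, n_a=100, n_b=100, n=1000)
# ... versus five times as often as chance predicts.
associated = log_likelihood(n_ab=50, n_a=100, n_b=100, n=1000)
```

Under this reading, a noun's adjective profile would be the set of left neighbours whose score exceeds some threshold. Note that G² alone does not distinguish pairs that are more frequent than expected from pairs that are less frequent than expected, so a practical implementation would also check the direction.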
9
Example: neighbouring profiles
Word: adjective / noun profile (English translations indented below each entry)

Buch: neu, erschienen, erst, neuest, jüngst, gut, geschrieben, letzt, zweit, vorliegend, gleichnamig, herausgegeben, nächst, dick, veröffentlicht, ...
    book: new, published, first, newest, most recent, recently, good, written, last, second, on hand, eponymous, next, thick, ...
Käse: gerieben, überbacken, kleinkariert, fett, französisch, fettarm, löchrig, holländisch, handgemacht, grün, würzig, selbstgemacht, produziert, schimmelig
    cheese: grated, baked over, small-minded, fat, French, low-fat, holey, Dutch, hand-made, green, spicy, self-made, produced, moldy
Camembert: gebacken, fettarm, reif
    camembert: baked, low-fat, ripe
überbacken: Schweinesteak, Aubergine, Blumenkohl, Käse
    baked over: steak, aubergine, cauliflower, cheese
erlegt: Tier, Wild, Reh, Stück, Beute, Großwild, Wildkatzen, Büffel, Rehbock, Beutetier, Wal, Hirsch, Hase, Grizzly, Wildschwein, Thier, Eber, Bär, Mücke, ...
    brought down: animal, game, deer, piece, prey, big game, wild cat, buffalo, roebuck, prey animal, whale, hart, hare, grizzly, wild pig, boar, bear, ...
ganz: Leben, Bündel, Stück, Volk, Wesen, Vermögen, Herz, Heer, Arsenal, Dorf, Land, Können, Berufsleben, Paket, Kapitel, Stadtviertel, Rudel, Jahrzehnt, ...
    whole: life, bundle, piece, population, kind, fortune, heart, army, arsenal, village, country, ability, career, packet, chapter, quarter, pack, decade, ...

Amount: 125,000 nouns, 25,000 adjectives
10
Mechanism of Inheritance
Which class is assigned to N4 in the next step?

Algorithm:
Initialize adjective and noun profiles;
Initialize the start set;
As long as new nouns get classified {
    calculate class probabilities for each adjective;
    for all yet unclassified nouns n {
        Multiply class probabilities per class of modifying adjectives;
        Assign the class with the highest probability to n;
    }
}

Class probabilities per adjective:
count the number of classified nouns per class in the adjective's noun profile
normalize by the total number of nouns in each class
normalize to 1
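The loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the smoothing constant `eps` for classes an adjective has not been seen with, the use of log-sums instead of raw products, and the toy profiles are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log

def adjective_class_probs(noun_profiles, labels):
    """Per adjective: count classes of its classified nouns,
    divide by class size, then normalize to 1."""
    class_size = Counter(labels.values())
    counts = defaultdict(Counter)
    for noun, cls in labels.items():
        for adj in noun_profiles.get(noun, ()):
            counts[adj][cls] += 1
    probs = {}
    for adj, per_class in counts.items():
        weighted = {c: k / class_size[c] for c, k in per_class.items()}
        total = sum(weighted.values())
        probs[adj] = {c: w / total for c, w in weighted.items()}
    return probs

def extend_lexicon(noun_profiles, seed_labels, eps=1e-5):
    """Iterate until no further noun gets classified."""
    labels = dict(seed_labels)
    while True:
        probs = adjective_class_probs(noun_profiles, labels)
        classes = sorted(set(labels.values()))
        new = {}
        for noun, adjs in noun_profiles.items():
            if noun in labels:
                continue
            known = [probs[a] for a in adjs if a in probs]
            if not known:
                continue  # no classifying adjective yet
            # Sum of logs is the product of probabilities; eps is an
            # assumed floor for classes the adjective was never seen with.
            scores = {c: sum(log(p.get(c, eps)) for p in known) for c in classes}
            new[noun] = max(scores, key=scores.get)
        if not new:
            return labels
        labels.update(new)

# Hypothetical toy profiles (noun -> set of left-neighbour adjectives).
profiles = {
    "Teller": {"irden", "flach"},
    "Suppe": {"angebrannt", "heiß"},
    "Krug": {"irden", "tönern"},
    "Napf": {"tönern"},
    "Eintopf": {"angebrannt"},
}
seeds = {"Teller": "ax-mov-art-discrete", "Suppe": "art-substance"}
result = extend_lexicon(profiles, seeds)
```

In this toy run, Napf can only be classified in the second pass: its single adjective tönern carries no class information until Krug has been classified in the first pass, which is the inheritance effect the slide title refers to.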
11
Example: Topf (pot)
Adjective profile of Topf (pot), class ax-mov-art-discrete (adjectives marked X are those with class counts listed below):
angebrannt(X) heiß(-) ehern(-) fremd(-) divers(-) zerbeult(X) brodelnd(-) staatlich(-) gußeisern(-) tönern(X) gemeinsam(-) groß(-) irden(X) verschieden(-) verschlossen(-) anonym(-) rund(-) flach(-) Bremer(-) geschlossen(-) passend(-) gesondert(-) andere(-) riesig(-) Golden(-) eisern(-) europäisch(-) viel(-) öffentlich(-) mehr(-) golden(-) leer(-) klein(-) getrennt(-) möglich(-) speziell(-) übervoll(X) dampfend(-) gleich(-) gefüllt(-)

Number of classes per adjective:
angebrannt (burnt): {nat-substance=1, art-substance=1, ax-mov-art-discrete=1}
    Suppe (soup): art-substance
    Zigarette (cigarette): ax-mov-art-discrete
    Milch (milk): nat-substance
zerbeult (dented): {nonmov-art-discrete=1, mov-nonanimate-con-potag=2, nonax-mov-art-discrete=1, ax-mov-art-discrete=3}
    Wagen, Auto (wagon, car): mov-nonanimate-con-potag
    Fahrzeug, Mountainbike, Posaune (vehicle, mountain bike, trombone): ax-mov-art-discrete
    Mantel (coat): nonax-mov-art-discrete
    Dach (roof): nonmov-art-discrete
irden (earthen): {art-con-geogr=1, nonax-mov-art-discrete=1, ax-mov-art-discrete=9}
    Schal (shawl): nonax-mov-art-discrete
    Hafen (port): art-con-geogr
    Teller, Flasche, Schüssel, Becher, Geschirr, Vase, Krug, Gefäß, Napf (plate, bottle, bowl, cup, dishes, vase, jug, vessel, bowl): ax-mov-art-discrete
tönern (clay-made): {ax-mov-art-discrete=1, prot-discrete=1}
    Fuß (foot): prot-discrete
    Gefäß (vessel): ax-mov-art-discrete
übervoll (over-filled): {nonmov-art-discrete=3, art-con-geogr=1, nonment-dyn-abs-situation=1, nonax-mov-art-discrete=1}
    Zimmer, Saal, Lager (room, hall, encampment): nonmov-art-discrete
    Stall (stable): art-con-geogr
    Vorlesung (lecture): nonment-dyn-abs-situation
    Tablett (tray): nonax-mov-art-discrete

Class probabilities:
{mov-nonanimate-con-potag=2.8E-25, ax-mov-art-discrete=5.8E-8, art-con-geogr=1.5E-20, nonax-mov-art-discrete=2.1E-15, nat-substance=3.3E-25, nonment-dyn-abs-situation=1.6E-25, prot-discrete=5.0E-25, art-substance=3.3E-25, nonmov-art-discrete=7.1E-20}
12
Parameters
Minimal number of adjectives: minAdj
A noun needs at least minAdj classifying adjectives. This avoids statistical noise and implies a frequency threshold.
Maximal number of classes per adjective: maxClass
An adjective is only used for classification if it favours at most maxClass different classes, so that unspecific adjectives do not distort the results.
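A small sketch of how the two thresholds might act as filters, using the class counts from the Topf example. It assumes that "favours at most maxClass classes" means "was seen with at most maxClass distinct classes"; the slides do not spell this out, and the function names are hypothetical.

```python
def usable_adjectives(class_counts, max_class):
    """Keep adjectives seen with at most max_class distinct classes (assumed reading)."""
    return {adj for adj, counts in class_counts.items() if len(counts) <= max_class}

def is_classifiable(noun_adjectives, usable, min_adj):
    """A noun needs at least min_adj usable adjectives in its profile."""
    return len(set(noun_adjectives) & usable) >= min_adj

# Class counts of the five classifying adjectives in the Topf example.
topf_counts = {
    "angebrannt": {"nat-substance": 1, "art-substance": 1, "ax-mov-art-discrete": 1},
    "zerbeult": {"nonmov-art-discrete": 1, "mov-nonanimate-con-potag": 2,
                 "nonax-mov-art-discrete": 1, "ax-mov-art-discrete": 3},
    "irden": {"art-con-geogr": 1, "nonax-mov-art-discrete": 1, "ax-mov-art-discrete": 9},
    "tönern": {"ax-mov-art-discrete": 1, "prot-discrete": 1},
    "übervoll": {"nonmov-art-discrete": 3, "art-con-geogr": 1,
                 "nonment-dyn-abs-situation": 1, "nonax-mov-art-discrete": 1},
}

strict = usable_adjectives(topf_counts, max_class=2)
loose = usable_adjectives(topf_counts, max_class=50)
topf_ok_strict = is_classifiable(topf_counts, strict, min_adj=5)
topf_ok_loose = is_classifiable(topf_counts, loose, min_adj=5)
```

Under this reading, a strict maxClass trades coverage for precision: with maxClass=2 only tönern remains usable for Topf, so the noun would fall below minAdj=5 and stay unclassified.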
13
Experimental Data
4726 nouns comply with minAdj=5, which means the maximal recall is 78.2%.
In all experiments, 10-fold cross-validation was used.
14
Results: global classification
Classification was carried out directly on the 50 semantic classes.
The different measuring points correspond to the parameters minAdj in {5, 10, 15, 20} and maxClass in {2, 5, 50}.
The results are too poor for lexicon extension.
15
Combining single classifiers
Architecture: binary classifiers for single features, then combining the outcomes.
Parameters: minAdj=5, maxClass=2
Binary classifiers for features: ANIMAL +/-, ANIMATE +/-, ARTIF +/-, AXIAL +/-, ... (16 features)
Binary classifiers for sorts: ab +/-, abs +/-, ad +/-, as +/-, ... (17 sorts)
Selection: compatible semantic classes that are minimal w.r.t. the hierarchy and unambiguous.
Output: result class or reject
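The selection step could look roughly like the sketch below. It is a simplification: it keeps only the "compatible and unambiguous" part and omits the minimality check against the sort hierarchy, since the slides do not give the hierarchy. The feature specifications in `class_specs` are hypothetical and are not the actual HaGenLex definitions.

```python
def select_class(decisions, class_specs):
    """Return the single class compatible with all binary decisions, else None (reject).

    decisions:   feature -> True / False, or missing / None if the binary
                 classifier made no decision for that feature
    class_specs: class name -> required feature values (hypothetical here)
    """
    compatible = [
        name
        for name, spec in class_specs.items()
        if all(decisions.get(feature) in (None, value) for feature, value in spec.items())
    ]
    return compatible[0] if len(compatible) == 1 else None

# Hypothetical feature requirements for three classes.
class_specs = {
    "human-object": {"HUMAN": True, "ANIMATE": True, "ARTIF": False},
    "animal-object": {"ANIMAL": True, "ANIMATE": True, "HUMAN": False},
    "ax-mov-art-discrete": {"ARTIF": True, "AXIAL": True, "ANIMATE": False},
}

unique = select_class({"HUMAN": True, "ANIMATE": True}, class_specs)
ambiguous = select_class({"ANIMATE": True}, class_specs)
contradictory = select_class({"ANIMATE": True, "ARTIF": True, "HUMAN": True}, class_specs)
```

Rejecting both ambiguous and contradictory outcomes is one plausible way to favour precision over recall, at the cost of leaving many nouns unclassified.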
16
Results: single semantic features
Name     Count  +     -     Bias
method   6004   12    5992  0.0020
instit   6032   39    5993  0.0065
mental   9008   162   8846  0.0180
info     6015   119   5896  0.0198
animal   5995   143   5852  0.0239
geogr    6015   188   5827  0.0313
thconc   6028   518   5510  0.0859
instru   5932   969   4963  0.1634
human    5995   1313  4682  0.2190
legper   6009   1352  4657  0.2250
animate  6010   1505  4505  0.2504
potag    6015   1664  4351  0.2766
artif    5864   2204  3660  0.3759
axial    5892   2260  3632  0.3836
movable  5827   2345  3482  0.4024
spatial  6033   2910  3123  0.4823

For bias > 0.05: good to excellent precision
Total precision: 93.8% (86.8% for feature +)
Total recall: 70.7% (69.2% for feature +)
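The slide does not define Bias. The listed numbers are consistent with reading it as the share of the minority label among all examples of a feature, which in this table is always the + label. A short check of that assumed reading, using + and - counts from three rows (the function name is illustrative):

```python
def bias(pos: int, neg: int) -> float:
    """Share of the minority label; an inferred reading of the Bias column."""
    return min(pos, neg) / (pos + neg)

# (+, -) counts taken from the table rows for three features.
rows = {"method": (12, 5992), "human": (1313, 4682), "spatial": (2910, 3123)}
biases = {name: round(bias(p, n), 4) for name, (p, n) in rows.items()}

# Features above the 0.05 threshold named on the slide.
well_balanced = [name for name, b in biases.items() if b > 0.05]
```

Seen this way, the threshold simply says that a feature needs a minimum share of examples on its rarer side before its binary classifier becomes reliable.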
17
Results: ontological sorts
Name  Count  +     -     Bias
re    6033   7     6026  0.0012
mo    6033   8     6025  0.0013
o-    6033   5994  39    0.0065
oa    6045   41    6004  0.0068
me    n/a    n/a   n/a   n/a
qn    n/a    n/a   n/a   n/a
ta    6033   107   5926  0.0177
s     6010   224   5786  0.0373
as    6031   363   5668  0.0602
na    6033   411   5622  0.0681
at    6033   450   5583  0.0746
io    6033   664   5369  0.1101
ad    6031   1481  4550  0.2456
abs   6033   1846  4187  0.3060
d     6010   2663  3347  0.4431
co    6033   2910  3123  0.4823
ab-   6033   3082  2951  0.4891
(n/a: value missing from this transcript)

For bias > 0.10: good to excellent precision
Total precision: 94.1% (89.5% for sort +)
Total recall: 73.6% (69.6% for sort +)
18
Results: combined semantic classes
Class                       Count  Prec   Rec
nonment-dyn-abs-situation   1421   89.19  34.27
human-object                1313   96.82  69.54
prot-theor-concept          516    53.71  18.22
nonoper-attribute           411    0.00   n/a
ax-mov-art-discrete         362    55.64  40.88
nonment-stat-abs-situation  226    36.84  6.19
animal-object               143    100.0  26.57
nonmov-art-discrete         133    57.41  23.31
ment-stat-abs-situation     126    51.28  15.87
nonax-mov-art-discrete      108    31.48  15.74
tem-abstractum              107    96.77  28.04
mov-nonanimate-con-potag    98     70.45  31.63
art-con-geogr               96     58.70  28.12
abs-info                    94     42.31  11.70
art-substance               88     60.47  29.55
nat-discrete                n/a    31.82  n/a
nat-substance               86     57.14  9.30
prot-discrete               73     57.53  n/a
nat-con-geogr               63     65.00  20.63
prot-substance              50     40.00  n/a
mov-art-discrete            45     37.78  n/a
meas-unit                   41     90.91  24.39
oper-attribute              39     n/a    n/a
Institution                 n/a    n/a    n/a
ment-dyn-abs-situation      36     n/a    n/a
plant-object                34     8.82   n/a
mov-nat-discrete            27     22.22  n/a
con-info                    25     8.00   n/a
Rest                        157    39.24  19.75
(n/a: value missing from this transcript; where a row has only one percentage, the transcript does not show whether it is precision or recall, and it is listed under Prec)

No connection between class size and results is visible.
Total precision: 80.2%
Total recall: 34.2%
Number of newly classified nouns: 6649
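The gap between high precision and low recall follows from the reject option: rejected nouns do not count against precision but do count against recall. The slides do not define the metrics, so the sketch below assumes the usual definitions (precision over decided nouns, recall over all gold-labelled nouns); the function name and the toy labels are hypothetical.

```python
def precision_recall(gold, predicted):
    """Precision over nouns that received a class, recall over all gold nouns.

    predicted maps nouns to a class, or to None when the noun was rejected.
    """
    decided = {n: c for n, c in predicted.items() if c is not None and n in gold}
    correct = sum(1 for n, c in decided.items() if gold[n] == c)
    precision = correct / len(decided) if decided else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical toy evaluation: two correct, one wrong, one rejected.
gold = {
    "Akrobat": "human-object",
    "Ahn": "human-object",
    "Akku": "ax-mov-art-discrete",
    "Airport": "art-con-geogr",
}
predicted = {
    "Akrobat": "human-object",
    "Ahn": "human-object",
    "Akku": "nonax-mov-art-discrete",
    "Airport": None,
}
precision, recall = precision_recall(gold, predicted)
```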
19
Typical mistakes
Pflanze (plant): animal-object instead of plant-object
    zart, fleischfressend, fressend, verändert, genmanipuliert, transgen, exotisch, selten, giftig, stinkend, wachsend, ...
Nachwuchs (offspring): human-object instead of animal-object
    wissenschaftlich, qualifiziert, akademisch, eigen, talentiert, weiblich, hoffnungsvoll, geeignet, begabt, journalistisch, ...
Café (café): art-con-geogr instead of nonmov-art-discrete (cf. Restaurant)
    Wiener, klein, türkisch, kurdisch, romanisch, cyber, philosophisch, besucht, traditionsreich, schnieke, gutbesucht, ...
Neger (negro): animal-object instead of human-object
    weiß, dreckig, gefangen, faul, alt, schwarz, nackt, lieb, gut, brav
but: Skinhead (skinhead): human-object (ok)
    {16,17,18,19,20,21,22,23,30}jährig, gleichaltrig, zusammengeprügelt, rechtsradikal, brutal
In most cases the wrong class is semantically close. The evaluation metrics did not account for that.
20
Any Questions? Thank you very much!