Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University.

1 Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
ACL 2011, June 21
Slav Petrov (Google Research), Dipanjan Das (Carnegie Mellon University)

2 Part-of-Speech Tagging
Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.

3 (Nearly) Universal Part-of-Speech Tags
VERB, NOUN, PRON, ADJ, ADV, ADP, DET, CONJ, NUM, PRT, "." (punctuation), X
See Petrov, Das and McDonald (2011)

4 (Nearly) Universal Part-of-Speech Tags Example Penn Treebank tag maps: Example Spanish Treebank tag maps:
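
A tag map of this kind can be sketched as a lookup table from fine-grained treebank tags to the 12 universal tags. The entries below are only a small illustrative subset of Penn Treebank tags, and the helper name `to_universal` and its fallback to X for unknown tags are assumptions made for this example.

```python
# Illustrative subset of a Penn Treebank -> universal tag map (not the full map).
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VBZ": "VERB", "VBD": "VERB",
    "DT": "DET", "JJ": "ADJ", "RB": "ADV",
    "IN": "ADP", "CC": "CONJ", "CD": "NUM",
    "PRP": "PRON", "RP": "PRT", "FW": "X", ".": ".",
}

def to_universal(fine_tags, tag_map=PTB_TO_UNIVERSAL):
    """Map fine-grained tags to universal tags; unknown tags fall back to X (an assumption)."""
    return [tag_map.get(tag, "X") for tag in fine_tags]

# "Portland has a thriving music scene ." with plausible Penn Treebank tags.
universal = to_universal(["NNP", "VBZ", "DT", "JJ", "NN", "NN", "."])
print(universal)
```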

5 (Nearly) Universal Part-of-Speech Tags
Supervised training data available for ~20 languages.
Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
Portland hat eine prächtig gedeihende Musikszene. NOUN VERB DET ADJ NOUN .

6 Supervised Universal POS Tagging
Generalizes well for the supervised setting: average accuracy is 96.2%
TnT (Brants, 2000)

7 Resource-Poor Languages
Several major languages with no or little annotated data, e.g.:
Language          Native speakers
Oriya             32 million
Indonesian-Malay  37 million
Azerbaijani       20 million
Haitian           7.7 million
Punjabi           109 million
Vietnamese        69 million
Polish            40 million
See http://www.ethnologue.org/ethno_docs/distribution.asp?by=size
However, lots of parallel and unannotated data!
Basic NLP tools like POS tagging essential for development of language technologies

8 State of the Art in Unsupervised POS Tagging

9 Unsupervised Part-of-Speech Tagging
Portland hat eine prächtig gedeihende Musikszene. (all tags unknown: ? ? ? ? ? ? ?)
Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm
The words form the observation sequence; the tags form the state sequence.

10 Unsupervised Part-of-Speech Tagging
Portland hat eine prächtig gedeihende Musikszene. (all tags unknown: ? ? ? ? ? ? ?)
Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm
Each state in the state sequence is one of the 12 coarse tags.

11 Unsupervised Part-of-Speech Tagging
Portland hat (tags unknown: ? ?)
Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm
Transition multinomials link consecutive states.

12 Unsupervised Part-of-Speech Tagging
Portland hat (tags unknown: ? ?)
Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm
Emission multinomials link each state to its observed word.
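
As a rough illustration of how such an HMM scores a tagged sentence, the joint probability is a product of transition and emission multinomial entries. The sketch below uses invented, partial probability tables; the `<s>` start symbol and the name `joint_log_prob` are assumptions for the example, and the EM re-estimation step itself is omitted.

```python
import math

# Toy, partial multinomial tables; all values are invented for illustration.
transition = {("<s>", "NOUN"): 0.6, ("<s>", "VERB"): 0.4,
              ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.3,
              ("VERB", "NOUN"): 0.8, ("VERB", "VERB"): 0.2}
emission = {("NOUN", "Portland"): 0.5, ("NOUN", "hat"): 0.1,
            ("VERB", "Portland"): 0.05, ("VERB", "hat"): 0.6}

def joint_log_prob(words, tags):
    """log P(words, tags) = sum_i [log P(tag_i | tag_{i-1}) + log P(word_i | tag_i)]."""
    log_p, prev = 0.0, "<s>"
    for word, tag in zip(words, tags):
        log_p += math.log(transition[(prev, tag)]) + math.log(emission[(tag, word)])
        prev = tag
    return log_p

p_noun_verb = math.exp(joint_log_prob(["Portland", "hat"], ["NOUN", "VERB"]))
p_verb_noun = math.exp(joint_log_prob(["Portland", "hat"], ["VERB", "NOUN"]))
print(p_noun_verb, p_verb_noun)
```

EM would alternate between computing expected tag counts under tables like these and renormalizing those counts into new tables.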

13 Unsupervised Part-of-Speech Tagging
Portland hat eine prächtig gedeihende Musikszene. (all tags unknown: ? ? ? ? ? ? ?)
Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm
                         Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                     68.7   57.0    75.9   65.8     63.7        62.9     71.5     68.4     66.7
Poor average result

14 Unsupervised Part-of-Speech Tagging
Hidden Markov Model (HMM) with locally normalized log-linear models
Portland hat (tags unknown: ? ?)
Emission multinomials; the words form the observation sequence and the tags the state sequence.
Berg-Kirkpatrick et al. (2010)

15 Unsupervised Part-of-Speech Tagging
Hidden Markov Model (HMM) with locally normalized log-linear models
Portland hat (tags unknown: ? ?)
Emission multinomials with features: suffix, hyphen, capital letters, numbers, ...
Berg-Kirkpatrick et al. (2010)

16 Unsupervised Part-of-Speech Tagging
Hidden Markov Model (HMM) with locally normalized log-linear models
Portland hat (tags unknown: ? ?)
Emission multinomials with features: suffix, hyphen, capital letters, numbers, ...
Estimated using gradient-based methods
Berg-Kirkpatrick et al. (2010)
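
A locally normalized log-linear emission model can be sketched as a softmax over the vocabulary, where each word's score is a weighted sum of its features. The feature templates below loosely follow the slide's list (suffix, hyphen, capital letters, numbers); the feature strings, the weights, and the function names are assumptions for illustration, not the feature set of Berg-Kirkpatrick et al. (2010).

```python
import math

def features(word, tag):
    """Hypothetical binary features pairing a tag with simple word properties."""
    feats = [f"word={word}|{tag}", f"suffix={word[-2:]}|{tag}"]
    if "-" in word:
        feats.append(f"hyphen|{tag}")
    if word[0].isupper():
        feats.append(f"capital|{tag}")
    if any(ch.isdigit() for ch in word):
        feats.append(f"digit|{tag}")
    return feats

def emission_probs(tag, vocab, weights):
    """P(word | tag) = exp(w . f(word, tag)) / sum over the vocabulary of the same quantity."""
    scores = {w: sum(weights.get(f, 0.0) for f in features(w, tag)) for w in vocab}
    z = sum(math.exp(s) for s in scores.values())
    return {w: math.exp(s) / z for w, s in scores.items()}

vocab = ["Portland", "hat", "eine", "2011"]
weights = {"capital|NOUN": 1.5, "digit|NUM": 2.0}  # invented weights
probs = emission_probs("NOUN", vocab, weights)
print(probs)
```

Because the weights are shared across words, a feature such as capitalization can raise the NOUN emission probability of every capitalized word at once, which a plain multinomial cannot do.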

17 Unsupervised Part-of-Speech Tagging
Hidden Markov Model (HMM) with locally normalized log-linear models
Estimated using gradient-based methods
                         Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                     68.7   57.0    75.9   65.8     63.7        62.9     71.5     68.4     66.7
Feature-HMM                69.1   65.1    81.3   71.8     68.1        78.4     80.2     70.1     73.0
Improvements across all languages
Berg-Kirkpatrick et al. (2010)

18 Unsupervised POS Tagging with Dictionaries

19 Unsupervised POS Tagging with Dictionaries
Portland hat eine prächtig gedeihende Musikszene.
Candidate tags shown: NOUN VERB PRON DET ADJ NUM ADJ ADV ADJ NOUN .
Hidden Markov Model (HMM) with locally normalized log-linear models
State space constrained by possible gold tags

20 Unsupervised POS Tagging with Dictionaries
Portland hat eine prächtig gedeihende Musikszene.
Candidate tags shown: NOUN VERB PRON DET ADJ NUM ADJ ADV ADJ NOUN .
Hidden Markov Model (HMM) with locally normalized log-linear models
State space constrained by possible gold tags
                         Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                     68.7   57.0    75.9   65.8     63.7        62.9     71.5     68.4     66.7
Feature-HMM                69.1   65.1    81.3   71.8     68.1        78.4     80.2     70.1     73.0
w/ gold dictionary         93.1   94.7    93.5   96.6     96.4        94.0     95.8     85.5     93.7
Average result close to supervised accuracy!
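
Constraining the state space with a dictionary can be pictured as building, for each token, the list of tags the HMM is allowed to consider. The following is a minimal sketch with a toy dictionary; the entries, the name `allowed_states`, and the fallback to all 12 tags for words missing from the dictionary are assumptions for the example.

```python
ALL_TAGS = ["NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
            "ADP", "NUM", "CONJ", "PRT", ".", "X"]

# Toy tag dictionary; entries are illustrative, not taken from a real lexicon.
tag_dict = {"Portland": {"NOUN"}, "hat": {"VERB"}, "eine": {"DET", "PRON", "NUM"}}

def allowed_states(words, tag_dict, all_tags=ALL_TAGS):
    """Per-token candidate tags: the dictionary entry if present, otherwise every tag."""
    return [sorted(tag_dict.get(word, all_tags)) for word in words]

lattice = allowed_states(["Portland", "hat", "eine", "Musikszene"], tag_dict)
for word, tags in zip(["Portland", "hat", "eine", "Musikszene"], lattice):
    print(word, tags)
```

Inference and parameter estimation then only sum over tag sequences that pass through these per-token candidate lists.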

21 For most languages, access to high-quality tag dictionaries is not realistic.
Morphologically rich languages only have base forms in dictionaries.
Ideas:
1) Use supervision in resource-rich languages
2) Use translated data
3) Construct projected tag lexicons

22 Bilingual Projection
Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
Automatic labels from supervised tagger, 97% accuracy

23 Bilingual Projection
Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
Portland hat eine prächtig gedeihende Musikszene.
Automatic unsupervised alignments from translation data (available for more than 50 languages)

24 Bilingual Projection
Idea 1: direct projection
Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
Portland hat eine prächtig gedeihende Musikszene.
An unaligned word receives NOUN (most frequent tag).
Yarowsky and Ngai (2001)

25 Bilingual Projection
Idea 1: direct projection
Portland/NOUN hat/VERB eine/DET prächtig/NOUN gedeihende/ADJ Musikszene/NOUN ./.
This sentence plus more projected tagged sentences serve as supervised training data for a tagger (Brants, 2000).
Yarowsky and Ngai (2001)
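
Direct projection can be sketched in a few lines: each source tag is copied across its alignment link, and unaligned target words receive the default tag NOUN. The alignment below is a hypothetical one chosen so that "prächtig" stays unaligned, and `project_tags` is a made-up helper name.

```python
def project_tags(src_tags, alignment, tgt_len, default="NOUN"):
    """Copy each source tag to its aligned target position; unaligned positions get `default`."""
    tgt_tags = [default] * tgt_len
    for src_i, tgt_j in alignment:
        tgt_tags[tgt_j] = src_tags[src_i]
    return tgt_tags

src_tags = ["NOUN", "VERB", "DET", "ADJ", "NOUN", "NOUN", "."]  # Portland has a thriving music scene .
tgt = ["Portland", "hat", "eine", "prächtig", "gedeihende", "Musikszene", "."]
# Hypothetical alignment as (source index, target index) pairs; "prächtig" is left unaligned.
alignment = [(0, 0), (1, 1), (2, 2), (3, 4), (5, 5), (6, 6)]

projected = project_tags(src_tags, alignment, len(tgt))
print(list(zip(tgt, projected)))
```

With this alignment the projected sequence is NOUN VERB DET NOUN ADJ NOUN ., matching the tagged German sentence on slide 25.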

26 Bilingual Projection
Idea 1: direct projection
Yarowsky and Ngai (2001)
                         Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                     68.7   57.0    75.9   65.8     63.7        62.9     71.5     68.4     66.7
Feature-HMM                69.1   65.1    81.3   71.8     68.1        78.4     80.2     70.1     73.0
Direct projection          73.6   77.0    83.2   79.3     79.7        82.6     80.1     74.7     78.8

27 Bilingual Projection
Idea 1: direct projection
Same table as slide 26: consistent improvements over unsupervised models
Yarowsky and Ngai (2001)

28 Bilingual Projection
Idea 2: lexicon projection

29 Bilingual Projection
Idea 2: lexicon projection
Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
Portland hat eine prächtig gedeihende Musikszene.

30 Bilingual Projection
Idea 2: lexicon projection
Aligned pairs: Portland/NOUN - Portland; has/VERB - hat; a/DET - eine; thriving/ADJ - gedeihende; scene/NOUN - Musikszene; music/NOUN ...
Ignore the unaligned word (prächtig).

31 Bilingual Projection
Idea 2: lexicon projection
Bag of alignments: Portland/NOUN - Portland; has/VERB - hat; a/DET - eine; thriving/ADJ - gedeihende; scene/NOUN - Musikszene; music/NOUN ...

32 Bilingual Projection
Idea 2: lexicon projection
Bag of alignments as on slide 31, with a/DET linked to eine.

33 Bilingual Projection
Idea 2: lexicon projection
Bag of alignments as on slide 31; eine is now also linked to one/NUM and one/PRON.

34 Bilingual Projection
Idea 2: lexicon projection
Bag of alignments as on slide 33; gedeihende is now also linked to thriving/VERB.

35 Bilingual Projection
Idea 2: lexicon projection
After scanning all the parallel data, each target word (Portland, gedeihende, hat, eine, Musikszene) has a distribution giving the probability of a tag given the word.
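
The "bag of alignments" step amounts to counting, for each target word, the tags of the source words it was aligned to, and then normalizing the counts. A minimal sketch with invented counts follows; `projected_lexicon` is a hypothetical helper name, and relative-frequency estimation is an assumption about how the distribution is formed.

```python
from collections import Counter, defaultdict

def projected_lexicon(aligned_pairs):
    """Estimate P(tag | target word) by relative frequency over (target word, source tag) pairs."""
    counts = defaultdict(Counter)
    for tgt_word, src_tag in aligned_pairs:
        counts[tgt_word][src_tag] += 1
    return {word: {tag: c / sum(tag_counts.values()) for tag, c in tag_counts.items()}
            for word, tag_counts in counts.items()}

# Toy bag of alignments; the counts are invented for illustration.
pairs = [("eine", "DET"), ("eine", "DET"), ("eine", "NUM"), ("eine", "PRON"),
         ("gedeihende", "ADJ"), ("gedeihende", "VERB"), ("hat", "VERB")]
lexicon = projected_lexicon(pairs)
print(lexicon["eine"])
```

Unaligned words such as "prächtig" never appear in the bag, so they receive no entry at all.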

36 Bilingual Projection
Idea 2: lexicon projection
Feature HMM constrained with projected dictionary
                         Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                     68.7   57.0    75.9   65.8     63.7        62.9     71.5     68.4     66.7
Feature-HMM                69.1   65.1    81.3   71.8     68.1        78.4     80.2     70.1     73.0
Direct projection          73.6   77.0    83.2   79.3     79.7        82.6     80.1     74.7     78.8
Projected Dictionary       79.0   78.8    82.4   76.3     84.8        87.0     82.8     79.4     81.3
Improvements over simple projection for majority of the languages

37 Can coverage be improved?
No information about unaligned words
Idea: Projected lexicon expansion and refinement using a lot of unlabeled data

38 Brief Overview: Graph-Based Learning with Labeled and Unlabeled Data

39 Graph with labeled datapoints and unlabeled datapoints (example edge weights: 0.9, 0.01, 0.8, 0.9, 0.1)
Symmetric weight matrix; supervised label distributions on the labeled datapoints; label distributions on the unlabeled datapoints to be found
Zhu, Ghahramani and Lafferty, 2003

40 Label Propagation
Same example graph (edge weights: 0.9, 0.01, 0.8, 0.9, 0.1)
Zhu, Ghahramani and Lafferty, 2003

41 Label Propagation
The objective is defined over the set of distributions over unlabeled vertices.
Zhu, Ghahramani and Lafferty, 2003

42 Label Propagation
The objective sums over the unlabeled vertices.
Zhu, Ghahramani and Lafferty, 2003

43 Label Propagation
One term brings the distributions of similar vertices closer.
Zhu, Ghahramani and Lafferty, 2003

44 Label Propagation
Another term brings the distributions of uncertain neighborhoods close to the uniform distribution, which depends on the size of the label set.
Zhu, Ghahramani and Lafferty, 2003

45 Label Propagation
Iterative updates for optimization
Zhu, Ghahramani and Lafferty, 2003
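
The iterative updates can be sketched as repeated weighted averaging of neighbor distributions. The formulas from the slides were not preserved in this transcript, so the sketch below assumes a squared-difference objective with a uniform-distribution regularizer; the regularization strength `nu`, the fixed number of iterations, and the function name `propagate` are assumptions for illustration.

```python
def propagate(weights, labeled, num_labels, nu=0.1, iterations=10):
    """Average neighbor distributions on unlabeled vertices, pulling toward uniform with strength nu."""
    uniform = [1.0 / num_labels] * num_labels
    q = {v: list(labeled.get(v, uniform)) for v in weights}
    for _ in range(iterations):
        new_q = {}
        for v in weights:
            if v in labeled:  # labeled vertices keep their supervised distribution
                new_q[v] = q[v]
                continue
            kappa = nu + sum(weights[v].values())
            new_q[v] = [
                (sum(w * q[u][y] for u, w in weights[v].items()) + nu * uniform[y]) / kappa
                for y in range(num_labels)
            ]
        q = new_q
    return q

# Symmetric toy graph: unlabeled vertex "c" is much closer to "a" (0.9) than to "b" (0.1).
weights = {"a": {"c": 0.9}, "b": {"c": 0.1}, "c": {"a": 0.9, "b": 0.1}}
labeled = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
q = propagate(weights, labeled, num_labels=2)
print(q["c"])
```

A vertex with only weak edges ends up near the uniform distribution, since the `nu` term then dominates the average.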

46 Idea 3: Graph-Based Projections
How can label propagation help?
Subramanya, Petrov and Pereira (2010)

47 Example Graph in German
Vertices are word trigrams: ist gut bei; ist lebhafter bei; ist fein bei; ist wichtig bei; gutem Essen zugetan; fuers Essen drauf; 1000 Essen pro; schlechtes Essen und; zum Essen niederlassen; zu realisieren ,; zu essen ,; zu stecken ,; zu erreichen ,

48 Example Graph in German
Same trigram vertices as on slide 47, with the labels NOUN and VERB marked.

49 Idea 3: Graph-Based Projections
How can label propagation help?
3) Plug in auto-tagged words from a source language
4) Links between source and target language units are word alignments

50 Bilingual Graph
The German trigram vertices of slide 47, plus tagged English vertices: eat, food, eat, eating (NOUN, VERB); good (ADJ); fine (ADJ); important (ADJ); nicely (ADV)

51 Idea 3: Graph-Based Projections
How can label propagation help? For a target language:
3) Plug in auto-tagged words from a source language
4) Links between source and target language units are word alignments

52 First Stage of Label Propagation
Bilingual graph as on slide 50.

53 First Stage of Label Propagation
Bilingual graph as on slide 50.

54 Idea 3: Graph-Based Projections
How can label propagation help? For a target language:
3) Plug in auto-tagged words from a source language
4) Links between source and target language units are word alignments

55-59 Second Stage of Label Propagation
Bilingual graph as on slide 50, shown over five successive steps.
Continues till convergence...

60 Idea 3: Graph-Based Projections
End result?
Words shown: Portland, gedeihende, hat, eine, Musikszene, plus fein, lebhafter, realisieren.

61 Idea 3: Graph-Based Projections
Words shown: Portland, gedeihende, hat, eine, Musikszene, plus fein, lebhafter, realisieren.

62 Idea 3: Graph-Based Projections
Feature HMM constrained with graph-based dictionary
                         Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                     68.7   57.0    75.9   65.8     63.7        62.9     71.5     68.4     66.7
Feature-HMM                69.1   65.1    81.3   71.8     68.1        78.4     80.2     70.1     73.0
Direct projection          73.6   77.0    83.2   79.3     79.7        82.6     80.1     74.7     78.8
Projected Dictionary       79.0   78.8    82.4   76.3     84.8        87.0     82.8     79.4     81.3
Graph-Based Projections    83.2   79.5    82.8   82.5     86.8        87.9     84.2     80.5     83.4

63 Idea 3: Graph-Based Projections
Feature HMM constrained with graph-based dictionary
Same five rows as on slide 62, plus two reference rows:
                         Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
w/ gold dictionary         93.1   94.7    93.5   96.6     96.4        94.0     95.8     85.5     93.7
Supervised                 96.9   94.9    98.2   97.8     95.8        97.2     96.8     94.8     96.6

64 Idea 3: Graph-Based Projections
Lexicon expansion (thousands of words)

65 Concluding Notes
Soft expansion of lexicon using parallel data and supervision in a resource-rich language
– Graph-based learning helps in almost all cases
Reasonably accurate POS taggers without direct supervision
Traditional evaluation of unsupervised POS taggers done using greedy metrics that use labeled data
– Our presented models avoid these evaluation methods
Practically no hyperparameter tuning
– except a threshold parameter for dictionary construction
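
The threshold parameter for dictionary construction can be pictured as a cutoff on each word's tag distribution: only tags above the cutoff enter the dictionary. The sketch below is an assumption about how such a cutoff might be applied; the value 0.2, the toy distributions, and the name `build_dictionary` are all hypothetical.

```python
def build_dictionary(tag_dists, threshold=0.2):
    """Keep, for each word, the tags whose probability exceeds the threshold."""
    return {word: {tag for tag, p in dist.items() if p > threshold}
            for word, dist in tag_dists.items()}

# Invented tag distributions of the kind label propagation might produce.
dists = {"eine": {"DET": 0.7, "NUM": 0.25, "PRON": 0.05},
         "fein": {"ADJ": 0.9, "ADV": 0.1}}
dictionary = build_dictionary(dists)
print(dictionary)
```

A higher threshold gives a sparser, more precise dictionary; a lower one keeps more candidate tags per word.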

66 Future Directions
Scaling up the number of nodes in the graph from 2M to billions may help create larger lexicons
Including penalties in the graph objective that induce sparse tag distributions at each graph vertex
Inclusion of multiple languages in the graph may further improve results
– Label propagation in one huge multilingual graph

67 Questions?
Four parallel sentences, each shown with its universal POS tags:
Portland has a thriving music scene.
Portland hat eine prächtig gedeihende Musikszene.
Portland tiene una escena musical vibrante.
Portland a une scène musicale florissante.

