Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
  Slav Petrov (Google Research) and Dipanjan Das (Carnegie Mellon University)
  ACL 2011, June 21
Part-of-Speech Tagging
  Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
(Nearly) Universal Part-of-Speech Tags
  VERB, NOUN, PRON, ADJ, ADV, ADP, DET, CONJ, NUM, PRT, ".", X
  See Petrov, Das and McDonald (2011)
(Nearly) Universal Part-of-Speech Tags
  Example Penn Treebank tag maps:
  Example Spanish Treebank tag maps:
(Nearly) Universal Part-of-Speech Tags
  Supervised training data available for ~20 languages.
  Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
  Portland hat eine prächtig gedeihende Musikszene.
  [Figure: the German sentence annotated with the same universal tags]
Supervised Universal POS Tagging
  Generalizes well for the supervised setting: average accuracy is 96.2%
  Tagger: TnT (Brants, 2000)
Resource-Poor Languages
  Several major languages have little or no annotated data, e.g.:
    Language          Native speakers
    Oriya             32 million
    Indonesian-Malay  37 million
    Azerbaijani       20 million
    Haitian           7.7 million
    Punjabi           109 million
    Vietnamese        69 million
    Polish            40 million
  However, lots of parallel and unannotated data are available!
  Basic NLP tools like POS tagging are essential for the development of language technologies.
State of the Art in Unsupervised POS Tagging
Unsupervised Part-of-Speech Tagging
  Portland hat eine prächtig gedeihende Musikszene.
  The tag of every token is unknown: ? ? ? ? ? ? ?
  Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm
  Observation sequence: the words of the sentence
  State sequence: the tags, each state being one of the 12 coarse tags
  Parameters: transition multinomials and emission multinomials
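To make the roles of the transition and emission multinomials concrete, here is a minimal sketch of the joint probability a first-order HMM assigns to a word sequence and a tag sequence. The function name `hmm_log_joint`, the separate `start` distribution, and all probability values are illustrative assumptions, not parameters from the talk.

```python
import math


def hmm_log_joint(words, tags, start, trans, emit):
    """Log of p(words, tags) under a first-order HMM.

    start[t]     : probability that the first tag is t (assumed parameterization)
    trans[s][t]  : transition multinomial, p(next tag = t | current tag = s)
    emit[t][w]   : emission multinomial, p(word = w | tag = t)
    """
    logp = math.log(start[tags[0]]) + math.log(emit[tags[0]][words[0]])
    for i in range(1, len(words)):
        logp += math.log(trans[tags[i - 1]][tags[i]])
        logp += math.log(emit[tags[i]][words[i]])
    return logp


# Toy parameters over two tags and two words (made-up numbers).
start = {"NOUN": 0.6, "VERB": 0.4}
trans = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.8, "VERB": 0.2},
}
emit = {
    "NOUN": {"Portland": 0.9, "hat": 0.1},
    "VERB": {"Portland": 0.2, "hat": 0.8},
}

score = hmm_log_joint(["Portland", "hat"], ["NOUN", "VERB"], start, trans, emit)
```

EM treats the tags as hidden, so in training this quantity is summed over all tag sequences rather than evaluated for a single one.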
Unsupervised Part-of-Speech Tagging
  Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm
  [Table: EM-HMM accuracy for Danish, Dutch, German, Greek, Italian, Portuguese, Spanish, Swedish, and the average]
  Poor average result
Unsupervised Part-of-Speech Tagging
  Hidden Markov Model (HMM) with locally normalized log-linear models (Berg-Kirkpatrick et al., 2010)
  Observation sequence: the words of the sentence; state sequence: the tags
  The emission multinomials are parameterized with word features: suffix, hyphen, capital letters, numbers, ...
  Estimated using gradient-based methods
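A locally normalized log-linear emission model can be sketched as a softmax over the vocabulary, computed separately for each tag. The feature templates below mirror the ones named on the slide (suffix, hyphen, capital letters, numbers); the function names, the two-character suffix length, and the weights are assumptions made only for illustration.

```python
import math


def features(word, tag):
    """Binary features of a (word, tag) pair, conjoined with the tag."""
    feats = [f"word={word}|{tag}", f"suffix={word[-2:]}|{tag}"]
    if "-" in word:
        feats.append(f"hyphen|{tag}")
    if word[:1].isupper():
        feats.append(f"capital|{tag}")
    if any(ch.isdigit() for ch in word):
        feats.append(f"digit|{tag}")
    return feats


def emission_probs(tag, vocab, weights):
    """p(word | tag) as a softmax of feature scores over the vocabulary."""
    scores = {
        w: sum(weights.get(f, 0.0) for f in features(w, tag)) for w in vocab
    }
    top = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {w: math.exp(s - top) / z for w, s in scores.items()}


vocab = ["Portland", "hat", "eine", "Musikszene", "2011"]
weights = {"capital|NOUN": 1.5, "digit|NUM": 2.0}  # hypothetical weights
noun_emissions = emission_probs("NOUN", vocab, weights)
```

Because the weights are shared across words, a capitalized word that was rarely seen still receives a sensible emission probability, which plain multinomials cannot provide.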
Unsupervised Part-of-Speech Tagging
  Hidden Markov Model (HMM) with locally normalized log-linear models (Berg-Kirkpatrick et al., 2010)
  Estimated using gradient-based methods
  [Table: accuracy of EM-HMM and Feature-HMM for Danish, Dutch, German, Greek, Italian, Portuguese, Spanish, Swedish, and the average]
  Improvements across all languages
Unsupervised POS Tagging with Dictionaries
Unsupervised POS Tagging with Dictionaries
  Portland hat eine prächtig gedeihende Musikszene.
  Candidate tags shown over the words: NOUN VERB PRON DET ADJ NUM ADJ ADV ADJ NOUN .
  Hidden Markov Model (HMM) with locally normalized log-linear models
  State space constrained by possible gold tags
  [Table: accuracy of EM-HMM, Feature-HMM, and Feature-HMM w/ gold dictionary for Danish, Dutch, German, Greek, Italian, Portuguese, Spanish, Swedish, and the average]
  Average result close to supervised accuracy!
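The effect of a tag dictionary on the HMM state space can be sketched as follows: a word listed in the dictionary may only take its listed tags, and any other word may take all 12. The helper names `allowed_tags` and `lattice_size` and the dictionary entries are hypothetical; they only illustrate how much the search space shrinks.

```python
from math import prod

# The 12 coarse tags of the universal tag set.
TAGS = ["VERB", "NOUN", "PRON", "ADJ", "ADV", "ADP",
        "DET", "CONJ", "NUM", "PRT", ".", "X"]


def allowed_tags(word, tag_dict):
    """Tags the HMM may assign to a word: its dictionary entry, else all tags."""
    return sorted(tag_dict.get(word, TAGS))


def lattice_size(sentence, tag_dict):
    """Number of tag sequences that remain possible for the sentence."""
    return prod(len(allowed_tags(w, tag_dict)) for w in sentence)


# Hypothetical dictionary entries for three German words.
tag_dict = {
    "hat": {"VERB"},
    "eine": {"DET", "NUM", "PRON"},
    "Musikszene": {"NOUN"},
}
sentence = ["hat", "eine", "Musikszene"]

constrained = lattice_size(sentence, tag_dict)   # 1 * 3 * 1
unconstrained = lattice_size(sentence, {})       # 12 ** 3
```

In this toy case the dictionary leaves 3 candidate tag sequences out of 1728, which is why a good dictionary is such strong supervision.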
For most languages, access to high-quality tag dictionaries is not realistic.
  Morphologically rich languages only have base forms in dictionaries.
  Ideas:
  1) Use supervision in resource-rich languages
  2) Use translated data
  3) Construct projected tag lexicons
Bilingual Projection
  Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
  Automatic labels from a supervised tagger (97% accuracy)
Bilingual Projection
  Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
  Portland hat eine prächtig gedeihende Musikszene.
  Automatic unsupervised alignments from translation data (available for more than 50 languages)
Bilingual Projection
  Idea 1: direct projection (Yarowsky and Ngai, 2001)
  Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
  Portland hat eine prächtig gedeihende Musikszene.
  Each aligned German word receives the tag of its English counterpart.
  An unaligned word receives NOUN (the most frequent tag).
Bilingual Projection
  Idea 1: direct projection (Yarowsky and Ngai, 2001)
  Portland/NOUN hat/VERB eine/DET prächtig/NOUN gedeihende/ADJ Musikszene/NOUN ./.
  This sentence, plus more projected tagged sentences, serves as training data for supervised training of a tagger (Brants, 2000).
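Direct projection can be sketched in a few lines: copy each source tag across its alignment link, and give unaligned target words the most frequent tag. The function name `project_tags` and the hand-written alignment indices are assumptions for illustration; in practice the links come from an automatic aligner.

```python
def project_tags(source_tags, target_len, alignment, default_tag="NOUN"):
    """Project tags from source to target tokens across alignment links.

    alignment is a list of (source_index, target_index) pairs. Target words
    without any link keep default_tag. If several source words link to the
    same target word, the last link wins (a simplification in this sketch).
    """
    target_tags = [default_tag] * target_len
    for src_idx, tgt_idx in alignment:
        target_tags[tgt_idx] = source_tags[src_idx]
    return target_tags


english = ["Portland", "has", "a", "thriving", "music", "scene", "."]
english_tags = ["NOUN", "VERB", "DET", "ADJ", "NOUN", "NOUN", "."]
german = ["Portland", "hat", "eine", "prächtig", "gedeihende", "Musikszene", "."]

# Hand-written links; "prächtig" (index 3) is left unaligned.
alignment = [(0, 0), (1, 1), (2, 2), (3, 4), (4, 5), (5, 5), (6, 6)]

projected = project_tags(english_tags, len(german), alignment)
```

The resulting tags reproduce the projected sequence on the slide, including the default NOUN on the unaligned word.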
Bilingual Projection
  Idea 1: direct projection (Yarowsky and Ngai, 2001)
  [Table: accuracy of EM-HMM, Feature-HMM, and Direct projection for Danish, Dutch, German, Greek, Italian, Portuguese, Spanish, Swedish, and the average]
  Consistent improvements over unsupervised models
Bilingual Projection
  Idea 2: lexicon projection
Bilingual Projection
  Idea 2: lexicon projection
  Tagged English: Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
  German: Portland hat eine prächtig gedeihende Musikszene.
  Each aligned German word collects the tagged English words aligned to it; an unaligned word (here prächtig) is ignored.
  Bag of alignments, growing as more parallel sentences are scanned:
    Portland: Portland/NOUN
    hat: has/VERB
    eine: a/DET, one/NUM, one/PRON
    gedeihende: thriving/ADJ, thriving/VERB
    Musikszene: scene/NOUN, music/NOUN
    ...
Bilingual Projection
  Idea 2: lexicon projection
  After scanning all the parallel data, every aligned word (Portland, hat, eine, gedeihende, Musikszene, .) has a distribution over tags: the probability of a tag given a word.
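The probability of a tag given a word can be estimated by relative frequency over the bag of alignments, and then thresholded into a tag dictionary. This is a sketch under assumptions: the names `project_lexicon` and `to_dictionary`, the counts, and the threshold value 0.3 are invented for illustration and are not the talk's actual settings.

```python
from collections import Counter, defaultdict


def project_lexicon(aligned_pairs):
    """Estimate p(tag | target word) from (target_word, source_tag) pairs."""
    counts = defaultdict(Counter)
    for target_word, source_tag in aligned_pairs:
        counts[target_word][source_tag] += 1
    return {
        word: {tag: n / sum(tag_counts.values()) for tag, n in tag_counts.items()}
        for word, tag_counts in counts.items()
    }


def to_dictionary(lexicon, threshold):
    """Keep, for each word, the tags whose probability reaches the threshold."""
    return {
        word: {tag for tag, p in dist.items() if p >= threshold}
        for word, dist in lexicon.items()
    }


# Toy bag of alignments (made-up counts).
pairs = [
    ("eine", "DET"), ("eine", "DET"), ("eine", "NUM"), ("eine", "PRON"),
    ("gedeihende", "ADJ"), ("gedeihende", "VERB"),
    ("hat", "VERB"),
]
lexicon = project_lexicon(pairs)
dictionary = to_dictionary(lexicon, threshold=0.3)
```

A higher threshold yields a sparser, more precise dictionary; a lower one keeps rare but possibly valid tags.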
Bilingual Projection
  Idea 2: lexicon projection
  Feature-HMM constrained with the projected dictionary
  [Table: accuracy of EM-HMM, Feature-HMM, Direct projection, and Projected Dictionary for Danish, Dutch, German, Greek, Italian, Portuguese, Spanish, Swedish, and the average]
  Improvements over simple projection for the majority of the languages
Can coverage be improved?
  The projected lexicon has no information about unaligned words.
  Idea: expand and refine the projected lexicon using a lot of unlabeled data.
Brief Overview: Graph-Based Learning with Labeled and Unlabeled Data
Label Propagation (Zhu, Ghahramani and Lafferty)
  Setup: labeled and unlabeled datapoints are the vertices of a graph with a symmetric weight matrix. The labeled vertices carry supervised label distributions; the distributions of the unlabeled vertices are to be found.
  The objective is minimized over the set of distributions over the unlabeled vertices. It has two terms:
  – One brings the distributions of similar vertices closer.
  – One brings the distributions of uncertain neighborhoods close to the uniform distribution, which depends on the size of the label set.
  The objective is optimized with iterative updates.
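The iterative updates can be sketched as follows. The exact equations on the slides did not survive, so this is one common form for such a quadratic objective, not necessarily the talk's: each unlabeled vertex takes the weighted average of its neighbors' distributions, smoothed toward the uniform distribution. The function name `propagate_labels`, the smoothing weight `nu`, and the toy graph are assumptions.

```python
def propagate_labels(weights, seeds, num_labels, nu=0.1, iterations=50):
    """Iterative label propagation on a weighted graph.

    weights[v][u] : symmetric edge weight between vertices v and u
    seeds[v]      : fixed label distribution of a labeled vertex v
    nu            : strength of the pull toward the uniform distribution
    """
    uniform = 1.0 / num_labels
    q = {v: list(seeds.get(v, [uniform] * num_labels)) for v in weights}
    for _ in range(iterations):
        updated = {}
        for v, neighbors in weights.items():
            if v in seeds:  # labeled vertices keep their distribution
                updated[v] = list(seeds[v])
                continue
            kappa = nu + sum(neighbors.values())  # normalizer
            updated[v] = [
                (sum(w * q[u][y] for u, w in neighbors.items()) + nu * uniform)
                / kappa
                for y in range(num_labels)
            ]
        q = updated
    return q


# Toy chain a - b - c, where only a is labeled (with label 0).
weights = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0},
}
seeds = {"a": [1.0, 0.0]}
q = propagate_labels(weights, seeds, num_labels=2)
```

In the toy chain, the label of `a` reaches `b` and then `c`, with confidence decaying along the way because of the pull toward uniform.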
How can label propagation help? (Subramanya, Petrov and Pereira, 2010)
  Idea 3: Graph-Based Projections
Example Graph in German
  Vertices are word trigrams:
    ist gut bei; ist lebhafter bei; ist fein bei; ist wichtig bei
    gutem Essen zugetan; fuers Essen drauf; 1000 Essen pro; schlechtes Essen und; zum Essen niederlassen
    zu realisieren ,; zu essen ,; zu stecken ,; zu erreichen ,
  Tag labels shown on the graph: NOUN, VERB
How can label propagation help?
  Idea 3: Graph-Based Projections
  3) Plug in auto-tagged words from a source language
  4) Links between source and target language units are word alignments
Bilingual Graph
  German trigram vertices:
    ist gut bei; ist lebhafter bei; ist wichtig bei; ist fein bei
    gutem Essen zugetan; fuers Essen drauf; 1000 Essen pro; schlechtes Essen und; zum Essen niederlassen
    zu realisieren ,; zu essen ,; zu stecken ,; zu erreichen ,
  Auto-tagged English vertices: good/ADJ, fine/ADJ, important/ADJ, nicely/ADV, and eat, food, eat, eating with the tags NOUN and VERB
How can label propagation help?
  Idea 3: Graph-Based Projections
  For a target language:
  3) Plug in auto-tagged words from a source language
  4) Links between source and target language units are word alignments
First Stage of Label Propagation
  [Figure: the bilingual graph of German trigram vertices and auto-tagged English words (good/ADJ, nicely/ADV, fine/ADJ, important/ADJ, and eat, food, eating with NOUN and VERB)]
Second Stage of Label Propagation
  [Figure: the bilingual graph of German trigram vertices and auto-tagged English words (good/ADJ, nicely/ADV, fine/ADJ, important/ADJ, and eat, food, eating with NOUN and VERB)]
  Continues until convergence...
Idea 3: Graph-Based Projections
  End result?
  Tag distributions for the aligned words (Portland, hat, eine, gedeihende, Musikszene, .) and also for words from the graph such as fein, lebhafter, and realisieren.
Idea 3: Graph-Based Projections
  Feature-HMM constrained with the graph-based dictionary
  [Table: accuracy of EM-HMM, Feature-HMM, Direct projection, Projected Dictionary, Graph-Based Projections, Feature-HMM w/ gold dictionary, and supervised for Danish, Dutch, German, Greek, Italian, Portuguese, Spanish, Swedish, and the average]
Idea 3: Graph-Based Projections
  Lexicon Expansion
  [Figure: lexicon size, in thousands of words]
Concluding Notes
  Soft expansion of the lexicon using parallel data and supervision in a resource-rich language
  – Graph-based learning helps in almost all cases
  Reasonably accurate POS taggers without direct supervision
  Traditional evaluation of unsupervised POS taggers is done using greedy metrics that use labeled data
  – The presented models avoid these evaluation methods
  Practically no hyperparameter tuning
  – Except for a threshold parameter for dictionary construction
Future Directions
  Scaling up the number of nodes in the graph from 2M to billions may help create larger lexicons
  Including penalties in the graph objective that induce sparse tag distributions at each graph vertex
  Including multiple languages in the graph may further improve results
  – Label propagation in one huge multilingual graph
Questions?
  Portland has a thriving music scene.
  Portland hat eine prächtig gedeihende Musikszene.
  Portland tiene una escena musical vibrante.
  Portland a une scène musicale florissante.
  [Figure: the four sentences annotated with universal POS tags]