

GI/ACM Regionalgruppe Rhein-Main: "Search Is Not Just Search!" (Suche ist nicht gleich Suche!), 23 June 2016, on the premises of Fraunhofer IGD in Darmstadt. Chris Biemann




1 Title slide: Adaptive Methods in Language Technology

2 Elements of Cognitive Computing: contextualized, iterative, adaptive, interactive

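3 Why Language Is Hard
- "Er saß auf der Bank und zählte seine Kohle." (He sat on the bench and counted his money; "Kohle", literally "coal", is slang for money.)
- "Sie ging zur Bank und hob Geld ab." (She went to the bank and withdrew money.)
Figure labels: lexical level, concept level, synonymous, polysemous. "Bank" is one word form with two meanings (bench, financial institution), while "Kohle" and "Geld" are two word forms for the same concept.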

4 Why Not Just Use Dictionaries or Ontologies?
"Give a man a fish and you feed him for a day…" (photo by zeh fernando under a Creative Commons licence)
Advantages:
- Sense inventory given
- Linking to concepts
- Full control
Disadvantages:
- Dictionaries have to be created
- Dictionaries are incomplete
- Language changes constantly: new words, new meanings …

5 Structure Discovery Paradigm
"… teach a man to fish and you feed him for a lifetime."
Consequences:
- Only raw text input required
- Corpus-driven
- Language/domain independent
Diagram: Text Data → SD Algorithms (find regularities) → annotate regularities in data → Machine Learning Task (use annotations as features)

6 CORPUS-ADAPTIVE SEMANTICS
Diagram (as on the Structure Discovery slide): Text Data, SD Algorithms (find regularities, annotate regularities in data), Machine Learning Task (use annotations as features)

7 'Holing' operation: producing pairs of words and features
Example sentence: "Er fing den Fisch im Netz." (He caught the fish in the net.)
Dependency relations: SB(fing, er), OA(Fisch, fing), NK(Fisch, den), MNR(Fisch, im), NK(im, Netz), --(Netz, .)
Each relation yields word-feature pairs by replacing one of its words with a hole; in this single sentence every pair has count 1.
C. Biemann, M. Riedl (2013): Text: Now in 2D! A Framework for Lexical Expansion with Contextual Similarity. Journal of Language Modelling 1(1): 55-95.
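The holing operation can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes the dependencies arrive as (relation, first argument, second argument) triples in the order written on the slide, marks the hole with "@" (an assumed notation), and leaves out the punctuation relation --(Netz, .).

```python
from collections import Counter

HOLE = "@"  # assumed hole marker

def holing(triples):
    """Turn (relation, arg1, arg2) triples into counted (word, feature) pairs.

    Each triple yields two pairs: each argument becomes the word, and the
    triple with that argument replaced by the hole becomes its feature.
    """
    pairs = Counter()
    for rel, arg1, arg2 in triples:
        pairs[(arg1, f"{rel}({HOLE},{arg2})")] += 1
        pairs[(arg2, f"{rel}({arg1},{HOLE})")] += 1
    return pairs

# Dependencies of "Er fing den Fisch im Netz." as listed on the slide
# (punctuation relation omitted)
triples = [
    ("SB", "fing", "er"),
    ("OA", "Fisch", "fing"),
    ("NK", "Fisch", "den"),
    ("MNR", "Fisch", "im"),
    ("NK", "im", "Netz"),
]

pairs = holing(triples)
for (word, feature), count in sorted(pairs.items()):
    print(word, feature, count)
```

Five relations give ten distinct pairs, each with count 1; "Fisch", for example, receives the three features OA(@,fing), NK(@,den) and MNR(@,im).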

8 Distributional Thesaurus (DT)
- Computed from distributional similarity statistics
- The entry for a target word consists of a ranked list of neighbors
Example entries (target with its own score, then neighbors with similarity scores):
- Netz#NN (1000): Netzwerk#NN 102, Stromnetz#NN 69, Infrastruktur#NN 61, Geflecht#NN 58, Mobilfunknetz#NN 55, Schienennetz#NN 52, Angebot#NN 46, Streckennetz#NN 45, System#NN 42, Internet#NN 42, Datennetz#NN 39, Vernetzung#NN 38, Festnetz#NN 37, Faden#NN 37, Telefonnetz#NN 37, ...
- äußern#VV (1000): sprechen#VV 368, warnen#VV 344, betonen#VV 330, erklären#VV 300, bekräftigen#VV 281, plädieren#VV 274, sagen#VV 272, kündigen#VV 266, mahnen#VV 265, kritisieren#VV 264, verweisen#VV 255, räumen#VV 241, reagieren#VV ...
Figure: a first-order graph connecting komplett and vollständig to context features such as Abschaffung#NN#-NK, Renovierung#NN#-NK, Verzicht#NN#-NK, Genesung#NN#-NK, Spielzeit#NN#-NK and Unsinn#NN#-NK; a second-order graph connecting komplett and vollständig directly, with the edge labelled 3.
Z. Harris (1954): Distributional Structure. Word 10(2/3).
G. A. Miller, W. G. Charles (1991): Contextual Correlates of Semantic Similarity. Language and Cognitive Processes 6(1): 1-28.
D. Lin (1998): Automatic Retrieval and Clustering of Similar Words. Proceedings of COLING '98, pp. 768-774.
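A DT in this spirit can be approximated by counting, for each pair of words, how many context features they share (the second-order step). This sketch assumes (word, feature) counts such as a holing step produces and simply keeps each word's p most frequent features; the function name and the parameter p are hypothetical, and a real pipeline would rank features with a significance measure before pruning, so treat the scores as a simplification. The toy counts are invented for illustration.

```python
from collections import Counter, defaultdict

def distributional_thesaurus(pairs, p=1000):
    """Rank each word's neighbours by the number of shared context features.

    pairs: Counter mapping (word, feature) -> frequency.
    p: keep only the p most frequent features per word (simplified pruning).
    """
    feats = defaultdict(Counter)
    for (word, feature), n in pairs.items():
        feats[word][feature] += n
    # prune to the top-p features of each word
    top = {w: {f for f, _ in c.most_common(p)} for w, c in feats.items()}
    # invert: feature -> words that have it
    by_feature = defaultdict(set)
    for w, fs in top.items():
        for f in fs:
            by_feature[f].add(w)
    dt = {}
    for w, fs in top.items():
        overlap = Counter()
        for f in fs:
            for other in by_feature[f]:
                if other != w:
                    overlap[other] += 1
        dt[w] = overlap.most_common()  # ranked list of (neighbour, shared features)
    return dt

# Invented toy counts; features follow the slide's word#POS#relation style
pairs = Counter({
    ("komplett", "Abschaffung#NN#-NK"): 3,
    ("komplett", "Renovierung#NN#-NK"): 2,
    ("komplett", "Verzicht#NN#-NK"): 2,
    ("komplett", "Unsinn#NN#-NK"): 1,
    ("vollständig", "Abschaffung#NN#-NK"): 4,
    ("vollständig", "Renovierung#NN#-NK"): 1,
    ("vollständig", "Verzicht#NN#-NK"): 2,
    ("vollständig", "Genesung#NN#-NK"): 1,
    ("teilweise", "Renovierung#NN#-NK"): 1,
    ("teilweise", "Spielzeit#NN#-NK"): 2,
})
dt = distributional_thesaurus(pairs)
print(dt["komplett"])  # [('vollständig', 3), ('teilweise', 1)]
```

Counting shared features keeps the similarity symmetric and cheap to compute via the inverted feature index, which is what makes the approach scale to large corpora.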

9 DT entry "paper#NN" with contexts (screenshot)

10 Clustering of DT entries: Sense Induction
Figure labels: bright#JJ, paper#NN
C. Biemann (2006): Chinese Whispers: an Efficient Graph Clustering Algorithm and its Application to Natural Language Processing Problems. Proceedings of the HLT-NAACL-06 Workshop on TextGraphs-06, New York, USA.
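Chinese Whispers, the clustering algorithm cited on this slide, is short enough to sketch. The version below follows its basic idea: every node starts in its own class and, visiting nodes in random order, repeatedly adopts the class with the highest summed edge weight among its neighbours. The toy graph is hypothetical: it stands for the DT neighbours of an ambiguous target such as "mouse" (with the target itself removed), linked when they are similar to each other.

```python
import random
from collections import defaultdict

def chinese_whispers(edges, iterations=20, seed=0):
    """Cluster an undirected weighted graph; returns a dict node -> class label.

    edges: dict mapping (u, v) -> weight.
    """
    rng = random.Random(seed)
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    label = {n: n for n in adj}  # every node starts in its own class
    nodes = list(adj)
    for _ in range(iterations):
        rng.shuffle(nodes)  # visit nodes in random order
        changed = False
        for n in nodes:
            scores = defaultdict(float)
            for m, w in adj[n].items():
                scores[label[m]] += w
            best = max(scores, key=scores.get)  # strongest class among neighbours
            if best != label[n]:
                label[n] = best
                changed = True
        if not changed:  # stop once no node changes its class
            break
    return label

# Hypothetical similarity graph among neighbours of "mouse"
edges = {
    ("rat", "rodent"): 1.0, ("rat", "pig"): 1.0, ("rodent", "pig"): 1.0,
    ("keyboard", "joystick"): 1.0,
}
labels = chinese_whispers(edges)
clusters = defaultdict(set)
for node, lab in labels.items():
    clusters[lab].add(node)
print(sorted(sorted(c) for c in clusters.values()))
# [['joystick', 'keyboard'], ['pig', 'rat', 'rodent']]
```

Each resulting cluster is read as one induced sense of the target; the number of senses is not fixed in advance but emerges from the graph.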

11 Symbolic Distributional Model: example "beetle"
C. Biemann, M. Riedl (2013): Text: Now in 2D! A Framework for Lexical Expansion with Contextual Similarity. Journal of Language Modelling 1(1): 55-95.

12 CORPUS- AND TEXT-ADAPTIVE SEMANTICS
Diagram (as on the Structure Discovery slide): Text Data, SD Algorithms (find regularities, annotate regularities in data), Machine Learning Task (use annotations as features)

13 2D Text in a Question-Answering Scenario
T. Miller, C. Biemann, T. Zesch, I. Gurevych (2012): Using Distributional Similarity for Lexical Expansion in Knowledge-based Word Sense Disambiguation. Proceedings of COLING 2012, Mumbai, India.
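One way a DT can help in question answering is lexical expansion: add each question term's top DT neighbours before matching against candidate passages. This is a hedged sketch, not the method of the cited paper (which applies lexical expansion to word sense disambiguation); the DT excerpt reuses three neighbours of Netz#NN from the Distributional Thesaurus slide, while the function names, the cut-off k and the word-overlap matching are assumptions.

```python
# Excerpt of the DT entry for Netz#NN (neighbour, score), from the DT slide
DT = {
    "Netz": [("Netzwerk", 102), ("Stromnetz", 69), ("Infrastruktur", 61)],
}

def expand(terms, dt, k=2):
    """Return the terms plus up to k top-ranked DT neighbours of each term."""
    expanded = set(terms)
    for t in terms:
        expanded.update(n for n, _ in dt.get(t, [])[:k])
    return expanded

def overlap(question_terms, passage, dt, k=2):
    """Count passage tokens matched by the expanded question terms."""
    expanded = expand(question_terms, dt, k)
    return sum(1 for tok in passage.split() if tok in expanded)

q = ["Netz"]
print(sorted(expand(q, DT)))  # ['Netz', 'Netzwerk', 'Stromnetz']
passages = [
    "Das Stromnetz fiel aus",    # "The power grid failed"
    "Der Fisch schwamm davon",   # "The fish swam away"
]
print(max(passages, key=lambda p: overlap(q, p, DT)))  # Das Stromnetz fiel aus
```

Without expansion, neither passage contains the literal question word "Netz"; the DT neighbour "Stromnetz" is what lets the first passage match.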

14 Semantic Enterprise Search: MWUs (multiword units) and Senses
- Phrases: Apple mouse, mouse genome, USB mouse, cat and mouse, ...
- Senses: animal (like rat, rodent, pig); device (like keyboard, joystick)
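Once senses are induced, a search system can pick the sense whose cluster overlaps most with the words around the query term. The sketch below uses that plain overlap heuristic, a simplification swapped in for illustration rather than the system shown on the slide; the sense labels and cluster members come from the slide, while the function and example contexts are hypothetical.

```python
import re

# Sense clusters for "mouse" as listed on the slide
SENSES = {
    "animal": {"rat", "rodent", "pig"},
    "device": {"keyboard", "joystick"},
}

def disambiguate(context, senses=SENSES):
    """Return the sense whose cluster shares most words with the context, or None."""
    tokens = set(re.findall(r"\w+", context.lower()))
    scores = {sense: len(tokens & words) for sense, words in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # no evidence: leave undecided

print(disambiguate("USB mouse and keyboard for sale"))       # device
print(disambiguate("the mouse ran past a rat in the barn"))  # animal
print(disambiguate("cat and mouse"))                         # None
```

The last case shows the limit of literal overlap: "cat" is not in either cluster, so expanding the context words with their own DT neighbours would be a natural next step.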

15 Contextualization: "Virus" on Twitter
Example tweets (German tweets translated):
- Setman Virus Programs and Generator #VirusPrograms&Generator
- Zeus Trojan Virus Found On Facebook
- Virus Graveyard.sav in Borderlands 2 (Xbox 360): Borderlands 2 for the Xbox 360 is [...] by a virus called... #Telmi
- Virus steals technical drawings in targeted attacks
- Chinese authorities expand fight against H7N9 virus - Newsticker - The latest news - News - Bild.de
- Glad that I've once again infected so many of my friends with the Lego virus :D
- Infectiology: The underestimated virus, #Herpes has many faces
- Thanks :) Stomach bug, we were at the hospital for a check-up yesterday :o Great Christmas :/
German DT neighbours of "Virus", grouped by sense:
- Computer-security sense: Trojaner, Malware, Schadprogramm, Datei, Schadsoftware, Hacker, Schadcode, Sicherheitslücke, Spyware, Spam, Schwachstelle, Programm, Rootkit, Software, Phishing, Dokument, Tool, Dialer, Exploit, Code, Keylogger
- Disease sense: Seuche, Krankheit, Infektion, Vogelgrippe, Grippe, Epidemie, Tumor, Vogelgrippevirus, Erkrankung, Infektionskrankheit, Schweinegrippe, Fieber, Lungenentzündung, Tierseuche, Hiv, Vogelgrippe-Virus, Viruserkrankung, Virustyp, Lungenkrankheit, Variante, Pest, Aids, Influenza

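16 ADAPTIVE APPLICATIONS
Diagram (as on the Structure Discovery slide, with the last stage now labelled Adaptive Machine Learning): Text Data, SD Algorithms (find regularities, annotate regularities in data), Adaptive Machine Learning (use annotations as features)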

17 WebAnno Automation Mode
S. M. Yimam, C. Biemann, L. Majnaric, Š. Šabanović, A. Holzinger (2015): Interactive and Iterative Annotation for Biomedical Entity Recognition. International Conference on Brain Informatics and Health (BIH'15), London, UK.

18 After Annotating 5 Abstracts (screenshot)

19 After Annotating 9 More Abstracts (screenshot)
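The automation mode suggests annotations that improve as more documents are annotated. As a deliberately simplified sketch of that loop, the toy learner below just memorises the most frequent label per token; it is a stand-in chosen for brevity, not WebAnno's actual learning component, and the class name and entity labels (DRUG, SYMPTOM) are hypothetical.

```python
from collections import Counter, defaultdict

class SuggestionModel:
    """Toy suggestion learner: remembers the most frequent label for each token."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, annotated):
        """Add (token, label) pairs confirmed by the annotator."""
        for token, label in annotated:
            self.counts[token.lower()][label] += 1

    def suggest(self, tokens):
        """Suggest a label per token; unseen tokens get "O" (outside any entity)."""
        suggestions = []
        for token in tokens:
            seen = self.counts.get(token.lower())
            suggestions.append(seen.most_common(1)[0][0] if seen else "O")
        return suggestions

model = SuggestionModel()
sentence = ["Aspirin", "reduces", "fever"]
print(model.suggest(sentence))  # ['O', 'O', 'O'] before any annotation
# the annotator labels a first batch; the model is retrained on it
model.train([("aspirin", "DRUG"), ("reduces", "O"), ("fever", "SYMPTOM")])
print(model.suggest(sentence))  # ['DRUG', 'O', 'SYMPTOM']
```

In a real setting, the annotator corrects the suggestions for each new batch and the corrected batch is fed back into `train`, so suggestion quality grows with the number of annotated abstracts.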

20 Adaptive Writing Aid: Paraphrasing
- Offer paraphrases from various sources in a text editor
- Improve through usage: train the system on the user's signals
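Improving through usage can be illustrated with a ranker that reweights paraphrase sources according to whether the user accepts their suggestions. This is a hypothetical sketch: the class name, the source names, the base scores and the multiplicative update rule are all assumptions, not the writing aid's actual design.

```python
class AdaptiveParaphraseRanker:
    """Rank (paraphrase, source, base_score) candidates; learn source weights from feedback."""

    def __init__(self, sources, rate=0.5):
        self.weights = {source: 1.0 for source in sources}
        self.rate = rate  # step size of the multiplicative update

    def rank(self, candidates):
        """Sort candidates by base score times the current weight of their source."""
        return sorted(candidates, key=lambda c: c[2] * self.weights[c[1]], reverse=True)

    def feedback(self, source, accepted):
        """Boost a source when its suggestion is accepted, shrink it when rejected."""
        self.weights[source] *= (1 + self.rate) if accepted else (1 - self.rate)

ranker = AdaptiveParaphraseRanker(["dictionary", "corpus"])
candidates = [("voice", "dictionary", 0.6), ("state", "corpus", 0.5)]
print([c[0] for c in ranker.rank(candidates)])  # ['voice', 'state']
ranker.feedback("dictionary", accepted=False)   # user rejects the dictionary suggestion
print([c[0] for c in ranker.rank(candidates)])  # ['state', 'voice']
```

A multiplicative update keeps weights positive and lets a few consistent signals reorder the candidates quickly; a production system would likely learn from richer features than the source alone.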

21 Investigative Data-Driven Journalism

22 Conclusion
Adaptive Natural Language Processing
- makes use of static AND dynamically generated resources
- is driven by (text) data that defines its application domain
- loops the user into the equation
- goes beyond NLP pipelines
"Surely, it will be hard to understand such a system in detail. But who would want to meticulously control every piece of such a system, when one can simply let it emerge?"

23 Thank you for your … and your …

24 Size matters
Figure: corpora and NLP tasks placed on a logarithmic axis of corpus size (# words, roughly 1k to 100G). Corpora shown: Brown, Gigaword, BNC, CCAE, CPSAE, Susanne, Wacky, Wikipedia, Encarta, Google Books, ClueWeb, 1T, EuroParl, twitter2011, SemCor. Tasks shown: language separation, POS induction, sense induction, morphology induction, language ID, POS tagging, word sense disambiguation, morphology, two-dimensional text, topic segmentation.




