
1 © 2002 Hans Uszkoreit
Lecture: Introduction to Computational Linguistics, Part 2: Language Technology
Hans Uszkoreit, Universität des Saarlandes (Saarland University) and Deutsches Forschungszentrum für Künstliche Intelligenz (German Research Center for Artificial Intelligence)

2 Overview
Tasks and problems of language technology; spoken language; text technologies; outlook.

3 What Is Language Technology?
Strictly speaking, language technology is a class of technologies within information technology that use knowledge about the structure of human languages to enable or improve the machine processing of language. Example: Microsoft Word processes language, but at its core it contains very little language technology. Language technology does, however, go into the detection of sentence boundaries for formatting, automatic hyphenation, spell checking and grammar checking. In the view of leading experts in the computer industry, language technology is a key technology for further progress in computing.

4 Acceptance
The main obstacle to the acceptance of the computer is the language problem. The average user does not know any computer languages, does not like computer languages and does not want to learn any. The language people master best is their native language. The most natural medium for the immediate transmission of information is spoken language. The most important class of data is text: the average user uses the machine to produce texts in human language, yet computers have difficulty processing and managing texts. The computer does not master human language!

5 [Diagram: levels of language processing]
Representations, from the surface to the meaning: acoustic form / written form; phonetic or graphemic representation; morphophonological representation; syntactic representation; semantic representation; representation of the full meaning.
Processing steps between them: phonetic / orthographic processing; morphophonological processing; syntactic processing; semantic processing; pragmatic processing (knowledge processing).
Highlighted application: text understanding.

6 [Same diagram] Highlighted application: dictation.
Example of an ambiguous acoustic form: "das Boot auf dem Main" (the boat on the Main) vs. the homophonous word string "daß bot auf dem mein".

7 [Same diagram] Highlighted application: machine translation.

8 Performance Criteria
Efficiency: low time and memory requirements. Accuracy: the ability to deliver linguistically correct solutions. Robustness: the ability to cope with all possible inputs. Coverage: the broadest possible coverage of the language. Specificity: the ability to select the correct analysis.

9 Dimensions of the Problem
[Diagram with three axes: language coverage; language depth (word recognition, morphology, syntax, semantics, pragmatics, knowledge processing); subject domain(s)]
The problem of full language mastery is too complex, but many applications need only limited language mastery!

10 Limits of the Technology
Computers will not speak and write the way we do any time soon. NO LANGUAGE MASTERY WITHOUT GENERAL KNOWLEDGE. Dictionaries and grammars can be formalized; semantics is already harder; dialogue can only be modelled in fairly simple ways; limited domain knowledge is feasible; general knowledge and large bodies of specialist knowledge are beyond reach. We have to restrict the problem in order to arrive at sensible applications.

11 State of the Art
Neither language understanding nor language production has been solved so far. But today we have so-called shallow methods, which do not achieve understanding but are often entirely sufficient for many applications.
Shallow approaches (efficient and robust): statistical methods, pattern grammars.
Deep approaches (precise and correct): linguistic principles, constraints or complex rule systems.

12 Spell Checking
This application is the best-selling language application so far.
[Diagram: language coverage × language depth (lexicon, morphology, syntax, semantics, pragmatics, knowledge processing) × subject domain(s)]

13 Grammar Checking
Here the business is only just beginning.
[Same diagram]

14 Simple Query Systems
The demand comes with acoustic speech recognition.
[Same diagram, with language depth starting at word recognition]

15 [Diagram: language coverage, language depth, subject domain(s)]

16 Language Technologies
[Diagram grouping the following into speech technologies and text technologies] Speaker recognition, language verification, command recognition, speech-to-text, speech translation, spoken dialogue systems, text-to-speech, concept-to-speech, report generation, text generation, indexing, summarization, categorization, information extraction, spell checking, grammar checking, text translation, abstracting, written dialogue systems.

17 Problems
Language has aspects that come easily to humans but are hard for computers. In particular:
ambiguity: many words and phrases have several meanings;
paraphrase: there are many ways of expressing the same thing;
vagueness: the meaning of expressions is often fuzzy.

18 Speech Technologies
Spoken dialogue systems; speech translation systems; voice recognition / speaker identification; language identification; speech verification; speech recognition; voice modelling; speech synthesis; speech production.

19 Speech Technology Applications
Voice control systems; dictation systems; text-to-speech systems; identification and verification systems; information access; spoken dialogue systems; speech translation systems.

20 Call Center Applications
Call routing dialogues; simple information dialogues; information access by call center agents; speech synthesis of information for the customer; retrieval of recorded calls; text technologies for information retrieval; text technologies for information fusion / reporting.

21 SL(D)S: Properties and Criteria
Vocabulary (predefined and extensible); speaker dependence (training effort); isolated words vs. continuous speech; spontaneous speech; language model; initiative; barge-in; bandwidth and input devices; archiving.

22 Systems to Try Out
Nuance's Travel Plan Demo (flight information). Company: Nuance Corporation (technology from SRI International). Tel.: URL: Flight connections between 250 cities in the USA.
PureSpeech Travel Planning. Company: PureSpeech Inc. Tel.: Information on 850 travel destinations; no real database access yet.
Philips Intercity train information. Company: Philips Research Laboratories Germany (test system). Tel.: Switzerland (in use at the SBB): Tel.:

23 Text Technologies
Written dialogue systems; text translation systems; language identification; information retrieval; document categorization; document clustering; text summarization; information extraction; spell checking; grammar/style checking; abstract generation; report generation; text generation; document production.

24 Text Technology Applications
Spell checkers; machine-assisted human translation; indicative machine translation; grammar checkers; human-assisted machine translation; high-quality text translation; text generation systems.

25 Today's Search Technology
Word index; Boolean combinations; various indexing methods; restricted morphology; relevance ranking; search in several languages.
[Figure: mock-up of a search engine results page ("LycaSurcha") listing assorted hits]
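The first items on this slide (a word index, Boolean combinations and relevance ranking) can be illustrated with a minimal inverted index. This is a sketch under simplifying assumptions: a naive lowercase tokenizer without any morphology, and a "relevance" score that just counts query-term occurrences. Real search engines used more elaborate indexing and weighting, and the function names here are illustrative only.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens; a deliberately naive tokenizer without morphology."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Inverted index: word form -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_and(index, *terms):
    """Boolean AND: ids of documents containing every term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def search_or(index, *terms):
    """Boolean OR: ids of documents containing at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def rank(index, docs, terms):
    """Order OR-hits by how often the query terms occur (a very crude relevance score)."""
    wanted = {t.lower() for t in terms}
    hits = search_or(index, *terms)
    score = {d: sum(tok in wanted for tok in tokenize(docs[d])) for d in hits}
    return sorted(hits, key=lambda d: (-score[d], d))


docs = {1: "blood, sweat and tears", 2: "sweat shop", 3: "tears of joy and tears of sorrow"}
index = build_index(docs)
print(search_and(index, "sweat", "tears"))  # {1}
print(rank(index, docs, ["tears"]))         # [3, 1]
```

Because the index stores exact word forms, a query for "tear" would find nothing here; that is the "restricted morphology" problem the next slide turns to.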

26 Search
You don't find enough!
Other word forms: der Herzog, des Herzogs, die Herzöge (the duke, of the duke, the dukes).
Narrower and broader terms: Alfa Romeo Zagato roadster, sports car, car, motor vehicle, vehicle.
Paraphrases: steuerliche Gründe, Steuergründe, steuerliche Erwägungen, steuerliche Überlegungen, fiskalische Erwägungen, um Steuern zu sparen, ... (tax reasons, tax considerations, fiscal considerations, in order to save taxes, ...).
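One way to attack the "not enough" problem is query expansion: the search term is replaced by a list of alternatives. The sketch below adds word forms, all narrower terms and listed paraphrases. The three lookup tables are hypothetical toy resources built from the examples on this slide; they stand in for a morphological analyser, a thesaurus and a paraphrase glossary.

```python
# Hypothetical toy resources built from the slide's examples.
WORD_FORMS = {"herzog": ["herzog", "herzogs", "herzöge"]}
NARROWER = {  # term -> more specific terms
    "vehicle": ["motor vehicle"],
    "motor vehicle": ["car"],
    "car": ["sports car"],
    "sports car": ["alfa romeo zagato roadster"],
}
PARAPHRASES = {
    "steuerliche gründe": ["steuergründe", "steuerliche erwägungen",
                           "steuerliche überlegungen", "fiskalische erwägungen",
                           "um steuern zu sparen"],
}


def narrower_terms(term):
    """All terms below `term` in the hierarchy, depth first."""
    found = []
    for child in NARROWER.get(term, []):
        found.append(child)
        found.extend(narrower_terms(child))
    return found


def expand_query(term):
    """The term plus its word forms, narrower terms and paraphrases, without duplicates."""
    term = term.lower()
    expanded = [term]
    for candidate in WORD_FORMS.get(term, []) + narrower_terms(term) + PARAPHRASES.get(term, []):
        if candidate not in expanded:
            expanded.append(candidate)
    return expanded


print(expand_query("Herzog"))  # ['herzog', 'herzogs', 'herzöge']
print(expand_query("car"))     # ['car', 'sports car', 'alfa romeo zagato roadster']
```

The sketch expands only downwards: a page about a roadster is relevant to a query for "car", while adding the broader "vehicle" would mostly add noise, which is the "you find too much" problem discussed a few slides later.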

27 Search
Suppose you were looking for car companies and therefore entered the search term "Automobilfirmen" into a search engine (e.g. HOTBOT). In English you would search for "automobile companies".

28 Search
Hits: "automobile companies" 704; "Automobilfirmen" 55.

29 Search
Hits per query, English: automobile companies 704; car builders 233; car makers 1846; auto makers 2307; automobile makers 181; car companies 3046; cars companies 14; motor companies 194; auto companies 1345; car manufacturers 3056; motor manufacturers 582; automobile manufacturers 4263; manufacturers of cars 151; manufacturers of autos 15; manufacturers of automobiles 165; manufacturers of motor vehicles 55.
Hits per query, German: Automobilfirmen 55; Autohersteller 320; Autobauer 131; Autoproduzenten 26; Autofabrikant 89; Autofirmen 86; Pkw Hersteller 15; Automobilunternehmen 57; Automobilhersteller 602; Kfz-Hersteller 42; Autounternehmen 9; Automobilkonzerne 83; Unternehmen der Automobilbranche 4; Hersteller von Autos 4; Hersteller von Automobilen 13; Hersteller von Kraftfahrzeugen 3.

30 Search
You find too much!
Ambiguity: German: Zug, Bahn, Leitung, Schalter; English: terminal, line, engine.
Polysemy: Buch, Schule, printer.
Proper names: personal names: Maurer, Washington, Chase; place names: Essen, Halle, Bismarck.

31 Tasks of IM
The task of information management (IM) is to manage very large amounts of information, such as those already found today on the WWW, in intranets and in large text databases, and to make them usable. The network by itself merely makes the information available. Unlike in conventional databases, the information is far less pre-structured (in the sense in which computer data is structured). On the other hand, the relevant content structures are of course far more complex. The digitization of large parts of human knowledge (e.g. digital libraries, film archives) will make this problem even greater.

32 Problems of Information Management
Distribution: the information resides on different machines.
Heterogeneity: a multitude of document formats; multilinguality; multimediality (e.g. speech, images, sounds); multimodality (e.g. written and spoken language, film files or real-time broadcasts).
Lack of structure: no uniform classification, no uniform internal structure, no uniform and reliable hypertext linking.
Redundancy: much information exists more than once.

33 Information Management
Information is acquired, categorized, filtered, merged, structured, delivered to the user and presented adequately.

34 Language Technologies for Information and Knowledge Management (I&WM)
Gathering; indexing; categorization; clustering; summarization; information extraction; automatic hyperlinking; text data mining; information fusion; report generation; text translation.

35 Information Acquisition
Gathering; data mining, also text mining; conversion, e.g. scanning, OCR, transcription; agents, e.g. NetBots, WebBots.

36 Structuring and Storage
Indexing; categorization; clustering; summarization.

37 Information Preparation
Information extraction; hyperlinking; information fusion; trend analysis; report generation.

38 Information Access
Query expansion; relevance ranking; redundancy check (duplicate detection); thematic clustering; information association (detection of related information).
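The redundancy check can be approximated with a common near-duplicate technique: compare the sets of overlapping word sequences ("shingles") of two texts by their Jaccard similarity. This is a sketch assuming 3-word shingles and a threshold of 0.5, both arbitrary choices; it is not necessarily the method used in the systems discussed here.

```python
import re


def shingles(text, k=3):
    """Set of overlapping k-word sequences ("w-shingles") of a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, taken as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.5, k=3):
    """Pairs of document ids whose shingle sets have Jaccard similarity >= threshold."""
    sigs = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    ids = sorted(sigs)
    return [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]
            if jaccard(sigs[a], sigs[b]) >= threshold]


docs = {
    "a": "Trade Consult presented version 2.0 of Store Age in Hanover",
    "b": "Trade Consult presented version 2.0 of Store Age in Hannover",
    "c": "Merleback announced the turnover figures for the third quarter",
}
print(near_duplicates(docs))  # [('a', 'b')]
```

Documents "a" and "b" share 8 of their 10 distinct shingles (similarity 0.8), so the spelling variant does not hide the duplicate. Comparing every pair is quadratic; large collections typically use hashing tricks such as MinHash instead.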

39 Presentation
Result presentation; information visualization; virtual navigation.

40 Information Extraction
Robust extraction of relevant terms, phrases and statements from texts. Success rates (recall and precision) depend on the task and on the subject domain. Already used in various applications, e.g. for company name recognition, news categorization, overviews of company indicators (turnover, profit, share prices) and news overviews for specific subject areas.
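Recall (completeness) and precision are computed by comparing the extracted items with a hand-annotated gold standard. A minimal sketch for the company-name recognition task; the two example sets are invented for illustration.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted items against a gold standard.

    precision = correct / number extracted
    recall    = correct / number in the gold standard
    """
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Invented example: "Store Age" is a product, so tagging it as a company is an error,
# and the company "Z&M" was missed.
gold = {"Trade Consult", "ComSoft", "Z&M"}
extracted = {"Trade Consult", "ComSoft", "Store Age"}
p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67
```

The two measures pull in opposite directions: extracting more candidates tends to raise recall and lower precision, which is why success rates have to be reported as a pair.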

41 Information Extraction
In IE, relevant pieces of information are specifically sought out in texts and put into a structured form.
Bremen, , wiwo: Warehouse software continues its upswing
The Bremen company Trade Consult presented version 2.0 of its successful warehouse management software Store Age at a press conference in Hanover. The new version now also enables... At the press conference, managing director Franz Merleback also announced the software house's turnover figures for the 3rd quarter. Whereas more than 30 million marks had already been turned over in the second quarter, Merleback was now able to announce the proud result of 42.5 million....
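The structuring step for this example can be imitated with hand-written surface patterns. The sketch below runs on an English translation of the last two sentences and fills a small turnover template. The patterns, the template layout and the function name are assumptions made for illustration, and they are brittle by design: they silently drop the "more than" in "more than 30 million", and the second pattern relies on the reporting quarter being named before the result.

```python
import re

# Ordinal words and abbreviations mapped to quarter numbers.
ORDINALS = {"first": 1, "1st": 1, "second": 2, "2nd": 2,
            "third": 3, "3rd": 3, "fourth": 4, "4th": 4}
QUARTER = r"(first|second|third|fourth|1st|2nd|3rd|4th) quarter"
AMOUNT = r"(\d+(?:\.\d+)?) million"

TEXT = ("At the press conference, managing director Franz Merleback also announced "
        "the software house's turnover figures for the 3rd quarter. Whereas more than "
        "30 million marks had already been turned over in the second quarter, Merleback "
        "was now able to announce the proud result of 42.5 million.")


def extract_turnover(company, text):
    """Fill a {company, turnover: {quarter: millions}} template with toy patterns."""
    template = {"company": company, "turnover": {}}
    # Pattern 1: "<amount> million ... in the <ordinal> quarter" inside one clause
    # (no comma or full stop in between).
    for amount, quarter in re.findall(AMOUNT + r"[^.,]*?in the " + QUARTER, text):
        template["turnover"][ORDINALS[quarter]] = float(amount)
    # Pattern 2: "figures for the <ordinal> quarter" names the reporting quarter,
    # and a later "result of <amount> million" is attributed to that quarter.
    announced = re.search(r"figures for the " + QUARTER, text)
    result = re.search(r"result of " + AMOUNT, text)
    if announced and result:
        template["turnover"][ORDINALS[announced.group(1)]] = float(result.group(1))
    return template


print(extract_turnover("Trade Consult", TEXT))
# {'company': 'Trade Consult', 'turnover': {2: 30.0, 3: 42.5}}
```

Linking "42.5 million" to the third quarter needs information from an earlier sentence; pattern 2 fakes this with a fixed rule, which is exactly the kind of step where shallow methods run out of depth.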

43 Example: Information Extraction (2)
Extracted table (columns: company, quarterly turnover 1996/97, difference):
ComSoft: 120 million, 110 million
Trade Consult: 30 million, 42.5 million
Z&M: 71.0 million

44 Crosslingual Information Retrieval (CLIR)
Multilingual search; multilingual interface for navigation; multilingual offering on the web.

45 MULINEX System

46 Query Input

47 Query Assistant

48 Scope
Classical areas of computational linguistics: computational morphology, computational syntax, computational semantics, computational pragmatics.
Text applications of language technology: indexing, categorization, summarization, information extraction, report generation.

49 Different Goals
Classical goal: understanding and production of text.
The Bremen company Trade Consult presented version 2.0 of its successful warehouse management software Store Age at a press conference in Hanover. The new version now also enables the central management of several warehouses and integrates warehousing into supply chain management on the basis of SAP software. At the press conference, managing director Franz Merleback also announced the software house's turnover figures for the 3rd quarter. Whereas more than 30 million marks had already been turned over in the second quarter, Merleback was now able to announce the proud result of 42.5 million.
[Figure: syntactic analysis of the noun phrase "erfolgreichen Lagerverwaltungssoftware Store Age" (successful warehouse management software Store Age)]

51 Different Goals
Classical goal: understanding and production of text. Highly accurate and comprehensive; in depth; could be used by automatic inferencing; but lacking efficiency, robustness and coverage.

52 Different Goals
Classical goal: understanding and production of text.
Goals of text technologies: recognition of relevant elements, or generation of short passages from DB entries.
Example on the Trade Consult press release: building an index.

53 Different Goals
Classical goal: understanding and production of text.
Goals of text technologies: recognition of relevant elements, or generation of short passages from DB entries.
Example on the Trade Consult press release: extracting the topic (Trade Consult, turnover figures).
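A very crude stand-in for topic extraction ranks content words by frequency. This sketch assumes a tiny hand-made stop-word list and a short invented text; it is not the method behind the slide. Note that the multi-word name "Trade Consult" comes out as two separate tokens, one reason why named entity recognition is needed on top.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOPWORDS = {"the", "a", "an", "of", "in", "at", "its", "and", "for", "also", "now",
             "was", "had", "been", "to", "on", "more", "than", "able", "whereas"}


def topic_terms(text, n=3):
    """The n most frequent non-stop-words; ties keep the order of first occurrence."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return [term for term, _ in Counter(tokens).most_common(n)]


text = ("Trade Consult presented Store Age. Trade Consult announced turnover figures. "
        "The turnover rose.")
print(topic_terms(text))  # ['trade', 'consult', 'turnover']
```

`Counter.most_common` lists elements with equal counts in the order they were first encountered, which makes the result deterministic for ties.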

54 Different Goals
Classical goal: understanding and production of text.
Goals of text technologies: recognition of relevant elements, or generation of short passages from DB entries.
Example on the Trade Consult press release: extracting relations into a table (columns: company, quarterly turnover 1996/97, difference):
Hahnemann: 105 million, 110 million
Trade Consult: 30 million, 42.5 million
Z&M: 12.0 million, 14 million

55 Information Extraction
Bremen, , wiwo: Warehouse software continues its upswing
The Bremen company Trade Consult presented version 2.0 of its successful warehouse management software Store Age at a press conference in Hanover. The new version now also enables the central management of several warehouses and integrates warehousing into supply chain management on the basis of SAP software. At the press conference, managing director Franz Merleback also announced the software house's turnover figures for the 3rd quarter. Whereas more than 30 million marks had already been turned over in the second quarter, Merleback was now able to announce the proud result of 42.5 million.

56 IE Result
Extracted table (columns: company, quarterly turnover 1996/97, difference):
ComSoft: 120 million, 110 million
Trade Consult: 30 million, 42.5 million
Z&M: 71.0 million

57 Different Goals
Classical goal: understanding and production of text.
Goals of text technologies: recognition of relevant elements, or generation of short passages from DB entries.
Example on the Trade Consult press release: extracting relations into a table (columns: company, quarterly turnover 1996/97, difference):
ComSoft: 120 million, 110 million
Trade Consult: 30 million, 42.5 million
Z&M: 71.0 million, 88.0 million

58 Different Goals
Classical goal: understanding and production of text.
Goals of text technologies: recognition of relevant elements, or generation of short passages from DB entries. Robust and efficient; support for human inferencing; but shallow (they do not get to the contents) and lacking accuracy.

59 Application and Foundation
[Diagram: deep methods and shallow methods, each in relation to application and foundation]

60 Empirical Methodology
Formal methods; algorithmic methods; empirical methods. Availability of large electronic corpora; computational tools for handling large sets of data; increased computing power; means for data interpretation.

61 Statistical Methods
[Diagram: deep processing vs. shallow processing, and statistical methods vs. symbolic methods, with the tasks categorization, deep parsing with semantic construction, deep parsing, PS parsing, shallow parsing, summarization, inf. extraction and answer extraction placed in it]

62 Statistical Methods
[Same diagram, with hybrid methods added alongside statistical and symbolic methods]

63 Corpus-Based Methods
Corpus-based statistical methods are especially relevant for: acquisition of grammar and lexicon; acquisition and modelling of soft constraints; acquisition and modelling of performance preferences. However, we need linguistically interpreted corpora.

64 Combining Shallow and Deep
Three ways of combining shallow and deep processing: shallow processing as a preprocessor for deep processing; deep processing as a servant to shallow processing; deep processing techniques integrated into shallow processing.

65 Information Extraction
Instead of extraction: enrichment of texts with structural information. Structuring of information through IE technology; transformation of unstructured text into semi-structured documents; application in document conversion to XML and in XML document authoring.
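A minimal sketch of enrichment instead of extraction: names are wrapped in XML elements named after their type, which turns plain text into a semi-structured document. The gazetteer (a fixed list of names with types) and the element names are hypothetical; an actual IE system would recognize the names with patterns or grammars rather than a fixed list.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical gazetteer: known names and the element type to use for them.
GAZETTEER = {"Trade Consult": "company", "Store Age": "product",
             "Franz Merleback": "person"}


def annotate(text, gazetteer=GAZETTEER):
    """Return `text` as an XML string in which known names are wrapped in typed elements."""
    root = ET.Element("doc")
    # Longest names first, so that a longer name wins over a shorter one it contains.
    pattern = re.compile("|".join(re.escape(name)
                                  for name in sorted(gazetteer, key=len, reverse=True)))
    last, prev = 0, None  # prev: last element added; text after it goes into its .tail
    for m in pattern.finditer(text):
        chunk = text[last:m.start()]
        if prev is None:
            root.text = chunk
        else:
            prev.tail = chunk
        prev = ET.SubElement(root, gazetteer[m.group()])
        prev.text = m.group()
        last = m.end()
    if prev is None:
        root.text = text[last:]
    else:
        prev.tail = text[last:]
    return ET.tostring(root, encoding="unicode")


print(annotate("Trade Consult presented Store Age."))
# <doc><company>Trade Consult</company> presented <product>Store Age</product>.</doc>
```

The original wording is kept intact; the structure is added around it, so the output stays readable as text while becoming queryable as XML.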

66 A Continuum from Shallow to Deep
Topic recognition; terminology recognition; named entity recognition; simple relation recognition; complex relation recognition (template filling); answer recognition; information fusion (template merging).
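The last step, template merging, combines partially filled templates about the same event that were extracted from different texts. A minimal sketch, assuming templates are flat dictionaries with None for empty slots and that the caller has already decided that both templates describe the same event; the slot names are hypothetical.

```python
def merge_templates(a, b):
    """Merge two partially filled templates (dicts) describing the same event.

    An empty slot (missing or None) is filled from the other template; two
    different non-empty values are both kept in a list for a later decision.
    """
    merged = {}
    for slot in sorted(a.keys() | b.keys()):
        va, vb = a.get(slot), b.get(slot)
        if va is None:
            merged[slot] = vb
        elif vb is None or va == vb:
            merged[slot] = va
        else:
            merged[slot] = [va, vb]  # conflict: keep both candidates
    return merged


from_text_1 = {"company": "Trade Consult", "quarter": "97Q3", "turnover_mio": 42.5,
               "managing_director": None}
from_text_2 = {"company": "Trade Consult", "quarter": "97Q3",
               "managing_director": "Franz Merleback"}
print(merge_templates(from_text_1, from_text_2))
# {'company': 'Trade Consult', 'managing_director': 'Franz Merleback', 'quarter': '97Q3', 'turnover_mio': 42.5}
```

Keeping conflicting values instead of picking one leaves the decision to a later step (or a human), since two sources reporting different turnover figures may both be partly right.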

67 Performance Modelling in the Past
Coverage: large-scale HPSG grammar development in several languages; lexical work on the morphological and syntactic side; first steps towards learning from corpora.
Robustness: robust semantic processing with underspecification; work on soft constraints and preferences.
Efficiency: efficient HPSG and DG processing; efficiency in semantic processing through ambiguity reduction.

68 Further Problems
You find too much!
Ambiguity: German: Zug, Bahn, Leitung, Schalter; English: terminal, line, engine.
Polysemy: Buch, Schule, printer.
Proper names: personal names: Maurer, Washington, Chase; place names: Essen, Halle, Bismarck.

69 The Web Is Multilingual
At first the WWW was predominantly monolingual ( % of all WWW pages in English). Non-English content is growing faster ( % English, today approx. 85%).

70 Globalization of the User Base
[Chart: share of US web users; US web users in %. Source: Computer Industry Almanac Inc., January]

71 Relevant Factors
Development from an avant-garde medium into a mass medium; spread into new regions (Latin America, Asia, the Arab world); digitization of large libraries in many countries; role of the WWW as a global marketplace; role of the WWW as a medium for political information and propaganda; growth of social and cultural content. The future of the WWW is multilingual.

72 Even More Problems!
Other writing systems have to be encoded and displayed: Chinese, Japanese, Arabic, Greek, ...
The word formation rules of different languages get in each other's way: Skat / skating, Limes / lime.
Cross-language ambiguity interferes with search: Brief (German: letter) / brief overview; Post (German: mail) / post messages; Porto (German: postage) / Porto travel information; Haut (German: skin) / Haut Barr; cute / cute girls.

73 Multilinguality as a Challenge
A great opportunity is opening up: it will become possible to navigate through humanity's written knowledge without having to stop at the language border. This technological challenge, however, requires progress in the following areas: lexical semantics; conceptual structuring; improvements in machine translation.

74 Language on the WWW
Language is only one medium on the WWW, but among the various media it has a special status. Books, films, pictures, pieces of music and computer programs are best described and found with language. Only with the help of language can we structure knowledge and link it meaningfully. Language is the fabric of the World Wide Web.

75 Human Language
Language has aspects that come easily to humans but are hard for computers. In particular:
ambiguity: many words and phrases have several meanings;
paraphrase: there are many ways of expressing the same thing;
vagueness: the meaning of expressions is often fuzzy.

76 Machine Translation
Fully automatic machine translation (FAMT) of arbitrary texts is not possible today. This is not due to the linguistic processing of the texts but to the machine's lack of knowledge about their content. For very restricted subject domains and text types, however, usable translations can be delivered. Otherwise, machine translation today serves successfully as a preliminary stage for human translation (machine-assisted human translation, MAHT).

77 MT Is Nevertheless Useful
A satisfactory automatic translation of arbitrary texts is thus not possible today. But the technology delivers translations that let the reader recognize the topic and the most essential content. We work with the translation system LOGOS. Other large translation systems (SYSTRAN, METAL) are also used for WWW applications. We call these translations indicative translations.

78 Indicative Translation

79 Multilingual Navigation
Multilingual search; multilingual interface for navigation; multilingual offering on the web.

80 Concept Index
car

81 Concept Index
car; Personenauto, Auto, Automobil

82 Concept Index
car; Personenauto, Auto, Automobil; "...Kraftfahrzeuge für Personen..." (motor vehicles for passengers)

83 Concept Index
car; Personenauto, Auto, Automobil; automobile, auto, car; "...Kraftfahrzeuge für Personen..."

84 Concept Index
car; Personenauto, Auto, Automobil; automobile, auto, car; "...location de voitures..." (car rental); "...Kraftfahrzeuge für Personen..."

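85 Concept Index
Concept hierarchy: motor vehicle, with the narrower concepts car (narrower: sports car) and truck (narrower: tank truck). Terms linked to car: Personenauto, Auto, Automobil, automobile, auto, car. Text snippets: "...Kraftfahrzeuge für Personen...", "...location de voitures...".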
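The idea of the concept index can be sketched as a small data structure: terms in several languages point to language-independent concepts, and concepts are linked to broader ones. A query term is mapped to its concepts, widened to their narrower concepts, and mapped back to terms in the target language. The concept ids, the table and the function names are hypothetical; a real index would also link concepts to document positions.

```python
# Hypothetical toy concept index: each concept has a broader concept and
# lexicalizations per language (ISO 639-1 codes). Concept ids are made up.
CONCEPTS = {
    "MOTOR_VEHICLE": {"broader": None,
                      "terms": {"en": ["motor vehicle"], "de": ["Kraftfahrzeug"]}},
    "CAR": {"broader": "MOTOR_VEHICLE",
            "terms": {"en": ["car", "auto", "automobile"],
                      "de": ["Auto", "Automobil", "Personenauto"],
                      "fr": ["voiture"]}},
    "SPORTS_CAR": {"broader": "CAR", "terms": {"en": ["sports car"]}},
    "TRUCK": {"broader": "MOTOR_VEHICLE", "terms": {"en": ["truck"]}},
    "TANK_TRUCK": {"broader": "TRUCK", "terms": {"en": ["tank truck"]}},
}


def concepts_for(term):
    """Concepts that the term lexicalizes in any language (case-insensitive)."""
    t = term.lower()
    return sorted(c for c, entry in CONCEPTS.items()
                  if any(t == w.lower() for words in entry["terms"].values() for w in words))


def with_narrower(concept):
    """The concept plus every concept below it in the hierarchy."""
    found = {concept}
    for c, entry in CONCEPTS.items():
        if entry["broader"] == concept:
            found |= with_narrower(c)
    return found


def search_terms(term, lang):
    """Terms in `lang` covering the concepts of `term` and their narrower concepts."""
    result = set()
    for concept in concepts_for(term):
        for c in with_narrower(concept):
            result.update(CONCEPTS[c]["terms"].get(lang, []))
    return sorted(result)


print(search_terms("Automobil", "en"))  # ['auto', 'automobile', 'car', 'sports car']
print(search_terms("car", "fr"))        # ['voiture']
```

Since "Auto" (German) and "auto" (English) name the same concept, the case-insensitive lookup does no harm here; for true false friends such as "Brief" the term would map to two concepts and need disambiguation first.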

86 Concept Index
Technologies required: language identification; lexical disambiguation; shallow syntactic analysis techniques; construction of a phrasal index; multilingual terminologies; paraphrase glossaries.
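The first technology on this list, language identification, can be approximated by counting frequent function words. This is a deliberately simple sketch with hypothetical mini word lists; character n-gram models are a more usual and more robust choice, especially for short texts.

```python
import re

# Hypothetical mini lists of frequent function words per language.
FUNCTION_WORDS = {
    "de": {"der", "die", "das", "und", "ist", "nicht", "mit", "von", "den", "zu"},
    "en": {"the", "and", "is", "not", "with", "of", "to", "a", "in", "for"},
    "fr": {"le", "la", "les", "et", "est", "pas", "avec", "de", "des", "pour"},
}


def identify_language(text):
    """Language whose function words occur most often in `text`, or None if none occur."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {lang: sum(t in words for t in tokens) for lang, words in FUNCTION_WORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(identify_language("Die Sprache ist das Gewebe des World Wide Web"))  # de
print(identify_language("location de voitures pour le week-end"))          # fr
```

A one-word query such as "Brief" contains no function words and returns None, which illustrates why the cross-language ambiguities on slide 72 are so hard to resolve in search.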

87 Information Extraction
In IE, relevant pieces of information are specifically sought out in texts and put into a structured form.
Bremen, , wiwo: Warehouse software continues its upswing
The Bremen company Trade Consult presented version 2.0 of its successful warehouse management software Store Age at a press conference in Hanover. The new version now also enables... At the press conference, managing director Franz Merleback also announced the software house's turnover figures for the 3rd quarter. Whereas more than 30 million marks had already been turned over in the second quarter, Merleback was now able to announce the proud result of 42.5 million....

89 Output in tabular form
[Table with columns for the company, quarterly figures for 1996/97 and a difference column; recoverable rows: ComSoft 120 million, 110 million; Trade Consult 30 million, 42.5 million; Z&M 71.0 million]
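How a template like this might be filled can be sketched with hand-written patterns. The sketch below is an assumption-laden illustration, not the system behind the slides: it runs on an English rendering of the example report, the `RELEASE` pattern is tailored to this one sentence, and the field names are hypothetical. Real IE systems combine named-entity recognition, shallow parsing and discourse handling (for example, to attach the 42.5 million to the 3rd quarter mentioned one sentence earlier).

```python
import re

# English rendering of part of the example report.
REPORT = (
    "The Bremen company Trade Consult presented version 2.0 of its successful "
    "warehouse management software Store Age at a press conference in Hannover. "
    "In the second quarter, more than 30 million marks had already been turned over; "
    "now Merleback was able to announce the impressive result of 42.5 million."
)

# Hypothetical pattern for a "product release" event; field names are made up.
RELEASE = re.compile(
    r"The (?P<city>[A-Z]\w+) company (?P<company>[A-Z]\w*(?: [A-Z]\w*)*) "
    r"presented version (?P<version>\d+(?:\.\d+)*) of its .*?software "
    r"(?P<product>[A-Z]\w*(?: [A-Z]\w*)*)"
)

# Monetary amounts written as "<number> million".
AMOUNT = re.compile(r"(\d+(?:\.\d+)?) million")


def extract(text):
    """Fill a flat template dict from one text using the patterns above."""
    template = {}
    match = RELEASE.search(text)
    if match:
        template.update(match.groupdict())
    template["amounts_million"] = [float(a) for a in AMOUNT.findall(text)]
    return template


print(extract(REPORT))
```

Each pattern match becomes one row of structured data, which is the step from the running text of slide 87 to the table above.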

90 Outlook
Structuring humanity's digital knowledge is one of the great challenges of the next century. Language technology is a key technology for this ambitious undertaking, because language is the fabric of knowledge.

91 Goals
Improve the interlinking of information by automatically creating, inserting and managing typed hyperlinks in WWW documents.
Method
Hyperlinks (anchors and targets) are stored and managed independently of the documents. Modern language technology methods are used to identify passages in WWW documents (terms, text segments) that correspond to a predefined anchor. The corresponding hyperlink is associated with this anchor, offline or online, and inserted into the document automatically. The type of the target (e.g. term definition, homepage, background information, images) is already marked in the source document.
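The method can be sketched in a few lines, with one big simplification: exact string matching stands in for the language technology that identifies matching passages. The anchor store (`ANCHORS`) is kept apart from the document, and each inserted link carries its target type. The link types, the `example.org` URLs and the `data-link-type` attribute used to mark the type are all hypothetical choices for illustration.

```python
import html
import re

# Hypothetical anchor store, kept apart from the documents:
# anchor text -> (link type, target URL). Types and URLs are placeholders.
ANCHORS = {
    "NTT Labs": ("company-info", "https://example.org/companies/ntt-labs"),
    "NTT": ("homepage", "https://example.org/ntt"),
    "PBX": ("definition", "https://example.org/glossary/pbx"),
}


def insert_links(text, anchors):
    """Wrap every occurrence of a known anchor in plain text with a typed HTML link."""
    if not anchors:
        return html.escape(text)
    # Longer anchors first, so "NTT Labs" wins over "NTT" at the same position.
    terms = sorted(anchors, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b")
    parts, last = [], 0
    for match in pattern.finditer(text):
        link_type, url = anchors[match.group(0)]
        parts.append(html.escape(text[last:match.start()]))  # text before the anchor
        parts.append(
            f'<a href="{html.escape(url)}" data-link-type="{html.escape(link_type)}">'
            f"{html.escape(match.group(0))}</a>"
        )
        last = match.end()
    parts.append(html.escape(text[last:]))  # text after the last anchor
    return "".join(parts)


print(insert_links("NTT Labs introduced AirWave, which is based on a wireless PBX.", ANCHORS))
```

Because the links live only in the anchor store, changing a target URL there updates every document the next time links are inserted, which is the consistency advantage the slides list for this approach.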

92 The One-Click Approach
New wireless voice technology introduced
Posted at 5:09 PM PT, Feb 8, 1999
By Stephen Lawson, InfoWorld Electric
NTT Labs on Monday brought Dick Tracy into the enterprise, introducing a wireless voice and data system that can use a wrist radio at the Demo 99 conference. AirWave technology, demonstrated for the first time in the United States at this week's conference in Indian Wells, Calif., is based on a wireless PBX. Small, handheld phones -- and a wrist radio that looks like an oversized watch -- can be used to make voice calls and exchange data around a building or campus. The handheld phones can be switched to a public cellular mode to become conventional cell phones. Company representatives touted the system as offering higher voice quality than a typical PBX. Airwave is based on NTT's Personal Handyphone System, which is currently deployed by more than 600 users in Japan, according to the company. Modems built in to both devices allow users to plug in a notebook or portable device for dial-up data connections as fast as 64Kbps. Users can exchange files or , or access a LAN or the Internet. There is no airtime charge for AirWave communications in the building or campus. AirWave systems are scheduled to be available through distribution partners by the end of this year, priced as low as $400 per user. NTT Labs, the research and development arm of NTT Corp., in Tokyo, can be reached at

96 The One-Click Approach
[The same article, now with pop-up menus of typed link targets: Company Info, Homepage, Other News, Products, Indicators, Contact Experts, Contacts, Accounts]

97 Goals and advantages
Information can be interlinked more densely, since links no longer have to be inserted manually. Managing Internet sites and link structures can be considerably simplified and automated. The consistency of the link network increases, since each link has to be maintained in only one place. Typed links make navigation more transparent: users can already tell from the link where it leads. This new quality of interlinking leads to a new quality of information and knowledge management.

98 Context and integration
Internet applications and information and knowledge management are core competencies of the DFKI LT Lab. Developing innovative features and functionality is essential for maintaining its leading position in multilingual Internet portals, search engines and information systems.
Example applications: HYPERCODE (Dresdner Bank): dense automatic interlinking of program code and documentation. MIETTA (multilingual WWW tourism information system): automatic interlinking of tourism information; for example, place names in WWW documents are automatically linked to the homepages of the municipalities.

99 The problem
[Diagram with three dimensions: language coverage, depth of language processing, and subject domain(s); the depth axis runs through lexicon, morphology, syntax, semantics, pragmatics and knowledge processing]

100 The problem
[Diagram of language coverage, depth of language processing (lexicon, morphology, syntax, semantics, pragmatics, knowledge processing) and subject domain(s), with the region covered by spell checking highlighted]
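Spell checking sits at the shallow end of the depth axis: it needs little more than a lexicon. A minimal sketch of one common technique follows: words not found in the lexicon receive suggestions within a small Levenshtein edit distance. The tiny `LEXICON` and the threshold of 2 are illustrative assumptions; real spell checkers add morphology, frequency information and keyboard-aware costs.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def suggest(word, lexicon, max_distance=2):
    """Return lexicon words close to an unknown word, nearest first."""
    if word in lexicon:
        return []
    scored = sorted((edit_distance(word, w), w) for w in lexicon)
    return [w for d, w in scored if d <= max_distance]


# Hypothetical mini-lexicon.
LEXICON = {"identification", "understanding", "recognition", "synthesis"}
print(suggest("indentification", LEXICON))
```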

101 The problem
[Diagram of language coverage, depth of language processing (lexicon, morphology, syntax, semantics, pragmatics, knowledge processing) and subject domain(s), with the region covered by information extraction highlighted]

102 [Diagram of the levels involved in machine translation: from the acoustic or written form via phonetic or graphemic, morphophonological, syntactic and semantic representations up to a representation of the full meaning; the corresponding processing steps are phonetic or orthographic processing, morphophonological processing, syntactic processing, semantic processing, and pragmatic processing / knowledge processing]

103 Systems in use
Systran (Systran, EU); Metal/Comprendium (Siemens, Sietec, L&H, SAIL Labs); Logos (Logos, Global Words); Personal Translator (IBM).

104 Machine translation
Fully automatic machine translation (FAMT) of arbitrary texts is not possible today. The obstacle is not the linguistic processing of the texts but the machine's lack of knowledge about their content. For very restricted subject domains and text types, however, usable translations can be produced. Otherwise, machine translation today serves successfully as a preliminary stage for human translation (machine-assisted human translation, MAHT).

105-108 Machine Translation Today
[Diagram: quality (understandable, ready-to-use, perfect) plotted against coverage of subject domains and text sorts; successive builds mark the regions reached by FAMT, MAHT, controlled-language MT and indicative FAMT]

115 Language Technologies
[Diagram: Language Technologies, divided into Speech Technologies and Text Technologies]

116 Language Technologies (Speech Technologies, Text Technologies)
Gathering, indexing, categorization, clustering, summarization
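Indexing is the step that makes gathered texts searchable. A common data structure for it is the inverted index, which maps each word to the documents containing it. The sketch below assumes naive lowercase word tokenization and Boolean AND queries; the sample documents are invented.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map every token to the set of document IDs in which it occurs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def query_and(index, query):
    """Return the IDs of documents that contain all query tokens."""
    postings = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*postings) if postings else set()


# Invented sample documents.
DOCS = {
    1: "Speech recognition converts spoken language into text.",
    2: "Machine translation converts text from one language into another.",
    3: "Information extraction fills templates from text.",
}
index = build_inverted_index(DOCS)
print(sorted(query_and(index, "language text")))
```

Lookup cost then depends on the length of the posting lists rather than on the size of the whole collection.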

117 Language Technologies (Speech Technologies, Text Technologies)
Text understanding, text translation, information extraction, report generation

118 Language Technologies (Speech Technologies, Text Technologies)
Voice recognition, speech verification, speech recognition, voice modelling, speech synthesis, speaker identification, language identification

119 Language Technologies (Speech Technologies, Text Technologies)
Speech generation, speech understanding, spoken dialogue systems, speech translation systems

120 Language Technologies (Speech Technologies, Text Technologies)
Language understanding, language generation, dialogue modelling, machine translation

121 Language Technologies (Speech Technologies, Text Technologies)
Gathering, indexing, categorization, clustering, summarization
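Summarization can be approached in many ways. A simple frequency-based extractive variant, sketched here as an illustration rather than a recommended method, scores each sentence by the average corpus frequency of its content words and keeps the best-scoring sentences in their original order. The stopword list `STOP` and the sample text are assumptions.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption, not a standard list).
STOP = {"a", "an", "and", "by", "for", "in", "is", "it", "of", "the", "to"}


def content_words(text):
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOP]


def summarize(text, n_sentences=1):
    """Extract the n sentences whose content words are most frequent overall."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))


TEXT = ("Machine translation of arbitrary texts is not possible today. "
        "Indicative translation still shows the topic of a text. "
        "The weather was nice.")
print(summarize(TEXT))
```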

122 Acquisition
Scanning, Collecting by and Push Services, Gathering from the Net, Sound Recordings

123 © 2002 Hans Uszkoreit VL CL Hypertext, in computer science, a metaphor for presenting information in which text, images, sounds, and actions become linked together in a complex, nonsequential web of associations that permit the user to browse through related topics, regardless of the presented order of the topics. These links are often established both by the author of a hypertext document and by the user, depending on the intent of the hypertext document. For example, traveling among the links to the word iron in an article might lead the user to the periodic table of the elements or a map of the migration of metallurgy in Iron Age Europe. The term hypertext was coined in 1965 by Ted Nelson to describe documents, as presented by a Computer, that express the nonlinear structure of ideas, as opposed to the linear format of books, film, and speech. The term hypermedia, more recently introduced, is nearly synonymous but emphasizes the nontextual components of hypertext, such as animation, recorded sound, and video. Hypermedia, in computer science, the integration of graphics, sound, video, or any combination into a primarily associative system of information storage and retrieval. Hypermedia, especially in an interactive format where choices are controlled by the user, is structured around the idea of offering a working and learning environment that parallels human thinking˘that is, an environment that allows the user to make associations between topics rather than move sequentially from one to the next, as in an alphabetic list. The term hypertext was coined in 1965 by Ted Nelson to describe documents, as presented by a Computer, that express the nonlinear structure of ideas, as opposed to the linear format of books, film, and speech. The term hypermedia, more recently introduced, is nearly synonymous but emphasizes the nontextual components of hypertext, such as animation, recorded sound, and video. 
Hypermedia topics are thus linked in a manner that allows the user to jump from subject to related subject in searching for information. For example, a hypermedia presentation on navigation might include links to such topics as astronomy, bird migration, geography, satellites, and radar. If the information is primarily in text form, the product is hypertext; if video, music, animation, or other elements are included, the product is hypermedia. Microsoft (R) Encarta. Copyright (c) 1993 Microsoft Corporation. Copyright (c) 1993 Funk & Wagnall's Corporation Hypermedia, in computer science, the integration of graphics, sound, video, or any combination into a primarily associative system of information storage and retrieval. Hypermedia, especially in an interactive format where choices are controlled by the user, is structured around the idea of offering a working and learning environment that parallels human thinking˘that is, an environment that allows the user to make associations between topics rather than move sequentially from one to the next, as in an alphabetic list. The term hypertext was coined in 1965 by Ted Nelson to describe documents, as presented by a Computer, that express the nonlinear structure of ideas, as opposed to the linear format of books, film, and speech. The term hypermedia, more recently introduced, is nearly synonymous but emphasizes the nontextual components of hypertext, such as animation, recorded sound, and video. Hypermedia topics are thus linked in a manner that allows the user to jump from subject to related subject in searching for information. For example, a hypermedia presentation on navigation might include links to such topics as astronomy, bird migration, geography, satellites, and radar. If the information is primarily in text form, the product is hypertext; if video, music, animation, or other elements are included, the product is hypermedia. Microsoft (R) Encarta. Copyright (c) 1993 Microsoft Corporation. 
Copyright (c) 1993 Funk & Wagnall's Corporation tying count of the original complaint. Instead, it said it wants to investigate developments in the industry since the trial concluded and evaluate whether additional conduct-related provisions are necessary, especially in the absence of a breakup. In a statement issued Thursday morning, the Justice Department said it had taken these positions in an effort to "streamline the case with the goal of securing an effective remedy as quickly as possible. Instead of a breakup, the Justice Department said it will ask that Microsoft have certain restrictions placed on its conduct modeled on those the original trial judge imposed on the company in June 2000 but were postponed pending the appeal. In his original order, Judge Jackson imposed a series of restrictions on Microsoft's business practices which were to be effective as the companymoved to split its business in two. Among the conduct remedies Judge Jackson originally imposed were: prohibiting Microsoft from punishing hardware and software companies working on competing products; prohibiting it from favoring computer companies and software developers that helped Microsoft exclude competitors; makers under uniform prices and terms according to a publicly available schedule; and barring Microsoft from interfering with the way PC makers set up startup screens, this Windows desktop preferences, and Internet connection wizards. Since the appeals court first handed down its ruling in the case, Microsoft repeatedly has expressed its Hypermedia, in computer science, the integration of graphics, sound, video, or any combination into a primarily associative system of information storage and retrieval. 
Hypermedia, especially in an interactive format where choices are controlled by the user, is structured around the idea of offering a working and learning environment that parallels human thinking˘that is, an environment that allows the user to make associations between topics rather than move sequentially from one to the next, as in an alphabetic list. The term hypertext was coined in 1965 by Ted Nelson to describe documents, as presented by a Computer, that express the nonlinear structure of ideas, as opposed to the linear format of books, film, and speech. The term hypermedia, more recently introduced, is nearly synonymous but emphasizes the nontextual components of hypertext, such as animation, recorded sound, and video. Hypermedia topics are thus linked in a manner that allows the user to jump from subject to related subject in searching for information. For example, a hypermedia presentation on navigation might include links to such topics as astronomy, bird migration, geography, satellites, and radar. If the information is primarily in text form, the product is hypertext; if video, music, animation, or other elements are included, the product is hypermedia. Microsoft (R) Encarta. Copyright (c) 1993 Microsoft Corporation. Copyright (c) 1993 Funk & Wagnall's Corporation Hypermedia, in computer science, the integration of graphics, sound, video, or any combination into a primarily associative system of information storage and retrieval. Hypermedia, especially in an interactive format where choices are controlled by the user, is structured around the idea of offering a working and learning environment that parallels human thinking˘that is, an environment that allows the user to make associations between topics rather than move sequentially from one to the next, as in an alphabetic list. 
The term hypertext was coined in 1965 by Ted Nelson to describe documents, as presented by a Computer, that express the nonlinear structure of ideas, as opposed to the linear format of books, film, and speech. The term hypermedia, more recently introduced, is nearly synonymous but emphasizes the nontextual components of hypertext, such as animation, recorded sound, and video. Hypermedia topics are thus linked in a manner that allows the user to jump from subject to related subject in searching for information. For example, a hypermedia presentation on navigation might include links to such topics as astronomy, bird migration, geography, satellites, and radar. If the information is primarily in text form, the product is hypertext; if video, music, animation, or other elements are included, the product is hypermedia. Microsoft (R) Encarta. Copyright (c) 1993 Microsoft Corporation. Copyright (c) 1993 Funk & Wagnall's Corporation C ATEGORIZATION Hypermedia, in computer science, the integration of graphics, sound, video, or any combination into a primarily associative system of information storage and retrieval. Hypermedia, especially in an interactive format where choices are controlled by the user, is structured around the idea of offering a working and learning environment that parallels human thinking˘that is, an environment that allows the user to make associations between topics rather than move sequentially from one to the next, as in an alphabetic list. The term hypertext was coined in 1965 by Ted Nelson to describe documents, as presented by a Computer, that express the no nlinear structure of ideas, as opposed to the linear format of books, film, and speech. The term hypermedia, more recently introduced, is nearly synonymous but emphasizes the nontextual components of hypertext, such as animation, recorded sound, and video. Hypermedia topics are thus linked in a manner that allows the user to jump from subject to related subject in searching for information. 
For example, a hypermedia presentation on navigation might include links to such topics as astronomy, bird migration, geography, satellites, and radar. If the information is primarily in text form, the product is hypertext; if video, music, animation, or other elements are included, the product is hypermedia. Microsoft (R) Encarta. Copyright (c) 1993 Microsoft Corporation. Copyright (c) 1993 Funk & Wagnall's Corporation Hypermedia, in computer science, the integration of graphics, sound, video, or any combination into a primarily associative system of information storage and retrieval. Hypermedia, especially in an interactive format where choices are controlled by the user, is structured around the idea of offering a working and learning environment that parallels human thinking˘that is, an environment that allows the user to make associations between topics rather than move sequentially from one to the next, as in an alphabetic list. The term hypertext was coined in 1965 by Ted Nelson to describe documents, as presented by a Computer, that express the nonlinear structure of ideas, as opposed to the linear format of books, film, and speech. The term hypermedia, more recently introduced, is nearly synonymous but emphasizes the nontextual components of hypertext, such as animation, recorded sound, and video. Hypermedia topics are thus linked in a manner that allows the user to jump from subject to related subject in searching for information. For example, a hypermedia presentation on navigation might include links to such topics as astronomy, bird migration, geography, satellites, and radar. If the information is primarily in text form, the product is hypertext; if video, music, animation, or other elements are included, the product is hypermedia. Microsoft (R) Encarta. Copyright (c) 1993 Microsoft Corporation. 
Copyright (c) 1993 Funk & Wagnall's Corporation Hypermedia, in computer science, the integration of graphics, sound, video, or any combination into a primarily associative system of information storage and retrieval. Hypermedia, especially in an interactive format where choices are controlled by the user, is structured around the idea of offering a working and learning environment that parallels human thinking˘that is, an environment that allows the user to make associations between topics rather than move sequentially from one to the next, as in an alphabetic list. The term hypertext was coined in 1965 by Ted Nelson to describe documents, as presented by a Computer, that express the nonlinear structure of ideas, as opposed to the linear format of books, film, and speech. The term hypermedia, more recently introduced, is nearly synonymous but emphasizes the nontextual components of hypertext, such as animation, recorded sound, and video. Hypermedia topics are thus linked in a manner that allows the user to jump from subject to related subject in searching for information. For example, a hypermedia presentation on navigation might include links to such topics as astronomy, bird migration, geography, satellites, and radar. If the information is primarily in text form, the product is hypertext; if video, music, animation, or other elements are included, the product is hypermedia. Microsoft (R) Encarta. Copyright (c) 1993 Microsoft Corporation. Corporation Hypermedia, in computer science, the integration of graphics, sound, video, or any combination into a primarily associative system of information storage and retrieval. 
Hypermedia, especially in an interactive format where choices are controlled by the user, is structured around the idea of offering a working and learning environment that parallels human thinking˘that is, an environment that allows the user to make associations between topics rather than move sequentially from one to the next, as in an alphabetic list. The term hypertext was coined in 1965 by Ted Nelson to describe documents, as presented by a Computer, that express the nonlinear structure of ideas, as opposed to the linear format of books, film, and speech. The term hypermedia, more recently introduced, is nearly synonymous but emphasizes the nontextual components of hypertext, such as animation, recorded sound, and video. Hypermedia topics are thus linked in a manner that allows the user to jump from subject to related subject in searching for information. For example, a hypermedia presentation on navigation might include links to such topics as astronomy, bird migration, geography, satellites, and radar. If the information is primarily in text form, the product is hypertext; if video, music, animation, or other elements are included, the product is hypermedia. Microsoft (R) Encarta. Copyright (c) 1993 Microsoft Corporation. Corporation Hypermedia, in computer science, the integration of graphics, sound, video, or any combination into a primarily associative system of information storage and retrieval. Hypermedia, especially in an interactive format where choices are controlled by the user, is structured around the idea of offering a working and learning environment that parallels human thinking˘that is, an environment that allows the user to make associations between topics rather than move sequentially from one to the next, as in an alphabetic list. The term hypertext was coined in 1965 by Ted Nelson to describe documents, as presented by a Computer, that express the nonlinear structure of ideas, as opposed to the linear format of books, film, and speech. 
The term hypermedia, more recently introduced, is nearly synonymous but emphasizes the nontextual components of hypertext, such as animation, recorded sound, and video. Hypermedia topics are thus linked in a manner that allows the user to jump from subject to related subject in searching for information. For example, a hypermedia presentation on navigation might include links to such topics as astronomy, bird migration, geography, satellites, and radar. If the information is primarily in text form, the product is hypertext; if video, music, animation, or other elements are included, the product is hypermedia. Microsoft (R) Encarta. Copyright (c) 1993 Microsoft Corporation. Copyright (c) 1993 Funk & Wagnall's Corporation Hypermedia, in computer science, the integration of graphics, sound, video, or any combination into a primarily associative system of information storage and retrieval. Hypermedia, especially in an interactive format where choices are controlled by the user, is structured around the idea of offering a working and learning environment that parallels human thinking˘that is, an environment that allows the user to make associations between topics rather than move sequentially from one to the next, as in an alphabetic list. The term hypertext was coined in 1965 by Ted Nelson to describe documents, as presented by a Computer, that express the nonlinear structure of ideas, as opposed to the linear format of books, film, and speech. The term hypermedia, more recently introduced, is nearly synonymous but emphasizes the nontextual components of hypertext, such as animation, recorded sound, and video. Hypermedia topics are thus linked in a manner that allows the user to jump from subject to related subject in searching for information. For example, a hypermedia presentation on navigation might include links to such topics as astronomy, bird migration, geography, satellites, and radar. 
tying count of the original complaint. Instead, it said it wants to investigate developments in the industry since the trial concluded and evaluate whether additional conduct-related provisions are necessary, especially in the absence of a breakup. In a statement issued Thursday morning, the Justice Department said it had taken these positions in an effort to "streamline the case with the goal of securing an effective remedy as quickly as possible." Instead of a breakup, the Justice Department said it will ask that Microsoft have certain restrictions placed on its conduct, modeled on those that the original trial judge imposed on the company in June 2000 but that were postponed pending the appeal. In his original order, Judge Jackson imposed a series of restrictions on Microsoft's business practices, which were to take effect as the company moved to split its business in two. Among the conduct remedies Judge Jackson originally imposed were: prohibiting Microsoft from punishing hardware and software companies working on competing products; prohibiting it from favoring computer companies and software developers that helped Microsoft exclude competitors; makers under uniform prices and terms according to a publicly available schedule; and barring Microsoft from interfering with the way PC makers set up startup screens, the Windows desktop preferences, and Internet connection wizards. Since the appeals court first handed down its ruling in the case, Microsoft repeatedly has expressed its
Hypertext, in computer science, a metaphor for presenting information in which text, images, sounds, and actions become linked together in a complex, nonsequential web of associations that permit the user to browse through related topics, regardless of the presented order of the topics. These links are often established both by the author of a hypertext document and by the user, depending on the intent of the hypertext document. For example, traveling among the links to the word "iron" in an article might lead the user to the periodic table of the elements or a map of the migration of metallurgy in Iron Age Europe. The term hypertext was coined in 1965 by Ted Nelson to describe documents, as presented by a computer, that express the nonlinear structure of ideas, as opposed to the linear format of books, film, and speech. The term hypermedia, more recently introduced, is nearly synonymous but emphasizes the nontextual components of hypertext, such as animation, recorded sound, and video.

124 INDEXING Document DB
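Indexing of this kind is commonly implemented with an inverted index, which maps each term to the documents that contain it. The Python sketch below illustrates the idea under simplifying assumptions. Tokenization is a plain regular expression with no stemming or stopword filtering, and queries are Boolean AND over all terms. The function names `build_index` and `search` and the three shortened sample documents are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: each term maps to the IDs of documents containing it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Boolean AND query: return IDs of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample collection (shortened versions of the sample texts above).
docs = {
    "hypermedia": "Hypermedia, in computer science, the integration of graphics, sound, "
                  "video, or any combination into a primarily associative system of "
                  "information storage and retrieval.",
    "hypertext": "The term hypertext was coined in 1965 by Ted Nelson to describe "
                 "documents that express the nonlinear structure of ideas.",
    "doj": "Instead of a breakup, the Justice Department said it will ask that "
           "Microsoft have certain restrictions placed on its conduct.",
}

index = build_index(docs)
print(sorted(search(index, "Microsoft breakup")))
print(sorted(search(index, "information retrieval")))
```

Building the index once makes each query a few set lookups and intersections rather than a scan over every document. This is the main reason a document database indexes its texts in advance.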

125 SUMMARIZATION
Hypermedia, in computer science, the integration of graphics, sound, video, or any combination into a primarily associative system of information storage and retrieval. Hypermedia, especially in an interactive format where choices are controlled by the user, is structured around the idea of offering a working and learning environment that parallels human thinking, that is, an environment that allows the user to make associations between topics rather than move sequentially from one to the next, as in an alphabetic list. The term hypertext was coined in 1965 by Ted Nelson to describe documents, as presented by a computer, that express the nonlinear structure of ideas, as opposed to the linear format of books, film, and speech. The term hypermedia, more recently introduced, is nearly synonymous but emphasizes the nontextual components of hypertext, such as animation, recorded sound, and video. Hypermedia topics are thus linked in a manner that allows the user to jump from subject to related subject in searching for information. For example, a hypermedia presentation on navigation might include links to such topics as astronomy, bird migration, geography, satellites, and radar. If the information is primarily in text form, the product is hypertext; if video, music, animation, or other elements are included, the product is hypermedia.
Microsoft (R) Encarta. Copyright (c) 1993 Microsoft Corporation. Copyright (c) 1993 Funk & Wagnall's Corporation.

September 6, 2001: 4:39 p.m. ET
No Microsoft breakup: ... The U.S. Justice Department said Thursday it will not ask that Microsoft be broken in two... The U.S. Court of Appeals for the District of Columbia in late June had overturned a lower court's order ... it upheld the lower court's conclusion that Microsoft has a monopoly in the market for computer operating systems and maintains that monopoly power ... tying count of the original complaint. Instead, it said it wants to investigate developments in the industry since the trial concluded and evaluate whether additional conduct-related provisions are necessary, especially in the absence of a breakup. In a statement issued Thursday morning, the Justice Department said it had taken these positions in an effort to "streamline the case with the goal of securing an effective remedy as quickly as possible."
Instead of a breakup, the Justice Department said it will ask that Microsoft have certain restrictions placed on its conduct, modeled on those that the original trial judge imposed on the company in June 2000 but that were postponed pending the appeal. In his original order, Judge Jackson imposed a series of restrictions on Microsoft's business practices which were to be effective as the company moved to split its business in two. Among the conduct remedies Judge Jackson originally imposed were: prohibiting Microsoft from punishing hardware and software companies working on competing products; prohibiting it from favoring computer companies and software developers that helped Microsoft exclude competitors; ... makers under uniform prices and terms according to a publicly available schedule; and barring Microsoft from interfering with the way PC makers set up startup screens, Windows desktop preferences, and Internet connection wizards. Since the appeals court first handed down its ruling in the case, Microsoft repeatedly has expressed its ...
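As a hedged illustration of what an extractive summarizer might do with a text like the Encarta article above, here is a minimal frequency-based sketch in Python, in the spirit of classic extractive methods: each sentence is scored by the average document frequency of its content words, and the top-scoring sentences are returned in their original order. The stop-word list, the naive sentence splitter and the names `summarize`, `split_sentences` and `tokenize` are assumptions for illustration, not the method presented in the lecture.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (real systems use larger ones).
STOP_WORDS = {"a", "an", "and", "are", "as", "be", "by", "for", "in", "is",
              "it", "its", "of", "on", "or", "such", "that", "the", "to"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOP_WORDS]


def summarize(text, n=2):
    """Return the n highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average document frequency of the sentence's content words.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]


if __name__ == "__main__":
    # Abridged excerpt of the Encarta text on the slide.
    excerpt = (
        "The term hypertext was coined in 1965 by Ted Nelson to describe "
        "documents that express the nonlinear structure of ideas. "
        "The term hypermedia is nearly synonymous but emphasizes the "
        "nontextual components of hypertext. "
        "If the information is primarily in text form, the product is "
        "hypertext; if video, music, animation, or other elements are "
        "included, the product is hypermedia."
    )
    for sentence in summarize(excerpt, n=1):
        print(sentence)
```

Returning the selected sentences in document order (rather than score order) keeps the summary readable; a real system would also handle abbreviations in sentence splitting and redundancy between selected sentences.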

126 INFORMATION EXTRACTION
proper names: persons, companies, places...
special expressions: dates, prices, percentages
simple relations: company - location, product - price
complex relations: accident (affected parties, cause, time, place, damage)
answers to questions: Where is the headquarters of IBM?
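Of the categories listed on this slide, special expressions (dates, prices, percentages) are the easiest to illustrate with surface patterns. The following Python sketch uses a few regular expressions for English text; the patterns, the function name `find_special_expressions` and the example sentence are hypothetical and cover only simple forms such as "September 6, 2001", "30 million marks" or "3.5%".

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Hypothetical surface patterns for English text; they cover only simple forms.
PATTERNS = {
    # e.g. "September 6, 2001"
    "date": re.compile(r"\b(?:" + MONTHS + r") \d{1,2}, \d{4}\b"),
    # e.g. "30 million marks", "120 million dollars"
    "price": re.compile(
        r"\b\d+(?:\.\d+)? (?:million |billion )?(?:marks|dollars|euros)\b"),
    # e.g. "3.5%", "12 percent"
    "percentage": re.compile(r"\b\d+(?:\.\d+)?(?:%| percent\b)"),
}


def find_special_expressions(text):
    """Return (class, matched text, start offset) triples in text order."""
    hits = [(label, m.group(0), m.start())
            for label, pattern in PATTERNS.items()
            for m in pattern.finditer(text)]
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    # Invented example sentence, for illustration only.
    sentence = ("Shares fell 3.5% on September 6, 2001, "
                "after a bid of 120 million dollars.")
    for label, match, start in find_special_expressions(sentence):
        print(f"{start:3d} {label:<10} {match}")
```

Patterns like these are brittle (other date formats, currency symbols and German decimal commas are not covered), which is one reason practical systems combine them with normalization rules and lexicons.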

127 INFORMATION EXTRACTION
Bremen, , wiwo: Warehouse software still on the upswing
The Bremen company Trade Consult presented version 2.0 of its successful warehouse management software Store Age at a press conference in Hannover... The new version now also enables... At the press conference, managing director Franz Merleback also announced the software firm's sales figures for the 3rd quarter. Whereas sales of over 30 million marks had already been achieved in the second quarter, Merleback was now able to announce the impressive result of 42.5 million....
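One simple way to find the proper names in a press release like this one is to look for trigger words ("company", "managing director", "in") followed by capitalized tokens. The Python sketch below is a hypothetical illustration of that idea on an abridged English rendering of the text; the trigger list and the names `TRIGGERS` and `extract_names` are assumptions, not the system behind the slide. Note that it misses the adjectival "Bremen", which shows how limited such rules are on their own.

```python
import re

# Abridged English rendering of the press release on this slide.
TEXT = (
    "The Bremen company Trade Consult presented version 2.0 of its successful "
    "warehouse management software Store Age at a press conference in Hannover. "
    "At the press conference, managing director Franz Merleback also announced "
    "the software firm's sales figures for the 3rd quarter."
)

# Hypothetical trigger-word rules: a lowercase cue word followed by a run of
# capitalized tokens. Real systems add gazetteers, morphology and many more cues.
CAPITALIZED_RUN = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
TRIGGERS = {
    "company": r"\bcompany ",
    "product": r"\bsoftware ",
    "person": r"\bmanaging director ",
    "place": r"\bin ",
}
PATTERNS = {label: re.compile(cue + "(" + CAPITALIZED_RUN + ")")
            for label, cue in TRIGGERS.items()}


def extract_names(text):
    """Map each entity type to the names found after its trigger words."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}


if __name__ == "__main__":
    for label, names in extract_names(TEXT).items():
        print(label, names)
```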

129 IE RESULT
Columns: Company | 96Q | Q1 | 97Q2 | 97Q3 | 97Q | Diff
ComSoft: 120 million, 110 million; Diff -10 million
Trade Consult: 30 million, 42.5 million; Diff 12.5 million
Z&M: 71.0 million
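The Diff column is simple arithmetic on the normalized figures: 42.5 - 30 = 12.5 for Trade Consult and 110 - 120 = -10 for ComSoft. A minimal Python sketch of that normalization step, assuming the German notation of the original slide (decimal comma, "Mio" for millions), might look like this; `parse_amount`, `format_amount` and `quarter_diff` are hypothetical helper names.

```python
import re


def parse_amount(raw):
    """Convert slide notation such as '42,5Mio', '30 Mio' or '-10 Mio' to a
    float in millions. Assumes a German decimal comma and 'Mio' = million."""
    match = re.fullmatch(r"\s*(-?\d+(?:,\d+)?)\s*Mio\s*", raw)
    if match is None:
        raise ValueError(f"unrecognized amount: {raw!r}")
    return float(match.group(1).replace(",", "."))


def format_amount(value):
    """Render a float in millions back into the slide's notation."""
    return f"{value:g}".replace(".", ",") + " Mio"


def quarter_diff(previous, current):
    """Change from one quarterly figure to the next, in millions."""
    return parse_amount(current) - parse_amount(previous)


if __name__ == "__main__":
    # The two figures per row as they appear in the extracted table.
    rows = {"ComSoft": ("120Mio", "110Mio"),
            "Trade Consult": ("30 Mio", "42,5Mio")}
    for company, (previous, current) in rows.items():
        print(company, format_amount(quarter_diff(previous, current)))
```

Normalizing amounts to plain numbers before comparing them is what lets an IE system fill derived columns such as Diff; the `:g` formatting here is only meant for short figures like these.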
