Cross-Language Information Retrieval with a Multilingual Thesaurus and Cross-Concordances: The European Schools Treasury Browser (ETB) Michael Kluck Humboldt-Universität.


Cross-Language Information Retrieval with a Multilingual Thesaurus and Cross-Concordances: The European Schools Treasury Browser (ETB)
Michael Kluck
Humboldt-Universität zu Berlin, Department of Education and Computer Science (Abt. Pädagogik und Informatik, HUB-PI)
Informationszentrum Sozialwissenschaften (Social Science Information Centre), Bonn (IZ)
26th Annual Conference of the Gesellschaft für Klassifikation, July 22-24, 2002, University of Mannheim, Germany
Slides available at: http://www.educat-hu-berlin.de/~kluck/Etb-GfKl-2002.ppt

Overview
- the ETB project context
- the ETB thesaurus
- the cross-concordances in ETB
- the ETB search interface
- conclusion

European Schoolnet (EUN) and European Schools Treasury Browser (ETB)
- The European Schoolnet (EUN) is a cooperation and concertation activity of more than 20 European countries (Ministries of Education). It aims to promote the use of ICT in schools and provides tools and services that support information exchange and cooperation, mainly for teachers and pupils: www.eun.org
- The European Schools Treasury Browser (ETB) is a project of EUN and other partners to set up a network of networks of European educational servers that provides seamless multilingual access to educational resources all over Europe: etb.eun.org

Aims of the ETB project
- Network of European educational servers (national, regional): an NNTP network
- The autonomy of the individual servers is left untouched
- Each server posts all, selected, or none of its resources (those with European added value)
- Each server pulls selected or none of the network's resources (according to its own editorial policy)
- Integrated multilingual search at the ETB server
- Individual proposal of resources via the ETB server
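The posting and pulling model described in the aims can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ETB implementation: an in-memory shared pool stands in for the NNTP newsgroup exchange, and all names (`Resource`, `Server`, `exchange`, the policy callables, the example servers) are hypothetical. It shows how each server decides on its own what it posts and what it pulls, so its autonomy is preserved.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Resource:
    rid: str
    origin: str              # name of the server that holds the resource
    descriptors: set[str]    # thesaurus descriptors assigned to it

@dataclass
class Server:
    name: str
    local: list[Resource] = field(default_factory=list)
    # Which own resources are posted: all, selected, or none (default: none).
    post_policy: Callable[[Resource], bool] = lambda r: False
    # Which network resources are pulled: selected or none (default: none).
    pull_policy: Callable[[Resource], bool] = lambda r: False
    received: list[Resource] = field(default_factory=list)

def exchange(servers: list[Server]) -> list[Resource]:
    """One posting/pulling round; the shared pool stands in for the NNTP network."""
    pool = [r for s in servers for r in s.local if s.post_policy(r)]
    for s in servers:
        # A server never pulls its own postings back.
        s.received = [r for r in pool if r.origin != s.name and s.pull_policy(r)]
    return pool

# Hypothetical example: "de" posts only resources on water pollution and pulls nothing;
# "se" posts everything and pulls only resources on water pollution.
de = Server("de", [Resource("de-1", "de", {"water pollution"}),
                   Resource("de-2", "de", {"gravity"})],
            post_policy=lambda r: "water pollution" in r.descriptors)
se = Server("se", [Resource("se-1", "se", {"water pollution"})],
            post_policy=lambda r: True,
            pull_policy=lambda r: "water pollution" in r.descriptors)
pool = exchange([de, se])
print([r.rid for r in pool], [r.rid for r in se.received], de.received)
```

Keeping both policies as per-server callables mirrors the point of the slide: the network never decides for a server, it only transports what each server chooses to share.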

Examples of requests for the ETB server
- Schools searching for cooperation with schools in other countries dealing with similar topics
- Teachers searching for specific teaching materials, e.g. interactive resources on free fall and gravity
- Pupils searching for other classes' projects on specific subjects, e.g. water pollution
- Teachers searching for language materials on specific topics

Handling of heterogeneity and multilinguality
- Developing a multilingual thesaurus
- Building cross-concordances between different thesauri and classification schemes, e.g. EET (European Educational Thesaurus) to ETB, and the DBS classification (Deutscher Bildungsserver) to ETB, using the Thesaurus Management System (SIS-TMS) and mapping files (Excel)
- Setting up statistical transfer components between free text and thesauri or classification schemes to enhance the search: (DBS) free text --> ETB terms
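The slide does not say which statistic the transfer components use. As a hedged illustration, assuming a simple co-occurrence model, the sketch below estimates P(descriptor | word) from documents that carry both free text and intellectually assigned descriptors, and then suggests descriptors for a free-text query. The function names `train_transfer` and `suggest` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def train_transfer(docs):
    """docs: (free-text words, assigned descriptors) per document.
    Returns {word: {descriptor: P(descriptor | word)}} from document co-occurrence."""
    word_df = Counter()            # number of documents containing the word
    co = defaultdict(Counter)      # documents containing both word and descriptor
    for words, descriptors in docs:
        for w in set(words):
            word_df[w] += 1
            for d in set(descriptors):
                co[w][d] += 1
    return {w: {d: n / word_df[w] for d, n in ds.items()} for w, ds in co.items()}

def suggest(model, query_words, top=3):
    """Rank descriptors for a free-text query by summed association weights."""
    scores = Counter()
    for w in query_words:
        for d, p in model.get(w, {}).items():
            scores[d] += p
    return [d for d, _ in scores.most_common(top)]

# Hypothetical training data: lower-cased free-text words with assigned descriptors.
docs = [
    (["wasser", "verschmutzung", "projekt"], ["water pollution", "project"]),
    (["wasser", "fluss"], ["water pollution"]),
    (["projekt", "klasse"], ["project"]),
]
model = train_transfer(docs)
print(suggest(model, ["wasser"]))   # ['water pollution', 'project']
```

Conditional probability is only one possible weighting; a production component would also need stemming, stop-word handling, and a minimum co-occurrence threshold.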

What about the ETB Thesaurus?
- A truly multilingual thesaurus, currently in 9 languages: German, English, French, Italian, Spanish, Danish, Swedish, Greek, Hebrew
- Further languages will follow, depending on the availability of information sources and language resources: Slavic languages, regional languages, official EU languages, less widely spoken languages
- Subject coverage (mainly): online resources of schools, teachers, classes and pupils (learning materials, project results)

Building the ETB thesaurus
- Starting with empirical analysis:
  - existing educational thesauri (EET, EUN, ERIC, MOTBIS etc.)
  - usage of terminology on the Internet
  - user queries of the DBS (log files)
  - free keywords assigned to online resources by users of the DBS
- Building clusters / micro-thesauri
- Balancing within the micro-thesauri
- Balancing between the languages

Advantages of a multilingual thesaurus

The properties of the ETB thesaurus
- Number of descriptors (= preferred terms): ~1160 (including countries, regions and languages), translated into all 9 languages
- Number of non-descriptors (= terms not used for indexing, but leading to preferred terms): depends on the activity of the language groups and on the available thesauri, so the number differs between languages; non-descriptors are not translated. German: ~280; English: ~260; French: ~860; Italian: ~496
- The high number of non-descriptors allows a better approximation of the users' vocabulary.
- Hierarchical structure with narrower (NT) and broader (BT) terms, additionally related terms (RT) and scope notes (SN), assignment to micro-thesauri (MT), post-coordination
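The properties listed above suggest a simple data model: one language-independent descriptor with a label per language, plus language-specific non-descriptors that only point (USE) to a descriptor. The following is a minimal sketch under that assumption; the class names, method names, and descriptor ids (`d-school`, `d-extra`, `d-eval`) are hypothetical, while the terms and relations come from the example slides.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Descriptor:
    did: str                                   # language-independent identifier
    labels: dict[str, str]                     # language code -> preferred term
    mt: set[str] = field(default_factory=set)  # micro-thesaurus codes
    bt: set[str] = field(default_factory=set)  # broader terms (ids)
    nt: set[str] = field(default_factory=set)  # narrower terms (ids)
    rt: set[str] = field(default_factory=set)  # related terms (ids)
    sn: dict[str, str] = field(default_factory=dict)  # scope notes per language

class Thesaurus:
    def __init__(self) -> None:
        self.desc: dict[str, Descriptor] = {}
        self.entry: dict[tuple[str, str], str] = {}  # (lang, term) -> descriptor id

    def add(self, d: Descriptor) -> None:
        self.desc[d.did] = d
        for lang, term in d.labels.items():
            self.entry[(lang, term.casefold())] = d.did

    def add_non_descriptor(self, lang: str, term: str, did: str) -> None:
        # USE reference: non-descriptors are language-specific and not translated.
        self.entry[(lang, term.casefold())] = did

    def add_bt(self, narrower: str, broader: str) -> None:
        # BT and NT are kept reciprocal.
        self.desc[narrower].bt.add(broader)
        self.desc[broader].nt.add(narrower)

    def preferred(self, term: str, src: str, tgt: str) -> str | None:
        """Map a descriptor or non-descriptor in language src to the descriptor in tgt."""
        did = self.entry.get((src, term.casefold()))
        return None if did is None else self.desc[did].labels.get(tgt)

t = Thesaurus()
t.add(Descriptor("d-school", {"de": "schulische Aktivitäten"}, mt={"30"}))
t.add(Descriptor("d-extra", {"de": "extracurriculare Aktivitäten",
                             "en": "extracurricular activities",
                             "fr": "activités hors programme"}, mt={"30", "70.70"}))
t.add_bt("d-extra", "d-school")
t.add(Descriptor("d-eval", {"de": "Evaluation"}))
t.add_non_descriptor("de", "Evaluierung", "d-eval")
print(t.preferred("Extracurriculare Aktivitäten", "de", "fr"))  # activités hors programme
print(t.preferred("Evaluierung", "de", "de"))                   # Evaluation
```

Because non-descriptors resolve to the same id as the descriptor, a query entered in a user's own vocabulary can be turned into the preferred term of any other language in one lookup.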

Descriptors divided into micro-thesauri

Descriptors of the micro-thesaurus Content of Education (MT 70)

Non-Descriptors per language

German example descriptor in alphabetic display
extracurriculare Aktivitäten
MT (30) (70.70)
Da: aktiviteter udenfor læseplan
El: εξωσχολικές δραστηριότητες
En: extracurricular activities
Es: actividades extraescolares
Fr: activités hors programme
It: attività extracurricolari
Sv: fria aktiviteter
SN: Von der Schule initiierte und organisierte kulturelle, sozio-kulturelle und Freizeitaktivitäten, bei denen eine Teilnahme weder Pflicht ist, noch in die Bewertung der schulischen Leistungen eines Schülers einfließt. [English: Cultural, socio-cultural and leisure activities initiated and organized by the school, in which participation is neither compulsory nor taken into account in assessing a pupil's school performance.]
BT schulische Aktivitäten
NT Aktivitäten in der freien Natur
NT Auslandsaufenthalt
NT soziokulturelle Aktivitäten
NT Studienreise
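The layout of the alphabetic display can be generated mechanically from a descriptor record. This is a sketch only: the function name `alphabetic_entry` and the dictionary layout are hypothetical, the data are copied from the example slide, and the scope note is omitted for brevity (the function prints it when present). Translations are ordered by language code, which reproduces the Da, El, En, Es, Fr, It, Sv order of the slide.

```python
def alphabetic_entry(entry: dict, display_lang: str = "de") -> str:
    """Render one descriptor in the layout of the alphabetic display."""
    lines = [entry["labels"][display_lang]]
    lines.append("MT " + " ".join(f"({m})" for m in entry["mt"]))
    # Translations in the other languages, ordered by language code (Da, El, En, ...).
    for lang in sorted(entry["labels"]):
        if lang != display_lang:
            lines.append(f"{lang.capitalize()}: {entry['labels'][lang]}")
    if entry.get("sn"):
        lines.append("SN: " + entry["sn"])
    lines += [f"BT {t}" for t in entry.get("bt", [])]
    lines += [f"NT {t}" for t in sorted(entry.get("nt", []), key=str.casefold)]
    return "\n".join(lines)

entry = {
    "labels": {"de": "extracurriculare Aktivitäten", "da": "aktiviteter udenfor læseplan",
               "el": "εξωσχολικές δραστηριότητες", "en": "extracurricular activities",
               "es": "actividades extraescolares", "fr": "activités hors programme",
               "it": "attività extracurricolari", "sv": "fria aktiviteter"},
    "mt": ["30", "70.70"],
    "bt": ["schulische Aktivitäten"],
    "nt": ["Studienreise", "Auslandsaufenthalt", "soziokulturelle Aktivitäten",
           "Aktivitäten in der freien Natur"],
}
print(alphabetic_entry(entry))
```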

Rotated display of descriptors
Evaluation
Evaluierung USE Evaluation
Evolution (Biologie)
Examen
Experiment
experimentelle Pädagogik
experimentelle Wissenschaft
extracurriculare Aktivitäten
Extremismus
Fabel
Fabrikationstechnik
Fachsprache
fachübergreifender Ansatz
fachunabhängige Bildung USE intercurriculare Erziehung
Fähigkeit zur Nutzung von Informationen
Familie
Beziehung Schule- Familie
Ein-Eltern- Familie
Gewalt in der Familie
unvollständige Familie USE Ein-Eltern-Familie
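A rotated display makes every significant word of a term an access point, so "Gewalt in der Familie" is also found under "Familie". The sketch below assumes word boundaries at spaces and hyphens and a caller-supplied stop-word list; the function name `rotated_display` and the stop words are hypothetical, and the terms are taken from the slide.

```python
import re

# A keyword starts at the beginning of a term or right after a space or hyphen.
WORD_START = re.compile(r"(?:^|(?<=[ -]))(\w+)")

def rotated_display(terms, use=None, stopwords=frozenset()):
    """Rotated index: every significant word of a term becomes an access point.
    terms: descriptors; use: non-descriptor -> descriptor (shown as 'X USE Y')."""
    use = use or {}
    entries = []
    for term in [*terms, *use]:
        suffix = f" USE {use[term]}" if term in use else ""
        for m in WORD_START.finditer(term):
            if m.group(1).casefold() in stopwords:
                continue
            i = m.start()
            # Sort key: text from the keyword onward; ties broken by the preceding words.
            entries.append((term[i:].casefold(), term[:i], term[i:] + suffix))
    entries.sort()
    width = max((len(prefix) for _, prefix, _ in entries), default=0)
    # Right-align the preceding words so all keywords start in the same column.
    return [prefix.rjust(width) + rest for _, prefix, rest in entries]

terms = ["Familie", "Beziehung Schule-Familie", "Ein-Eltern-Familie", "Gewalt in der Familie"]
use = {"unvollständige Familie": "Ein-Eltern-Familie"}
lines = rotated_display(terms, use, stopwords={"in", "der"})
for line in lines:
    print(line)
```

Sorting here is by Unicode code point after case folding, so "Fähigkeit" would file after "Familie", unlike the German dictionary order on the slide; locale-aware collation (for example `locale.strxfrm` under a German locale) would be needed to match it.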

German example descriptor in micro-thesaurus 30 SCHULISCHE AKTIVITÄTEN
schulische Aktivitäten   <- top term
. . NT1 extracurriculare Aktivitäten « (70.70)   <- narrower terms on several levels
. . . . NT2 Aktivitäten in der freien Natur « (70.70)
. . . . NT2 Auslandsaufenthalt « (70.70)
. . . . NT2 soziokulturelle Aktivitäten « (70.70)   <- also belonging to 70.70 (polyhierarchy)
. . . . . . NT3 Ausstellung « (70.70)
. . . . . . NT3 Gruppenanimation « (70.70)
. . . . NT2 Studienreise « (70.70)
. . NT1 Hausaufgabe
. . NT1 kreative Tätigkeiten
. . . . NT2 Dramatisierung (Theater)   <- term with explanation
. . . . NT2 künstlerische Betätigung
. . . . . . NT3 Modellieren
. . . . . . RT Kunst (70.80)
. . . . NT2 Modellbau
. . . . RT Kreativität (10)   <- related term
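The hierarchical display can be produced by a recursive walk over the NT relation. This sketch assumes, as the slide suggests, that narrower terms are listed alphabetically, that related terms follow the narrower terms one level deeper, and that "«" introduces the term's other micro-thesauri; the function `render` and the dictionaries are hypothetical, and only a subset of the slide's terms is used.

```python
def render(term, nt, rt, mt, current_mt, level=0):
    """Render a term and its narrower terms with the dotted NT1/NT2/... layout."""
    others = sorted(m for m in mt.get(term, ()) if m != current_mt)
    mark = " « " + " ".join(f"({m})" for m in others) if others else ""
    label = term if level == 0 else f"NT{level} {term}"
    lines = [". " * (2 * level) + label + mark]
    for child in sorted(nt.get(term, ()), key=str.casefold):
        lines += render(child, nt, rt, mt, current_mt, level + 1)
    for rel in rt.get(term, ()):
        # Related terms are shown with the micro-thesauri they belong to.
        rel_mts = " ".join(f"({m})" for m in sorted(mt.get(rel, ())))
        lines.append(". " * (2 * (level + 1)) + f"RT {rel} {rel_mts}".rstrip())
    return lines

nt = {"schulische Aktivitäten": ["extracurriculare Aktivitäten", "Hausaufgabe",
                                 "kreative Tätigkeiten"],
      "extracurriculare Aktivitäten": ["Studienreise", "Auslandsaufenthalt"],
      "kreative Tätigkeiten": ["Modellbau", "künstlerische Betätigung"],
      "künstlerische Betätigung": ["Modellieren"]}
rt = {"künstlerische Betätigung": ["Kunst"], "kreative Tätigkeiten": ["Kreativität"]}
mt = {"extracurriculare Aktivitäten": {"30", "70.70"},
      "Auslandsaufenthalt": {"30", "70.70"}, "Studienreise": {"30", "70.70"},
      "Kunst": {"70.80"}, "Kreativität": {"10"}}
lines = render("schulische Aktivitäten", nt, rt, mt, "30")
print("\n".join(lines))
```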

Snapshot of micro-thesaurus 70.70 EXTRACURRICULARE UND INTERDISZIPLINÄRE AKTIVITÄTEN
extracurriculare Aktivitäten « (30)
. . NT1 Aktivitäten in der freien Natur « (30)
. . NT1 Auslandsaufenthalt « (30)
. . NT1 soziokulturelle Aktivitäten « (30)
. . . . NT2 Ausstellung « (30)
. . . . NT2 Gruppenanimation « (30)
. . NT1 Studienreise « (30)
intercurriculare Erziehung
. . NT1 Arbeitslehre
. . . . RT Beruf (140)
. . NT1 Bewusstseins sensibilisierende Aktivitäten
. . . . RT Information (90)
. . NT1 Erziehung zur nachhaltigen Entwicklung
. . NT1 europäische Dimension
. . . . RT europäisches Projekt (110)
. . . . RT Europapolitik (140)
. . NT1 Friedenserziehung
. . . . RT Frieden (110)
. . . . RT Solidarität (70.90)
. . . . RT Toleranz (70.90)
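The snapshot shows that a descriptor such as "extracurriculare Aktivitäten" appears as a top term in micro-thesaurus 70.70 although it is a narrower term in micro-thesaurus 30. One way to reproduce this, assuming a micro-thesaurus's top terms are its members without a broader term inside the same micro-thesaurus (the ETB rule itself is not stated on the slide), is sketched below; the function names and the data subset are hypothetical.

```python
def mt_members(mt_id, mt):
    """All descriptors assigned to the micro-thesaurus mt_id."""
    return {t for t, codes in mt.items() if mt_id in codes}

def mt_top_terms(mt_id, mt, bt):
    """Members of a micro-thesaurus that have no broader term inside the same MT."""
    members = mt_members(mt_id, mt)
    return sorted((t for t in members if not set(bt.get(t, ())) & members),
                  key=str.casefold)

def other_mts(term, mt_id, mt):
    """MT codes shown after the « marker: the term's memberships outside mt_id."""
    return sorted(c for c in mt.get(term, ()) if c != mt_id)

mt = {"schulische Aktivitäten": {"30"},
      "extracurriculare Aktivitäten": {"30", "70.70"},
      "Studienreise": {"30", "70.70"},
      "intercurriculare Erziehung": {"70.70"},
      "Friedenserziehung": {"70.70"}}
bt = {"extracurriculare Aktivitäten": ["schulische Aktivitäten"],
      "Studienreise": ["extracurriculare Aktivitäten"],
      "Friedenserziehung": ["intercurriculare Erziehung"]}
print(mt_top_terms("70.70", mt, bt))  # ['extracurriculare Aktivitäten', 'intercurriculare Erziehung']
print(other_mts("Studienreise", "70.70", mt))  # ['30']
```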

What about cross-concordances?
Setting up relations between existing thesauri and/or classification schemes by intellectual (i.e. manual) mapping of terms or codes:
- Relation types: exact or inexact equivalence (=), broader term (>), narrower term (<), loose relationship (^)
- Types of correspondence: one-to-one; logical expressions; one-to-many; many-to-one; many-to-many; no relationship
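A cross-concordance can be stored as a list of (source, relator, target) triples, from which the one-to-one, one-to-many, many-to-one and many-to-many cases follow directly. The sketch below assumes each triple is read as "source relator target", classifies each source term from that perspective, and records "no relationship" as a triple without target; the function `classify` and the placeholder terms S1 to S6 and T1 to T6 are hypothetical.

```python
RELATORS = {"=": "exact or inexact equivalence", ">": "broader term",
            "<": "narrower term", "^": "loose relationship"}

def classify(mapping):
    """mapping: (source, relator, target) triples; target None = no relationship.
    Returns the correspondence type per source term."""
    fwd, back = {}, {}
    for src, rel, tgt in mapping:
        fwd.setdefault(src, set())
        if tgt is None:                     # explicitly recorded: no relationship
            continue
        if rel not in RELATORS:
            raise ValueError(f"unknown relator {rel!r} for {src!r}")
        fwd[src].add(tgt)
        back.setdefault(tgt, set()).add(src)
    kinds = {(False, False): "one-to-one", (False, True): "one-to-many",
             (True, False): "many-to-one", (True, True): "many-to-many"}
    result = {}
    for src, tgts in fwd.items():
        if not tgts:
            result[src] = "no relationship"
            continue
        many_src = any(len(back[t]) > 1 for t in tgts)   # a target shared with other sources
        many_tgt = len(tgts) > 1                         # several targets for this source
        result[src] = kinds[(many_src, many_tgt)]
    return result

mapping = [("S1", "=", "T1"),
           ("S2", "<", "T2"), ("S2", "^", "T3"),
           ("S3", ">", "T4"), ("S4", "=", "T4"),
           ("S5", None, None),
           ("S6", "=", "T5"), ("S6", "^", "T6"), ("S3", "^", "T5")]
print(classify(mapping))
```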

Cross-Concordances within ETB
- Cross-concordances between thesauri:
  - EET terms --> ETB Thesaurus terms
  - EUN terms --> ETB Thesaurus terms
  - MOTBIS terms --> ETB Thesaurus terms
- Cross-concordances between classification schemes and thesaurus terms:
  - Deutscher Bildungsserver classification codes --> ETB Thesaurus terms
  - INFOGUIDE classification codes --> ETB Thesaurus terms
  - Lankskafferiet classification codes --> ETB Thesaurus terms
- Thesaurus Management System (SIS-TMS) (by FORTH)
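Chaining a cross-concordance with the multilingual thesaurus is what turns a term or class code from one national vocabulary into search terms in other languages. This is a hedged sketch: plain dictionaries stand in for the data held in SIS-TMS, the source-side keys (`eet-term-1`, `dbs-code-1`) and the descriptor id `d-extra` are hypothetical placeholders rather than real EET terms or DBS codes, and the labels come from the example descriptor slide.

```python
# Stand-in for thesaurus data; source-side keys are hypothetical placeholders.
ETB_LABELS = {
    "d-extra": {"de": "extracurriculare Aktivitäten", "en": "extracurricular activities",
                "fr": "activités hors programme", "sv": "fria aktiviteter"},
}
CONCORDANCES = {
    ("EET", "eet-term-1"): [("=", "d-extra")],
    ("DBS", "dbs-code-1"): [("^", "d-extra")],
}

def search_terms(vocab, key, langs, relators=("=", ">", "<", "^")):
    """Map a source-vocabulary term or class code to ETB descriptors, then to the
    labels of those descriptors in the requested languages."""
    terms = []
    for rel, did in CONCORDANCES.get((vocab, key), []):
        if rel in relators:          # e.g. restrict to "=" for a precise search
            labels = ETB_LABELS[did]
            terms += [labels[lang] for lang in langs if lang in labels]
    return terms

print(search_terms("DBS", "dbs-code-1", ["en", "fr", "sv"]))
print(search_terms("DBS", "dbs-code-1", ["en"], relators=("=",)))  # loose match excluded
```

Filtering by relator lets a search interface trade recall for precision, for example by following only equivalence relations.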

Extract of a mapping table: EET thesaurus terms to ETB thesaurus terms
EET term | relator | ETB term
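The rows of the mapping table itself are not preserved in this transcript. Assuming the Excel mapping file is exported to semicolon-separated text with the three column headers shown on the slide, it could be loaded and checked as sketched below; the function `load_mapping` and the sample rows are hypothetical placeholders, not real EET or ETB terms.

```python
import csv
import io

RELATORS = {"=", ">", "<", "^"}

def load_mapping(text: str, delimiter: str = ";") -> dict[str, list[tuple[str, str]]]:
    """Read rows with the columns 'EET term', 'relator', 'ETB term' into
    {EET term: [(relator, ETB term), ...]}, rejecting unknown relators."""
    mapping: dict[str, list[tuple[str, str]]] = {}
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for n, row in enumerate(reader, start=2):   # line 1 is the header
        src, rel, tgt = (row[c].strip() for c in ("EET term", "relator", "ETB term"))
        if rel not in RELATORS:
            raise ValueError(f"line {n}: unknown relator {rel!r}")
        mapping.setdefault(src, []).append((rel, tgt))
    return mapping

sample = """EET term;relator;ETB term
eet-term-1;=;etb-term-1
eet-term-2;^;etb-term-2
eet-term-2;<;etb-term-3
"""
print(load_mapping(sample))
```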

Advantages of using the ETB Thesaurus and the cross-concordances
- A single common vocabulary used all over Europe
- Translations into other languages are already available
- Existing thesauri and classification schemes are re-used for mapping and statistical analysis
- Moderate effort for further development (additions and/or changes of descriptors and mainly of non-descriptors, enhancement of the cross-concordances and the statistical relations)
- Chance to offer all or selected documents via the ETB network --> where multilingual searches can be carried out