Applied Linguistics, Sprach- und Literaturwissenschaften (Linguistics and Literary Studies), Fachbereich 10

A brief introduction to the GeM annotation schema for complex document layout
John Bateman, Renate Henschel, Judy Delin
Talk given by: Guowen Yang
Taipei, September 2002

Overview of Talk
Orientation:
– describing the approach to annotation of multimodal documents developed in the GeM project
What is the GeM project?
– goals, methods, requirements
Summary of annotation problems raised
Annotation solutions adopted

The GeM project (‘Genre and Multimodality’)
– supported by the British Economic and Social Research Council (ESRC)
– Cooperation:
    University of Stirling
    University of Bremen
    Enterprise Information Design Unit
– Goal: to put the description of multimodal page-based documents on a sound empirical footing

The problem of data
– there is now much theorizing about how multimodal documents work
– but the empirical basis of this theorizing is often less than strong

Basic GeM hypotheses
– Documents belonging to different kinds of ‘genres’ will exhibit different kinds of multimodal patterns, just as text sorts exhibit different lexicogrammatical patterns
– It should be possible to map out these patterns for different genres
– There should be a regular relationship between genre type and the patterns found

Requirement
– An annotated corpus needed to be constructed containing the extra information that we know or expect to be most useful in establishing descriptions of multimodal documents
– The extra information is then to serve as the basis for generalizations about genre

The problem of data selection and description
– what kinds of documents are we talking about?
– what kinds of annotation do we need?

What kinds of documents are we talking about?
Any page-based medium which combines information from a variety of modalities in order to get its message across

Initial genres selected for the GeM corpus
– field guides (birds)
– instruction manuals (telephones)
– print newspapers
– electronic web-based versions of newspapers

Field guides
Instruction manuals
Print newspapers
Web-based newspapers

Motivations for selections
– all contain combinations of graphical, textual and photographic material
– all use the layout of these elements in complex ways
– for all the documents selected we were able to obtain feedback and discussion from their designers

Some Relations to Natural Language Processing
– it is our belief that we can approach the design and function of these documents using established linguistic techniques
– the ‘unit of analysis’ is scaled up from the sentence or the text to the page (at least)
– given a formal specification of the motivation and realization of such documents, we can consider their automatic generation

What kinds of annotation do we need?

The GeM annotation layers
    Content structure
    Rhetorical structure
    Layout structure
    Navigation structure
    Linguistic structure
Diagram labels: genre, form

Practical information required
– the GeM model also takes seriously the notion that the concrete, practical conditions of production (technology, material, time available, etc.) all contribute substantially to the properties of a genre
    Canvas constraints
    Production constraints
    Consumption constraints

Requirements
– From the GeM perspective, a page-based multimodal document requires analysis at all of these levels at least, taking into account the sources of constraint identified.
– Only then do we have enough information to consider:
    motivation of design
    critique of design and communicative effectiveness
    repurposing
    automatic generation

Pointers
– The assumptions made, and the particular layers of analysis adopted, are motivated and introduced at length in:
    Delin/Bateman/Allen: Information Design Journal
    Delin/Bateman: Document Design
– Details on the website http://purl.org/net/gem

Summary of annotation problems raised
– form of annotation to select
– criteria for recognising units
– multiple non-isomorphic intersecting hierarchies
– non-linear information
– complex query requirements

Annotation solutions adopted
– form of annotation to select
    TEI: Text Encoding Initiative
    CES: Corpus Encoding Standard
    XCES: XML version of CES
    GeM annotation scheme

Annotation solutions adopted
– criteria for recognising units
    basic vocabulary of the page: images, signs, sentences, numbers, ...
    layout units: hierarchy determined visually and by considering the degree to which elements ‘belong together’
    rhetorical structure: traditional analysis according to Mann & Thompson’s Rhetorical Structure Theory (RST)
    navigation units: elements pointing elsewhere in the document

Annotation solutions adopted
– multiple non-isomorphic intersecting hierarchies
    stand-off annotation...

XML stand-off annotation for encoding the GeM layers
    a single annotated file of ‘base’ elements
    several ‘stand-off’ layers of annotation
    a Document Type Definition (DTD) for each layer of annotation
    each annotation layer corresponds to a GeM analysis layer

GeM layers: the base file
    Huge (90cm) unmistakable seabird.
    Watch for white, cigar-shaped body and long straight, slender, black-tipped wings.
    In summer, yellow head of adult inconspicuous.
    Plunges spectacularly for fish.
    Sexes similar.
– Basic ‘vocabulary’ of the page, segmented and numbered.
– Actual ordering and positioning on the page irrelevant at this stage.
– Predominantly ‘flat’ structure.

Distribution of information across layers
Diagram labels: base units, layout units, RST segments, navigational elements, Layout, Semantic Content

Example: Derivation of layout structure
– Working visually from the page, decompose the objects on the page in terms of their visual unity
– Transform the page decomposition into a hierarchical structure
– Specify presentation information for units: e.g., font size, type, colour, image type, resolution, etc.

Complete layout structure
– provides a place for assigning specific information about the layout units
– contents given by collections of the base units of the page

GeM layers: layout units (1)
– Layout unit content is defined by cross-references (xrefs) to base units
– The content shown here is not formally used and may be omitted
<layout-unit id="lay-flegg-text" xref="u-21.7 u-21.8 u-21.9 u-21.10 u-21.11">
  Huge (90cm) unmistakable seabird. Watch for white, cigar-shaped body and long straight, slender, black-tipped wings. In summer, yellow head of adult inconspicuous. Plunges spectacularly for fish. Sexes similar.
</layout-unit>
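
The cross-reference mechanism can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the element names `<base>` and `<unit>` are assumed for the base file, and the pairing of each sentence with one of the five ids (in order) is also assumed. The ids, the sentences and the `layout-unit` element are taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical base file: the element names <base> and <unit> and the
# one-sentence-per-id pairing are assumptions made for this sketch.
BASE_XML = """
<base>
  <unit id="u-21.7">Huge (90cm) unmistakable seabird.</unit>
  <unit id="u-21.8">Watch for white, cigar-shaped body and long straight, slender, black-tipped wings.</unit>
  <unit id="u-21.9">In summer, yellow head of adult inconspicuous.</unit>
  <unit id="u-21.10">Plunges spectacularly for fish.</unit>
  <unit id="u-21.11">Sexes similar.</unit>
</base>
"""

# Stand-off layout unit as on the slide, with the redundant text content omitted.
LAYOUT_XML = """
<layout-unit id="lay-flegg-text" xref="u-21.7 u-21.8 u-21.9 u-21.10 u-21.11"/>
"""


def resolve_xrefs(layout_unit, base_root):
    """Return the text of every base unit named in the layout unit's xref list."""
    text_by_id = {unit.get("id"): unit.text for unit in base_root.iter("unit")}
    return [text_by_id[ref] for ref in layout_unit.get("xref").split()]


base_root = ET.fromstring(BASE_XML.strip())
layout_unit = ET.fromstring(LAYOUT_XML.strip())
sentences = resolve_xrefs(layout_unit, base_root)
print(" ".join(sentences))
```

Because the layout unit stores only ids, the base text exists once and any number of further layers can group the same units differently.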

GeM layers: layout units (2)
– Layout units contain typographical details common over the unit and its children
– Layout units again identified via cross-references
– Typographical information modelled on CSS and XSL-FO
<text xref="lay-21.12 lay-21.14 lay-21.16 lay-21.18 lay-21.20"
      font-family="sans-serif" font-size="10" font-style="normal"
      font-weight="bold" case="mixed" justification="right" color="black"/>
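
Since the typographical attributes are modelled on CSS, one way to see the correspondence is to turn the `<text>` element into CSS declarations. The following Python sketch is not part of the GeM tooling; the mapping of `justification` to `text-align` and the reading of `font-size="10"` as points are assumptions, and the `case` attribute is left out because its CSS counterpart is not stated on the slide.

```python
import xml.etree.ElementTree as ET

# The <text> element as shown on the slide.
TEXT_XML = (
    '<text xref="lay-21.12 lay-21.14 lay-21.16 lay-21.18 lay-21.20" '
    'font-family="sans-serif" font-size="10" font-style="normal" '
    'font-weight="bold" case="mixed" justification="right" color="black"/>'
)

# Attributes whose names and values carry over to CSS unchanged.
DIRECT = ("font-family", "font-style", "font-weight", "color")


def to_css(text_el):
    """Map the typographical attributes of a <text> element to CSS declarations."""
    decls = {name: text_el.get(name) for name in DIRECT if text_el.get(name)}
    if text_el.get("font-size"):
        decls["font-size"] = text_el.get("font-size") + "pt"  # assumption: points
    if text_el.get("justification"):
        decls["text-align"] = text_el.get("justification")  # assumed mapping
    return decls


def css_rules(text_el):
    """Return one CSS declaration string per cross-referenced layout unit."""
    body = "; ".join(f"{name}: {value}" for name, value in to_css(text_el).items())
    return {ref: body for ref in text_el.get("xref").split()}


rules = css_rules(ET.fromstring(TEXT_XML))
for unit_id, body in rules.items():
    print(f"#{unit_id} {{ {body} }}")
```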

GeM layers: layout units (3)
– Layout structure is recursive
Diagram labels: page-21, header-21, body-21, page-no-21, lay-21.2, lay-21.3
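
A recursive layout structure can be walked with a short recursive function. The Python sketch below is illustrative only: the element names `layout-root` and `layout-chunk` are hypothetical, and placing `lay-21.2` and `lay-21.3` under `body-21` is an assumption, since the slide's tree diagram did not survive. Only the ids and the `layout-leaf` element name come from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical nesting: <layout-root> and <layout-chunk> are assumed names,
# and the position of the two leaves under body-21 is a guess.
LAYOUT_STRUCTURE = """
<layout-root id="page-21">
  <layout-chunk id="header-21"/>
  <layout-chunk id="body-21">
    <layout-leaf xref="lay-21.2"/>
    <layout-leaf xref="lay-21.3"/>
  </layout-chunk>
  <layout-chunk id="page-no-21"/>
</layout-root>
"""


def outline(element, depth=0):
    """Return the layout tree as indented lines, one per node, depth first."""
    label = element.get("id") or element.get("xref")
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(LAYOUT_STRUCTURE.strip())
tree_lines = outline(root)
print("\n".join(tree_lines))
```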

Annotation solutions adopted
– non-linear information
    positioning of layout units within a page is specified two-dimensionally with respect to a generalized page model
    the page model decomposes the page area into a hierarchy of grids
    specifying the grid for a page is part of the annotation task

Example: Derivation of layout structure
– Working visually from the page, decompose the objects on the page in terms of their visual unity
– Transform the page decomposition into a hierarchical structure
– Specify presentation information for units: e.g., font size, type, colour, image type, resolution, etc.
– Inspect the page for any local or global grid structure
– Relate layout units to grid positions

Complete layout structure + page model
– each sub-tree can additionally be assigned to a position in a hierarchically ordered page grid

GeM layers: area model
– Layout units are related to identified elements of a hierarchical grid specified in the area model
Diagram labels: 14cm, 16cm; 10%, 85%, 5%
<area-root id="page-frame" cols="1" rows="3" hspacing="100" vspacing="10 85 5" height="16cm" width="14cm">
<sub-area id="body-frame" location="row-2" cols="2" rows="1" hspacing="50 50" vspacing="100"/>
<layout-leaf xref="header-21" location="row-1" area-ref="page-frame"/>
...
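
To make the grid arithmetic concrete, the following Python sketch computes the physical size of the grid cells from the attributes shown. It rests on several assumptions: that `hspacing` and `vspacing` are percentages of the enclosing area (suggested by the percentage labels on the slide), that `row-2` counts rows from 1, and that `<sub-area>` is nested inside `<area-root>`. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Area model from the slide, closed and nested here so that it parses;
# the nesting of <sub-area> inside <area-root> is an assumption.
AREA_XML = """
<area-root id="page-frame" cols="1" rows="3" hspacing="100"
           vspacing="10 85 5" height="16cm" width="14cm">
  <sub-area id="body-frame" location="row-2" cols="2" rows="1"
            hspacing="50 50" vspacing="100"/>
</area-root>
"""


def parse_cm(value):
    """Convert a string such as '16cm' to a float number of centimetres."""
    if not value.endswith("cm"):
        raise ValueError(f"expected a length in cm, got {value!r}")
    return float(value[:-2])


def split(total, percentages):
    """Divide a length according to a space-separated list of percentages."""
    return [total * float(p) / 100 for p in percentages.split()]


root = ET.fromstring(AREA_XML.strip())
row_heights = split(parse_cm(root.get("height")), root.get("vspacing"))
col_widths = split(parse_cm(root.get("width")), root.get("hspacing"))

# The body frame sits in row 2 of the page frame and is itself split into columns.
body = root.find("sub-area")
row_index = int(body.get("location").split("-")[1]) - 1  # assumes 1-based rows
body_height = row_heights[row_index]
body_col_widths = split(col_widths[0], body.get("hspacing"))

print("row heights (cm):", row_heights)
print("body frame height (cm):", body_height)
print("body column widths (cm):", body_col_widths)
```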

Annotation solutions adopted
– complex query requirements
    XPath
    Queries using standard tools
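
As a rough idea of what such a query looks like, the Python sketch below selects all layout units set in bold type. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath; a full XPath processor would be needed for more complex queries. The wrapper element `<layer>` and the second `<text>` entry (including its typography) are hypothetical sample data, and `units_with` is a name chosen for this sketch.

```python
import xml.etree.ElementTree as ET

# Sample layer: the first <text> element is abridged from the slides; the
# <layer> wrapper and the second <text> element are invented for illustration.
LAYER_XML = """
<layer>
  <text xref="lay-21.12 lay-21.14 lay-21.16 lay-21.18 lay-21.20"
        font-family="sans-serif" font-size="10" font-weight="bold"/>
  <text xref="lay-flegg-text"
        font-family="serif" font-size="9" font-weight="normal"/>
</layer>
"""


def units_with(root, attribute, value):
    """Return ids of layout units whose <text> entry has attribute == value."""
    query = f".//text[@{attribute}='{value}']"  # XPath attribute predicate
    return [ref for el in root.findall(query) for ref in el.get("xref").split()]


root = ET.fromstring(LAYER_XML.strip())
bold_units = units_with(root, "font-weight", "bold")
print(bold_units)
```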

Conclusions
– The annotation scheme allows detailed annotation of complex page-based documents
– Regularities can be sought using complex XPath queries
– The system is open-ended and extensible without any redefinition of existing resources

Ongoing Work
Further collection and ongoing annotation of corpus
– http://purl.org/net/gem
Use of results for criticism of document design and for exploring the relation between layout and rhetorical structure
– Delin/Bateman: Document Design, 2002
Use of XPath queries within sequences of extensible style sheet transformations for automatic document generation
– Henschel/Bateman/Delin: Konvens 2002, Saarbrücken

Future Work
Extension of annotation schemes
– Violations of the grid area model are currently handled by relative offsets; a more flexible approach is needed
– non-rectilinear grids for more complex design
– consideration of dynamic elements, animation, etc.
Extension of genres considered
– advertisements
– scientific documents
Extension of languages considered

Thank you!