Computing Kafka Image: DigiTaWG, 3 September 2013 Dr. J. Berenike Herrmann, Dept.

Presentation transcript:

1 Computing Kafka: What corpus-stylistic measures can tell us about Franz Kafka's prose
Dr. J. Berenike Herrmann, Dept. German Philology, Göttingen University
DigiTaWG, 3 September 2013
Image: http://journalofseeing.files.wordpress.com/2012/04/kafka-detail-001.jpg

3 Methodology & Theoretical Background “Toolboxes”

4 “Computing Kafka”: Start of a Project
“Corpus Linguists' Toolbox”
- Digitized texts
- Frequency profiling
- Concordancing
- Collocation analysis -> “lexical bundles” (Biber, Conrad, & Cortes, 2004), n-grams, “clusters” (Mahlberg), etc.
- Key word analysis (e.g., Rayson, Scott, Scott & Tribble, Mahlberg, Anthony, Stubbs)

5 “Computing Kafka”: Start of a Project
“NLP Toolbox” / “Programmer's Toolbox”
- Sentiment analysis
- Topic modeling
- Stylometric clustering (literary history/genre)
- … multi-dimensional (MD) analysis [Biber]
- by means of Python, R […] -> forthcoming
“Psycholinguist's Toolbox”
- test effects of style [features to be determined in textual analysis] on readers, ideally with a battery of experiments and different participant groups

6 “Computing Kafka”: Start of a Project
“Philologist's Toolbox”
- A hundred years' worth of study of different aspects of Kafka's prose
- Religion and culture (Christian / Jewish): a “Jewish author”?
- Epoch (Modernism, Prague Modernism, Kafka = solitary phenomenon?)
- Genre (Gothic novel, realistic narration, fairy tale, grotesque …)
- Culture and science (psychoanalysis, modern rationalization/alienation …)
- Reader response -> “uncertainty” / “unsettledness” [no empirical studies]
- Historic reception (esp. comparison with Robert Walser)
- State of publication (while alive / from estate; whole texts / fragments)
- Narratological study: “heterogeneous prose”, with different phases and formats
- “Formalist” stylistic analysis [no quantitative studies]

7 “Computing Kafka”: Start of a Project
“Formalist” stylistic analysis
- perspective/focalization (predecessors in world literature: Austen, Flaubert, James; in German literature: Stifter, Kleist)
- first/third person narrative voice: limited perspective and neutral perspective
- “showing, not telling” (gesture, scene)
- lexical precision, lexical “scantiness” (Oschmann, 2010)
- depiction of external events, situations -> concrete, sensuous phenomena
- “progressive abstractness in narration” (Oschmann, 2010)
- plot vs. reflection (development: less plot, more reflection)
- plot: order of events relatively arbitrary (Engel, 2010) -> structural homology
- deviation from “reality principle” (Engel, 2010) -> ca. one per text
- events not motivated -> ca. one per text
- types, not figures (flat characters), generalizations, general constellations (Engel, 2010; Oschmann, 2010)
- time: iterative, not singulative narration
- overall few details; if details are present -> meaningful

8 “Computing Kafka“: Step 1 Corpus linguistics & literature studies Corpus stylistic approach

9 Digitized Texts
Kafka in Zeno.org:
- ca. 425,000 tokens (counted words)
- ca. 26,500 types (distinct word forms)
3 novels (264,669 tokens; 18,344 types; averaged TTR = 6.9%)
- Amerika (83,805; 9,741)
- Der Process (71,773; 7,879)
- Das Schloß (109,091; 10,623)
58 stories and other types of prose (ca. 160,000 tokens; ca. 8,100 types; averaged TTR ≈ 5.1%)
Zwei Gespräche
- Gespräch mit dem Beter
- Gespräch mit dem Betrunkenen
Betrachtung
- Kinder auf der Landstraße
- Entlarvung eines Bauernfängers
- Der plötzliche Spaziergang
- Entschlüsse
- Der Ausflug ins Gebirge
- Das Unglück des Junggesellen
- Der Kaufmann
- Zerstreutes Hinausschaun
- Der Nachhauseweg
- Die Vorüberlaufenden
- Der Fahrgast
- Kleider
- Die Abweisung
- Zum Nachdenken für Herrenreiter
- Das Gassenfenster
- Wunsch, Indianer zu werden
- Die Bäume
- Unglücklichsein
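The TTR figures on this slide can be checked with a short calculation. The sketch below assumes that TTR here means the plain ratio of types to tokens, expressed as a percentage; the slide speaks of an "averaged" TTR and does not say how the averaging was done, so this is only one plausible reading. The function name is illustrative.

```python
def type_token_ratio(types: int, tokens: int) -> float:
    """Return the type-token ratio as a percentage (types / tokens * 100)."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return 100.0 * types / tokens


# Counts as reported on the slide; the short-prose counts are approximate ("ca.").
novels_ttr = type_token_ratio(types=18_344, tokens=264_669)
short_prose_ttr = type_token_ratio(types=8_100, tokens=160_000)

print(f"Novels:      {novels_ttr:.1f}%")
print(f"Short prose: {short_prose_ttr:.1f}%")
```

Under this reading the reported percentages follow from the reported counts. TTR depends strongly on text length, so the two values are not directly comparable across corpora of different size.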

10 Digitized Texts
Das Urteil
Die Verwandlung
In der Strafkolonie
Der Kübelreiter
Ein Hungerkünstler
- Erstes Leid
- Eine kleine Frau
- Ein Hungerkünstler
- Josefine, die Sängerin
Ein Landarzt
- [Widmung]
- Der neue Advokat
- Ein Landarzt
- Auf der Galerie
- Ein altes Blatt
- Vor dem Gesetz
- Schakale und Araber
- Ein Besuch im Bergwerk
- Das nächste Dorf
- Eine kaiserliche Botschaft
- Die Sorge des Hausvaters
- Elf Söhne
- Ein Brudermord
- Ein Traum
- Ein Bericht für eine Akademie

11 Digitized Texts
Prosa aus dem Nachlaß
- Hochzeitsvorbereitungen auf dem Lande
- Beim Bau der Chinesischen Mauer
- Der Jäger Gracchus
- Die Brücke
- Der Schlag ans Hoftor
- Eine Kreuzung
- Der Nachbar
- Betrachtungen über Sünde, Leid, Hoffnung und den wahren Weg
- Brief an den Vater
- Zur Frage der Gesetze
- Das Stadtwappen
- Poseidon
- Kleine Fabel
- Von den Gleichnissen

12 Key Word Analysis
Three stages
1. Compute a word frequency list for each of the two corpora that we wish to compare:
- different word forms (types) and their occurrences (tokens) in each text
- no. of running words in each corpus

13 Key Word Analysis
2. Compare the two resulting frequency lists
- contingency table for each word
- apply chosen statistic to calculate keyness value (most widely used: log-likelihood and chi-squared, cf. Rayson et al., 2009)
- the larger the difference in relative frequencies, the larger the value of “keyness”
3. Sort words in terms of keyness
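The three stages can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular tool: it assumes the common two-cell log-likelihood formulation (observed vs. expected frequency in the study and reference corpus), and the toy token lists and function names are invented for the example. Concordancers may compute keyness slightly differently.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Two-cell log-likelihood keyness.

    a, b: frequency of the word in the study / reference corpus
    c, d: total number of running words in the study / reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def keywords(study_tokens, reference_tokens):
    """Stage 1: count; stage 2: score each word; stage 3: sort by keyness."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        # Keep positive key words only: relatively more frequent in the study corpus.
        if a / c > b / d:
            scored.append((word, a, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Invented toy data, not taken from the Kafka corpus.
study = "gregor sah die schwester und gregor schwieg".split()
reference = "der mann sah die frau und der mann ging".split()

for word, freq, ll in keywords(study, reference):
    print(f"{word:<10} {freq:>3} {ll:8.3f}")
```

Negative key words (underrepresented in the study corpus) could be obtained by reversing the comparison of relative frequencies.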

14 Key Word Analysis

Rank  Freq  LL        Word
1     218   2555.082  gregor
2     68    826.190   josefine
3     53    824.143   hungerkünstler
4     79    701.753   reisende
5     90    607.093   offizier
6     45    596.268   gregors
7     33    527.712   samsa
8     922   525.274   er
9     692   457.332   es
10    488   457.218   aber
11    41    456.119   verurteilte
12    945   454.502   sie
13    804   444.792   nicht
14    107   444.385   schwester
15    740   431.969   sich
…

Kafka ShortProse corpus, key word list obtained with AntConc V3.2.4 (Anthony, 2011: http://www.antlab.sci.waseda.ac.jp/antconc_index.html)

15 Key Word Analysis: Caveats
Caveats (cf. Rayson, 2012):
- Chi-squared and LL tests assume that samples are random with independent observations -> this is not so! (Evert etc.)
  - sidestep: place key words in rank order, rather than determine significance for each word
- A word can be key if it occurs in just one part of the corpus
  - examination of dispersion is important (-> pruning???)
- Often too many key words for a researcher to analyze (Berber Sardinha, 1999)
- Prime importance: careful choice of reference corpus
- Three types of keywords are often found (Scott):
  - proper nouns;
  - keywords that human beings would recognize as key and that are indicators of the “aboutness” of a particular text;
  - high-frequency words such as because, shall, or already, which may be indicators of style, rather than aboutness
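One simple way to examine dispersion is to count in how many texts of the corpus a candidate key word occurs (its "range") and to set aside words that are confined to a single text. The sketch below is only an illustration of that idea: the threshold, the function names, and the toy texts are assumptions, and the slides leave open whether and how such pruning should be done.

```python
def text_range(word, texts):
    """Number of texts in which the word occurs at least once."""
    return sum(1 for tokens in texts.values() if word in tokens)


def filter_by_range(candidates, texts, min_texts=2):
    """Keep candidates that occur in at least `min_texts` different texts."""
    return [w for w in candidates if text_range(w, texts) >= min_texts]


# Invented toy texts, each represented as a list of lower-cased tokens.
texts = {
    "text_a": "gregor sah die schwester nicht".split(),
    "text_b": "der offizier sah den reisenden nicht".split(),
    "text_c": "josefine sang nicht".split(),
}

print(filter_by_range(["gregor", "nicht", "sah"], texts))
```

In this toy corpus a character name such as "gregor" occurs in one text only and is filtered out, while "nicht" and "sah" are spread over several texts.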

16 Key Word Analysis: What to look for
- “Frequent nouns may indicate superficial topics […], but not its underlying themes”
- “Verbs are often a better candidate for stylistically relevant words” (Stubbs, 2005, p. 11)
- Negatives/negations: pragmatic functions (Hidalgo-Downing, 2000)
  - imply more than is literally said,
  - deny expectations,
  - challenge background propositions;
  - a way of questioning reality, therefore an alienation device
- Lexis indicating Involved Production (Biber, 1988), frequent in English romantic fiction (Tribble, 2000)
  - “personal pronouns, the past forms had and was, the negative particle n't and, last but not least, Nigel [= proper noun]”

17 Key Word Analysis: First Results
Amerika, The Trial, and The Castle -> statistically overrepresented:
- proper nouns indicating the characters (e.g., K, Karl, Frieda),
- common nouns indicating elements of interior spaces (Tür, Zimmer), generic person types (Onkel, Mann), and a multitude of occupational categories (Diener, Oberköchin)
-> can be related to Kafka's typical fictional world and characters
A great variety of “small words”
- adverbs (nur, vielleicht) -> negotiating certainty and evaluation
- modal constructions (hätte, konnte, wollte) and negations (nicht, niemals) -> dealing with aspects of permission, ability, and obligation
- pronouns -> related to the narrative perspective (er, ihn), -> also to self-reference (sich, mich) -> impersonal and generalizing constructions (man, alle)

18 Key Word Analysis: More Results See handouts (Key words for two corpora: Novels and ShortProse, each compared to reference corpus zeno.org; including negative key words)

19 Further Analyses
- Key word analysis serves not only to examine the propositional text base, but also for stylistic analysis
- But: it is limited to single words (e.g., more complex adverbials, phraseological units, idioms etc. are not identified)
- One possibility -> multi-word units -> n-grams, collocations

20 Collocation Analysis: Lexical bundles
- Biber et al. (1999) found that idioms and fixed formulas (kick the bucket, a slap in the face) occur rarely in natural speech and writing
- However, they are more frequent in fiction! kick the bucket, a slap in the face occur ca. 5 times per million words in fiction corpus (cf. Biber et al., 1999, pp. 1025ff.), much less in conv., fiction, news -> stereotyped dialogue in fiction
Types of “idioms” (Biber et al., 1999, pp. 1024ff.)
- Wh-questions (how do you do? What's up?)
- Complete noun phrases (a piece of cake)
- Prepositional phrases (as a matter of fact, in a nutshell, up to date)
- Verb + prep phrases (bear in mind, fall in love)
- Verb + noun phrases (kick the bucket, take the bull by the horns)

21 Collocation Analysis: Lexical bundles
Question: What kinds of lexical bundles does Kafka's prose involve? “Stereotyped dialogue” at all?
See handouts (4-grams for two corpora: Novels and ShortProse)
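Recurrent 4-grams of the kind listed in the handouts can be extracted with a sliding window over the token sequence. The following is a minimal sketch, assuming a simple regex tokenizer and a frequency cut-off of 2; both are illustrative choices, and the sample sentence is invented rather than taken from Kafka.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (umlauts included)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n=4):
    """All contiguous n-token sequences, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(text, n=4, min_freq=2):
    """N-grams that occur at least `min_freq` times, most frequent first."""
    counts = Counter(ngrams(tokenize(text), n))
    return [(" ".join(gram), c) for gram, c in counts.most_common() if c >= min_freq]


# Invented sample text for illustration only.
sample = "Er wusste nicht, was er tun sollte. Sie wusste nicht, was er tun wollte."
for gram, count in frequent_ngrams(sample):
    print(count, gram)
```

Note that this window runs across sentence boundaries; whether bundles should be allowed to span punctuation is a design decision that the slides do not specify.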

22 Collocation Analysis: Exploration
Religion and culture (Christian / Jewish): A “Jewish author”?
Preliminary indicators on text surface (cf. Engel, 2010, p. 423), e.g.,
- Dom (*dom*; N=19), Kirche (*kirch*; N=50)
- Synagoge (N=0), (schwarzer) Bart
- *bart* (N=59), *bärt* (N=19)
(more possible indicators should be found in religion dictionaries; compare with Jewish literature; cf. Robertson, 1985)
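Wildcard queries such as *bart* can be reproduced by translating the pattern into a regular expression and matching it against each token. This is a sketch under the assumption that * stands for any run of word characters and that matching is case-insensitive; the helper names and the sample sentence are invented.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a concordancer-style wildcard pattern into a compiled regex.

    '*' is taken to mean "any run of word characters, possibly empty".
    """
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(r"\w*".join(parts), re.IGNORECASE)


def count_matches(pattern, tokens):
    """Number of tokens that match the wildcard pattern as a whole."""
    regex = wildcard_to_regex(pattern)
    return sum(1 for token in tokens if regex.fullmatch(token))


# Invented sample sentence for illustration only.
tokens = "Der Mann mit dem Vollbart ging an der Kirche und am Kirchturm vorbei".split()

for query in ("*bart*", "*kirch*", "*dom*"):
    print(query, count_matches(query, tokens))
```

Substring queries of this kind also catch unrelated word forms (for example, *bart* matches "offenbart"), so raw hit counts need manual inspection before they are interpreted.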

23 Exploration
“Stereotypical Jewish” (cf. Engel, 2010, p. 423): Search for collocates of *bart* / *bärt*
Problem: Adequate statistics for sparse data: MI, t-score, LL?
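The sparse-data problem can be made concrete by comparing two of the measures on the same counts. The sketch below uses the textbook forms of mutual information (log2 of observed over expected co-occurrence) and the t-score; concordancers differ in details such as span adjustment, so the values will not necessarily match any tool's output. All frequencies are hypothetical and chosen only to show the contrast.

```python
import math


def mutual_information(observed, f_node, f_collocate, corpus_size):
    """MI = log2(observed / expected), with expected = f_node * f_collocate / N."""
    expected = f_node * f_collocate / corpus_size
    return math.log2(observed / expected)


def t_score(observed, f_node, f_collocate, corpus_size):
    """t = (observed - expected) / sqrt(observed)."""
    expected = f_node * f_collocate / corpus_size
    return (observed - expected) / math.sqrt(observed)


# Hypothetical counts: corpus size, node frequency, and two collocates.
N = 425_000
F_NODE = 78
rare = {"observed": 1, "f_collocate": 1}      # occurs once, next to the node
common = {"observed": 2, "f_collocate": 500}  # frequent word, co-occurs twice

for label, c in (("rare", rare), ("common", common)):
    mi = mutual_information(c["observed"], F_NODE, c["f_collocate"], N)
    t = t_score(c["observed"], F_NODE, c["f_collocate"], N)
    print(f"{label:<7} MI = {mi:6.2f}   t = {t:5.2f}")
```

With these counts MI ranks the one-off collocate far above the frequent one, while the t-score ranks them the other way round. This is the usual reason for combining MI with a minimum frequency threshold or for preferring a measure that is less sensitive to very low counts.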

24

Rank  Freq  FreqL  FreqR  Stat      Collocate
15    2     2      0      12.99618  seines
16    2     2      0      11.04557  seinen
17    2     2      0      11.20878  seinem
18    2     2      0      18.69662  rötlichen
20    2     2      0      16.52669  langem
24    2     2      0      18.69662  buschigem
28    2     0      2      14.44869  Balkon
31    1     1      0      14.78973  weißen
32    1     1      0      17.69662  weißem
35    1     0      1      16.37469  verdeckte
36    1     0      1      17.69662  unverwandt
37    1     0      1      9.91198   unter
38    1     0      1      13.83864  Tränen
40    1     1      0      18.69662  tatarischen
41    1     1      0      18.69662  tartarischen
42    1     1      0      18.69662  tartarische
43    1     0      1      14.05276  strich
44    1     1      0      15.23719  starker
45    1     0      1      10.69099  stand
46    1     0      1      18.69662  speien
47    1     1      0      11.51671  solchen
49    1     1      0      16.11166  schwarzer
50    1     1      0      14.99618  schwarze

25

Rank  Freq  FreqL  FreqR  Stat      Collocate
53    1     1      0      16.69662  riesenhaften
55    1     1      0      16.37469  nassen
56    1     1      0      15.88926  namens
59    1     1      0      18.69662  krumme
60    1     0      1      18.69662  knollennasigen
61    1     1      0      17.69662  knochigen
62    1     1      0      10.91526  kleinen
67    1     1      0      8.12202   im
69    1     1      0      18.69662  herabhängende
70    1     0      1      17.69662  grau
73    1     1      0      18.69662  gepflegte
74    1     0      1      18.69662  gebräunter
77    1     1      0      17.69662  fremdartige
78    1     0      1      18.69662  flehte
79    1     1      0      10.04557  fast
80    1     0      1      12.91526  erschien
81    1     0      1      18.69662  erschauert
82    1     0      1      14.69662  erlaubt
86    1     1      0      14.99618  dünnen
95    1     1      0      17.69662  buschigen
96    1     1      0      17.11166  blonden

26 Next steps
- Python…
- R…
- http://www.semanticsoftware.info/durm-german-lemmatizer
- https://dev2.dariah.eu/wiki/display/TextGrid/Lemmatizer
- …

27 Computing Kafka Thank you!

28 Literature
Adorno, T. W. (1981). Notes on Kafka. In Prisms (pp. 243–271). Cambridge, Massachusetts: MIT Press.
Berber Sardinha, T. (1999). Using keywords in text analysis: Practical aspects. DIRECT Working Papers, 42.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at…: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3), 371–405.
Bondi, M., & Scott, M. (2010). Keyness in texts. [Studies in Corpus Linguistics, 41]. Amsterdam: John Benjamins.
Engel, M. (2010). Kafka lesen - Verstehensprobleme und Forschungsparadigmen. In M. Engel & B. Auerochs (Eds.), Kafka-Handbuch. Leben, Werk, Wirkung (pp. 411–427). Stuttgart: Metzler.
Mahlberg, M. (2007). Clusters, key clusters and local textual functions in Dickens. Corpora, 2(1), 1–31. doi:10.3366/cor.2007.2.1.1
Oschmann, D. (2010). Kafka als Erzähler. In M. Engel & B. Auerochs (Eds.), Kafka-Handbuch (pp. 438–449). Stuttgart.
Rayson, P. (2012). Corpus analysis of key words. In The Encyclopedia of Applied Linguistics. Blackwell.
Robertson, R. (1985). Kafka: Judaism, politics, and literature. Oxford: Clarendon Press.
Scott, M., & Tribble, C. (2006). Textual patterns. Key word and corpus analysis in language education. Amsterdam: John Benjamins.
