Computing Kafka: What corpus-stylistic measures can tell us about Franz Kafka's prose
Dr. J. Berenike Herrmann, Dept. German Philology, Göttingen University
DigiTaWG, 3 September 2013
Methodology & Theoretical Background “Toolboxes”
“Computing Kafka”: Start of a Project
“Corpus Linguists’ Toolbox”
- Digitized texts
- Frequency profiling
- Concordancing
- Collocation analysis -> “lexical bundles” (Biber, Conrad, & Cortes, 2004), n-grams, “clusters” (Mahlberg), etc.
- Key word analysis (e.g., Rayson, Scott, Scott & Tribble, Mahlberg, Anthony, Stubbs)
“Computing Kafka”: Start of a Project
“NLP Toolbox” / “Programmer’s Toolbox”
- Sentiment analysis
- Topic modeling
- Stylometric clustering (literary history/genre)
- … Multi-dimensional (MD) analysis [Biber] by means of Python, R […] -> forthcoming
“Psycholinguist’s Toolbox”
- Test effects of style [features to be determined in textual analysis] on readers; ideally a battery of experiments with different participant groups
“Computing Kafka”: Start of a Project
“Philologist’s Toolbox”
A hundred years’ worth of study of different aspects of Kafka’s prose:
- Religion and culture (Christian / Jewish): a “Jewish author”?
- Epoch (Modernism, Prague Modernism, Kafka = solitary phenomenon?)
- Genre (Gothic novel, realistic narration, fairy tale, grotesque, …)
- Culture and science (psychoanalysis, modern rationalization/alienation, …)
- Reader response -> “uncertainty” / “unsettledness” [no empirical studies]
- Historic reception (esp. comparison with Robert Walser)
- State of publication (while alive / from estate; whole texts / fragments)
- Narratological study: “heterogeneous prose”: different phases, formats
- “Formalist” stylistic analysis [no quantitative studies]
“Computing Kafka”: Start of a Project
“Formalist” stylistic analysis
- perspective/focalization (predecessors in world literature: Austen, Flaubert, James; in German literature: Stifter, Kleist)
- first/third person narrative voice: limited perspective and neutral perspective
- “showing, not telling” (gesture, scene)
- lexical precision, lexical “scantiness” (Oschmann, 2010)
- depiction of external events, situations -> concrete, sensuous phenomena
- “progressive abstractness in narration” (Oschmann, 2010)
- plot vs. reflection (development: less plot, more reflection)
- plot: order of events relatively arbitrary (Engel, 2010) -> structural homology
- deviation from “reality principle” (Engel, 2010) -> ca. one per text
- events not motivated -> ca. one per text
- types, not figures (flat characters), generalizations, general constellations (Engel, 2010; Oschmann, 2010)
- time: iterative, not singulative narration
- overall few details; if details are present -> meaningful
“Computing Kafka“: Step 1 Corpus linguistics & literature studies Corpus stylistic approach
Digitized Texts
Kafka in Zeno.org: ca. 425,000 tokens (counted words), ca. 26,500 types (distinct word forms)
3 novels (264,669 tokens; 18,344 types; averaged TTR = 6.9%)
- Amerika (83,805; 9,741)
- Der Process (71,773; 7,879)
- Das Schloß (109,091; 10,623)
58 stories and other types of prose (ca. 160,000 tokens; ca. 8,100 types; averaged TTR ≈ 5.1%)
- Zwei Gespräche: Gespräch mit dem Beter; Gespräch mit dem Betrunkenen
- Betrachtung: Kinder auf der Landstraße; Entlarvung eines Bauernfängers; Der plötzliche Spaziergang; Entschlüsse; Der Ausflug ins Gebirge; Das Unglück des Junggesellen; Der Kaufmann; Zerstreutes Hinausschaun; Der Nachhauseweg; Die Vorüberlaufenden; Der Fahrgast; Kleider; Die Abweisung; Zum Nachdenken für Herrenreiter; Das Gassenfenster; Wunsch, Indianer zu werden; Die Bäume; Unglücklichsein
Digitized Texts
- Das Urteil
- Die Verwandlung
- In der Strafkolonie
- Der Kübelreiter
- Ein Hungerkünstler: Erstes Leid; Eine kleine Frau; Ein Hungerkünstler; Josefine, die Sängerin
- Ein Landarzt: [Widmung]; Der neue Advokat; Ein Landarzt; Auf der Galerie; Ein altes Blatt; Vor dem Gesetz; Schakale und Araber; Ein Besuch im Bergwerk; Das nächste Dorf; Eine kaiserliche Botschaft; Die Sorge des Hausvaters; Elf Söhne; Ein Brudermord; Ein Traum; Ein Bericht für eine Akademie
Digitized Texts
- Prosa aus dem Nachlaß: Hochzeitsvorbereitungen auf dem Lande; Beim Bau der Chinesischen Mauer; Der Jäger Gracchus; Die Brücke; Der Schlag ans Hoftor; Eine Kreuzung; Der Nachbar; Betrachtungen über Sünde, Leid, Hoffnung und den wahren Weg; Brief an den Vater; Zur Frage der Gesetze; Das Stadtwappen; Poseidon; Kleine Fabel; Von den Gleichnissen
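The type-token ratios (TTR) quoted for the corpus can be checked with a few lines of Python. This is a minimal sketch, not the procedure used for the slides: the tokenizer (lowercased runs of letters) is an assumption, and Zeno.org counts obtained with other tools may differ. The sample sentence is the opening of Der Process.

```python
import re

def tokenize(text):
    """Lowercase the text and return runs of letters (umlauts and ß included).
    Assumption: the slides do not say how tokens were counted."""
    return re.findall(r"[^\W\d_]+", text.lower())

def type_token_ratio(tokens):
    """Distinct word forms (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

sample = ("Jemand mußte Josef K. verleumdet haben, denn ohne daß er etwas "
          "Böses getan hätte, wurde er eines Morgens verhaftet.")
tokens = tokenize(sample)
ttr = type_token_ratio(tokens)

# Ratios recomputed from the counts reported on the slide.
novels_ttr = 18344 / 264669      # three novels
short_prose_ttr = 8100 / 160000  # stories and other prose (approximate counts)

print(f"sample: {len(tokens)} tokens, {len(set(tokens))} types, TTR = {ttr:.3f}")
print(f"novels: {novels_ttr:.1%}, short prose: {short_prose_ttr:.1%}")
```

TTR falls as texts get longer, so the two figures are only roughly comparable across sub-corpora of different size.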
Key Word Analysis
Three stages
1. Compute a word frequency list for each of the two corpora that we wish to compare:
   - different word forms (types) and their occurrences (tokens) in each text
   - number of running words in each corpus
Key Word Analysis
2. Compare the two resulting frequency lists:
   - build a contingency table for each word
   - apply the chosen statistic to calculate a keyness value (most widely used: log-likelihood and chi-squared, cf. Rayson et al., 2009)
   - the larger the difference in relative frequencies, the larger the value of “keyness”
3. Sort words in terms of keyness
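The three stages can be sketched in Python as follows. This is an illustrative sketch, not AntConc's implementation: it uses the two-cell log-likelihood formula commonly used in corpus linguistics, and the toy token lists are invented for the example.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Keyness of one word.
    a, b = frequency in study / reference corpus; c, d = corpus sizes in tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study_tokens, ref_tokens):
    """Stage 1: frequency lists. Stage 2: LL per word. Stage 3: sort by keyness.
    Only positive key words (relatively more frequent in the study corpus) are kept."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    c, d = len(study_tokens), len(ref_tokens)
    rows = []
    for word, a in study.items():
        b = ref.get(word, 0)
        if a / c > b / d:
            rows.append((word, a, log_likelihood(a, b, c, d)))
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Invented toy data, for illustration only.
study = ["gregor"] * 5 + ["er"] * 5
reference = ["er"] * 10 + ["haus"] * 10
result = keywords(study, reference)
for word, freq, ll in result:
    print(f"{word}\t{freq}\t{ll:.2f}")
```

Reversing the comparison (`a / c < b / d`) would yield negative key words, i.e. words underrepresented relative to the reference corpus.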
Key Word Analysis
Table columns: Rank, Freq, LL, Word
Key words in table order: gregor, josefine, hungerkünstler, reisende, offizier, gregors, samsa, er, es, aber, verurteilte, sie, nicht, schwester, sich, …
Kafka ShortProse corpus; key word list obtained with AntConc V3.2.4 (Anthony, 2011: da.ac.jp/antconc_index.html)
Key Word Analysis: Caveats
Caveats (cf. Rayson, 2012):
- Chi-squared and LL tests assume that samples are random with independent observations -> this is not so! (Evert etc.)
  - sidestep: place key words in rank order, rather than determine significance for each word
- A word can be key if it occurs in just one part of the corpus
  - examination of dispersion is important (-> pruning?)
- Often too many key words for a researcher to analyze (Berber Sardinha, 1999)
- Prime importance: careful choice of reference corpus
- Three types of key words are often found (Scott):
  - proper nouns
  - key words that human beings would recognize as key and that are indicators of the “aboutness” of a particular text
  - high-frequency words such as because, shall, or already, which may be indicators of style rather than aboutness
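One simple way to examine dispersion is to count in how many texts of the corpus a key word occurs (its range). The sketch below assumes the corpus is available as a mapping from text title to token list; the helper name `text_range` and the toy token lists are invented for illustration.

```python
from collections import Counter

def text_range(word, texts):
    """Return (number of texts containing `word`, per-text frequencies).
    `texts` maps a title to its list of lowercased tokens (assumed layout)."""
    freqs = {title: Counter(tokens)[word] for title, tokens in texts.items()}
    n_texts = sum(1 for f in freqs.values() if f > 0)
    return n_texts, freqs

# Invented toy data, for illustration only.
texts = {
    "Die Verwandlung": ["gregor", "sah", "die", "schwester", "gregor"],
    "Ein Hungerkünstler": ["der", "hungerkünstler", "sah", "nicht"],
    "Das Urteil": ["georg", "sah", "nicht", "auf"],
}

for word in ("gregor", "sah", "nicht"):
    n_texts, freqs = text_range(word, texts)
    print(f"{word}: occurs in {n_texts} of {len(texts)} texts")
```

A key word with a range of 1, such as a character name, is key for a single text rather than for the corpus as a whole; a minimum range could serve as a pruning threshold.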
Key Word Analysis: What to look for
- “Frequent nouns may indicate superficial topics […], but not its underlying themes”
- “Verbs are often a better candidate for stylistically relevant words” (Stubbs, 2005, p. 11)
- Negatives/negations: pragmatic functions (Hidalgo-Downing, 2000)
  - imply more than is literally said, deny expectations, challenge background propositions; a way of questioning reality, therefore an alienation device
- Lexis indicating Involved Production (Biber, 1988)
  - frequent in English romantic fiction (Tribble, 2000): “personal pronouns, the past forms had and was, the negative particle n't and, last but not least, Nigel [= proper noun]”
Key Word Analysis: First Results
Amerika, The Trial, and The Castle -> statistically overrepresented:
- proper nouns indicating the characters (e.g., K, Karl, Frieda)
- common nouns indicating elements of interior spaces (Tür, Zimmer)
- generic person types (Onkel, Mann)
- a multitude of occupational categories (Diener, Oberköchin)
-> can be related to Kafka’s typical fictional world and characters
A great variety of “small words”:
- adverbs (nur, vielleicht) -> negotiating certainty and evaluation
- modal constructions (hätte, konnte, wollte) and negations (nicht, niemals) -> dealing with aspects of permission, ability, and obligation
- pronouns -> related to the narrative perspective (er, ihn), to self-reference (sich, mich), and to impersonal and generalizing constructions (man, alle)
Key Word Analysis: More Results See handouts (Key words for two corpora: Novels and ShortProse, each compared to reference corpus zeno.org; including negative key words)
Further Analyses
- Key word analysis serves not only to examine the propositional text base, but also stylistic analysis
- But: it is limited to single words (e.g., more complex adverbials, phraseological units, idioms etc. are not identified)
- One possibility -> multi-word units -> n-grams, collocations
Collocation Analysis: Lexical bundles
- Biber et al. (1999) found that idioms and fixed formulas (kick the bucket, a slap in the face) occur rarely in natural speech and writing
- However, they are more frequent in fiction: kick the bucket and a slap in the face occur ca. 5 times per million words in the fiction corpus (cf. Biber et al., 1999, pp. 1025 ff.), and much less often in the other registers (conversation, news) -> stereotyped dialogue in fiction
- Types of “idioms” (Biber et al., 1999, pp. 1024 ff.):
  - Wh-questions (how do you do? what’s up?)
  - Complete noun phrases (a piece of cake)
  - Prepositional phrases (as a matter of fact, in a nutshell, up to date)
  - Verb + prepositional phrases (bear in mind, fall in love)
  - Verb + noun phrases (kick the bucket, take the bull by the horns)
Collocation Analysis: Lexical bundles
- Question: What kinds of lexical bundles does Kafka’s prose involve? Is there any “stereotyped dialogue” at all?
- See handouts (4-grams for two corpora: Novels and ShortProse)
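A list of recurrent 4-grams like the one in the handouts can be approximated with a sliding window over the token sequence. This is a minimal sketch: it ignores sentence boundaries, the frequency threshold `min_freq` is an assumed parameter, and the toy sentence is invented for illustration.

```python
from collections import Counter

def ngrams(tokens, n=4):
    """All contiguous n-token sequences, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(tokens, n=4, min_freq=2):
    """n-grams occurring at least `min_freq` times, most frequent first."""
    counts = Counter(ngrams(tokens, n))
    return [(gram, f) for gram, f in counts.most_common() if f >= min_freq]

# Invented toy data, for illustration only.
tokens = "es ist nicht zu leugnen daß es ist nicht zu spät".split()
bundles = top_ngrams(tokens)
for gram, freq in bundles:
    print(" ".join(gram), freq)
```

Definitions of lexical bundles usually add a dispersion criterion (occurrence in several texts), which this sketch leaves out.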
Collocation Analysis: Exploration
Religion and culture (Christian / Jewish): a “Jewish author”?
Preliminary indicators on the text surface (cf. Engel, 2010, p. 423), e.g.:
- Dom (*dom*; N=19), Kirche (*kirch*; N=50)
- Synagoge (N=0)
- (schwarzer) Bart: *bart* (N=59), *bärt* (N=19)
(more possible indicators should be found in religion dictionaries; compare with Jewish literature; cf. Robertson, 1985)
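Wildcard counts such as *bart* or *kirch* can be reproduced with Python's standard `fnmatch` module. This is a sketch under the assumption that matching is case-insensitive on whole tokens; the helper name `wildcard_hits` and the toy token list are invented for illustration.

```python
from collections import Counter
from fnmatch import fnmatchcase

def wildcard_hits(tokens, pattern):
    """Frequency of every lowercased word form matching a shell-style pattern."""
    counts = Counter(t.lower() for t in tokens)
    return {w: f for w, f in counts.items() if fnmatchcase(w, pattern)}

# Invented toy data, for illustration only.
tokens = ["Der", "Bart", "Vollbart", "bartlos", "Bärte",
          "Kirche", "Kirchturm", "Bart", "offenbart"]

for pattern in ("*bart*", "*bärt*", "*kirch*", "*synagog*"):
    hits = wildcard_hits(tokens, pattern)
    print(pattern, "N =", sum(hits.values()), sorted(hits))
```

Listing the matched word forms matters because a pattern like *bart* also matches unrelated words such as offenbart, so raw N values should be checked against the concordance lines.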
Exploration
“Stereotypical Jewish” (cf. Engel, 2010, p. 423): search for collocates of *bart* / *bärt*
Problem: adequate statistics for sparse data: MI, t-score, LL?
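Two of the candidate statistics can be compared on the same collocate list. The sketch below uses common textbook definitions, MI = log2(O/E) and t = (O − E)/√O, with the expected frequency E = f(node) · f(collocate) · span / N; these are assumptions for illustration and may differ from what a given concordancer computes. The toy sentence is invented.

```python
import math
from collections import Counter
from fnmatch import fnmatchcase

def collocates(tokens, pattern, left=1, right=1):
    """Collocates of all word forms matching `pattern` within a left/right span.
    Returns (word, observed, MI, t-score) rows sorted by MI."""
    tokens = [t.lower() for t in tokens]
    n = len(tokens)
    freq = Counter(tokens)
    nodes = [i for i, t in enumerate(tokens) if fnmatchcase(t, pattern)]
    observed = Counter()
    for i in nodes:
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        observed.update(window)
    rows = []
    for word, o in observed.items():
        expected = len(nodes) * freq[word] * (left + right) / n
        mi = math.log2(o / expected)
        t = (o - expected) / math.sqrt(o)
        rows.append((word, o, mi, t))
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Invented toy data, for illustration only.
tokens = "sein schwarzer bart und sein langer bart".split()
rows = collocates(tokens, "*bart*")
for word, o, mi, t in rows:
    print(f"{word}\tO={o}\tMI={mi:.2f}\tt={t:.2f}")
```

With sparse data the two measures behave differently: MI tends to favour rare collocates that occur once or twice, while the t-score tends to favour frequent ones, which is one reason the choice of statistic is an open question here.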
Table columns: Rank, Freq, FreqL, FreqR, Stat, Collocate
Collocates of *bart* / *bärt* in table order: seines, seinen, seinem, rötlichen, langem, buschigem, Balkon, weißen, weißem, verdeckte, unverwandt, unter, Tränen, tatarischen, tartarischen, tartarische, strich, starker, stand, speien, solchen, schwarzer, schwarze, riesenhaften, nassen, namens, krumme, knollennasigen, knochigen, kleinen, im, herabhängende, grau, gepflegte, gebräunter, fremdartige, flehte, fast, erschien, erschauert, erlaubt, dünnen, buschigen, blonden
Next steps
Python… R… lemmatizer …
Computing Kafka Thank you!
Literature
Adorno, T. W. (1981). Notes on Kafka. In Prisms (pp. 243–71). Cambridge, Massachusetts: MIT Press.
Berber Sardinha, T. (1999). Using keywords in text analysis: Practical aspects. DIRECT Working Papers, 42.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at…: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3).
Bondi, M., & Scott, M. (Eds.) (2010). Keyness in texts. [Studies in Corpus Linguistics, 41]. Amsterdam: John Benjamins.
Engel, M. (2010). Kafka lesen - Verstehensprobleme und Forschungsparadigmen. In M. Engel & B. Auerochs (Eds.), Kafka-Handbuch. Leben, Werk, Wirkung (pp. ). Stuttgart: Metzler.
Mahlberg, M. (2007). Clusters, key clusters and local textual functions in Dickens. Corpora, 2(1). doi: /cor
Oschmann, D. (2010). Kafka als Erzähler. In M. Engel & B. Auerochs (Eds.), Kafka-Handbuch (pp. ). Stuttgart.
Rayson, P. (2012). Corpus analysis of key words. In The Encyclopedia of Applied Linguistics. Blackwell.
Robertson, R. (1985). Kafka: Judaism, politics, and literature. Oxford: Clarendon Press.
Scott, M., & Tribble, C. (2006). Textual patterns. Key word and corpus analysis in language education. Amsterdam: John Benjamins.