Automatic recognition of speech, thought, and writing representation in German narrative texts
This article presents the main results of a project that explored ways to automatically recognize and classify a narrative feature, namely speech, thought, and writing representation (ST&WR), using surface information and methods of computational linguistics. The task was to detect and distinguish four types (direct, free indirect, indirect, and reported ST&WR) in a corpus of manually annotated German narrative texts. Rule-based and machine-learning methods were tested and compared. Results were best for direct ST&WR (best F1 score: 0.87), followed by indirect (0.71), reported (0.58), and finally free indirect ST&WR (0.40). The rule-based approach worked best for ST&WR types with clear patterns, such as indirect and marked direct ST&WR, and often gave the most accurate results. Machine learning was most successful for types without clear indicators, such as free indirect ST&WR, and proved more stable. For the percentage of ST&WR in a text, the results of the machine-learning methods always correlated best with the results of manual annotation. Combining the results of the two approaches by union or intersection did not lead to striking improvements. A stricter definition of ST&WR, which excluded borderline cases, made the task harder and led to worse results for both approaches.
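The comparison described in the abstract (F1 scores for each approach, and for the union and intersection of their outputs) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sentence-level unit, the toy IDs, and the names `f1_score`, `gold`, `rule_based`, and `machine_learning` are hypothetical and are not taken from the article, which may evaluate at a different granularity.

```python
def f1_score(predicted: set, gold: set) -> float:
    """F1 of a predicted set of unit IDs against a manually annotated set."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: IDs of sentences labelled as one ST&WR type.
gold = {1, 2, 3, 4, 5}                 # manual annotation (assumed)
rule_based = {1, 2, 3, 9}              # output of a rule-based recognizer (assumed)
machine_learning = {2, 3, 4, 5, 7, 8}  # output of a trained classifier (assumed)

# Combine the two approaches as described: union and intersection.
combined = {
    "rule_based": rule_based,
    "machine_learning": machine_learning,
    "union": rule_based | machine_learning,
    "intersection": rule_based & machine_learning,
}

scores = {name: f1_score(pred, gold) for name, pred in combined.items()}

for name, score in scores.items():
    print(f"{name}: F1 = {score:.2f}")
```

In this toy setting the union raises recall at the cost of precision and the intersection does the opposite, which is why neither is guaranteed to beat the better single approach.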
Author: | Annelen Brunner |
---|---|
URN: | urn:nbn:de:bsz:mh39-39470 |
ISSN: | 1477-4615 |
Parent Title (English): | Literary and Linguistic Computing |
Document Type: | Article |
Language: | English |
Year of first Publication: | 2013 |
Date of Publication (online): | 2015/07/30 |
Publication state: | Postprint |
Review state: | Publisher's editorial review |
Tag: | Direct speech; Automatic recognition of speech; German; Indirect speech; Prose |
GND Keyword: | Automatische Spracherkennung; Deutsch; Direkte Rede; Indirekte Rede; Prosa |
Volume: | 28 |
Issue: | 4 |
First Page: | 563 |
Last Page: | 575 |
Note: | This publication is freely accessible with the permission of the rights owner under an Alliance licence or a national licence (funded by the DFG, German Research Foundation). |
DDC classes: | 400 Language / 430 German |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Computational linguistics |