Automatic recognition of speech, thought, and writing representation in German narrative texts
(2013)
This article presents the main results of a project that explored ways to automatically recognize and classify speech, thought, and writing representation (ST&WR), a narrative feature, using surface information and methods of computational linguistics. The task was to detect and distinguish four types of ST&WR (direct, free indirect, indirect, and reported) in a corpus of manually annotated German narrative texts. Rule-based and machine-learning methods were tested and compared. The results were best for direct ST&WR (best F1 score: 0.87), followed by indirect (0.71), reported (0.58), and finally free indirect ST&WR (0.40). The rule-based approach worked best for ST&WR types with clear patterns, such as indirect and marked direct ST&WR, and often gave the most accurate results. Machine learning was most successful for types without clear indicators, such as free indirect ST&WR, and proved more stable. With respect to the percentage of ST&WR in a text, the results of the machine-learning methods always correlated best with those of the manual annotation. Combining the results of the two approaches by union or intersection did not lead to striking improvements. A stricter definition of ST&WR that excluded borderline cases made the task harder and led to worse results for both approaches.
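The abstract reports F1 scores, the share of ST&WR in a text, and the effect of combining the two approaches by union or intersection. The following Python sketch illustrates what these measures mean at the token level. It is not the project's implementation: representing annotations as sets of token indices, the toy quotation-mark rule for marked direct ST&WR, and the function names `quoted_tokens`, `prf` and `stwr_share` are assumptions made for illustration, and all data in the usage part are invented.

```python
# Hypothetical sketch, not the project's code: annotations are modelled as
# sets of token indices, one set per ST&WR type (here: direct ST&WR only).


def quoted_tokens(tokens: list[str]) -> set[int]:
    """Toy rule for *marked* direct ST&WR: tokens between „ and “."""
    marked: set[int] = set()
    inside = False
    for i, tok in enumerate(tokens):
        if tok == "„":
            inside = True
        elif tok == "“":
            inside = False
        elif inside:
            marked.add(i)
    return marked


def prf(predicted: set[int], gold: set[int]) -> tuple[float, float, float]:
    """Token-level precision, recall and F1."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def stwr_share(predicted: set[int], n_tokens: int) -> float:
    """Percentage of a text's tokens labelled as ST&WR."""
    return 100 * len(predicted) / n_tokens if n_tokens else 0.0


if __name__ == "__main__":
    tokens = ["Er", "sagte", ":", "„", "Ich", "komme", "morgen", ".", "“",
              "Sie", "dachte", ",", "er", "komme", "nie", "."]
    gold = {4, 5, 6, 7}            # invented gold annotation (direct ST&WR)
    rule = quoted_tokens(tokens)   # {4, 5, 6, 7}
    ml = {5, 6, 7, 12, 13}         # invented output of a learned classifier
    for name, pred in [("rule", rule), ("ml", ml),
                       ("union", rule | ml), ("intersection", rule & ml)]:
        p, r, f = prf(pred, gold)
        print(f"{name:<12} P={p:.2f} R={r:.2f} F1={f:.2f} "
              f"share={stwr_share(pred, len(tokens)):.1f}%")
```

The union can only raise recall, usually at some cost in precision, while the intersection can only lower recall in exchange for fewer false positives; neither is therefore guaranteed to beat the better single system.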
In this paper, we outline our corpus-driven approach to detecting, describing, and presenting multi-word expressions (MWEs). Our goal is to treat MWEs in a way that does justice to their flexible nature and their role in language use. Our research is based on a very large corpus and a statistical method of collocation analysis. The rich empirical data are interpreted linguistically in a structured way that captures the interrelations, patterns, and types of variation of MWEs. Several levels of abstraction build on each other: surface patterns, lexical realizations (LRs), MWEs, and MWE patterns. Generalizations are made in a controlled way and in accordance with corpus evidence. The results are published online in hypertext format.
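The abstract does not name the statistic behind its collocation analysis. As a hedged illustration of what such a method computes, the sketch below scores candidate collocates of a node word with Dunning's log-likelihood ratio (G²) over a 2x2 contingency table, a widely used association measure. The choice of G², the function names `g2` and `rank_collocates`, and all counts are assumptions; the procedure actually applied to the corpus may differ.

```python
import math


def g2(k11: int, k12: int, k21: int, k22: int) -> float:
    """Log-likelihood ratio G² for a 2x2 contingency table.

    k11: node and candidate co-occur   k12: node without the candidate
    k21: candidate without the node    k22: neither occurs
    """
    n = k11 + k12 + k21 + k22
    observed = ((k11, k12), (k21, k22))
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # cells with zero observations contribute nothing
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2 * total


def rank_collocates(cooc: dict[str, int], freq: dict[str, int],
                    node: str, n: int) -> list[tuple[str, float]]:
    """Rank candidate collocates of `node` by G², highest first.

    cooc[w]: co-occurrences of w with the node (assumed <= both frequencies)
    freq[w]: corpus frequency of w; n: corpus size in tokens
    """
    f_node = freq[node]
    scores = {}
    for w, k11 in cooc.items():
        k12 = f_node - k11
        k21 = freq[w] - k11
        k22 = n - f_node - freq[w] + k11
        scores[w] = g2(k11, k12, k21, k22)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    freq = {"Griff": 500, "bekommen": 2_000, "der": 60_000}  # invented counts
    cooc = {"bekommen": 120, "der": 40}                      # invented counts
    for word, score in rank_collocates(cooc, freq, "Griff", 1_000_000):
        print(f"{word:10} G2={score:.1f}")
```

G² itself is unsigned, so a fuller version would also check that the observed co-occurrence count exceeds the expected one before treating a high score as attraction.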
The corpus-driven method presented below, the "UWV analysis model" ("UWV-Analysemodell"), was developed on the basis of research on usual word combinations (usuelle Wortverbindungen, UWV) (cf. Steyer 2000, 2003, 2004, Steyer/Lauer 2007, Brunner/Steyer 2007, Steyer 2008, Steyer forthcoming) and of numerous exhaustive analyses carried out in recent years. The aim was an empirical procedural model that makes it possible to represent adequately, starting from co-occurrence data, how differentiated and interconnected word combinations are at various levels of abstraction. In this phase of the work, therefore, the goal was not to inventory as many usual word combinations of German as possible, but to capture and describe in detail the "inner nature" of word combinations between variance and invariance, with differing degrees of lexical specification, as well as their mutual connections.
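The idea of describing word combinations between variance and invariance, with different degrees of lexical specification, can be illustrated by a toy procedure that aligns several lexical realizations of equal length and turns the positions where they differ into slots. This is a hypothetical simplification, not the UWV analysis model itself: the slot symbol `#`, the functions `generalize` and `lexical_specification`, and the equal-length restriction are all assumptions made for the example.

```python
from collections import Counter

SLOT = "#"  # hypothetical placeholder for a lexically open position


def generalize(realizations: list[list[str]]) -> tuple[list[str], list[Counter]]:
    """Collapse equal-length realizations into a pattern plus slot fillers.

    Positions where all realizations agree stay lexically fixed; positions
    where they differ become slots, and their fillers are counted.
    """
    if not realizations:
        raise ValueError("need at least one realization")
    length = len(realizations[0])
    if any(len(r) != length for r in realizations):
        raise ValueError("this toy version only aligns realizations of equal length")
    pattern: list[str] = []
    fillers: list[Counter] = []
    for pos in range(length):
        words = Counter(r[pos] for r in realizations)
        if len(words) == 1:
            pattern.append(next(iter(words)))  # invariant position
        else:
            pattern.append(SLOT)               # variable position
            fillers.append(words)
    return pattern, fillers


def lexical_specification(pattern: list[str]) -> float:
    """Share of positions that are lexically fixed (1.0 = fully fixed)."""
    return sum(tok != SLOT for tok in pattern) / len(pattern)


if __name__ == "__main__":
    lrs = [["mit", "offenen", "Armen", "empfangen"],
           ["mit", "offenen", "Armen", "aufnehmen"],
           ["mit", "offenen", "Armen", "empfangen"]]
    pattern, fillers = generalize(lrs)
    print(" ".join(pattern))               # mit offenen Armen #
    print(fillers[0].most_common())        # [('empfangen', 2), ('aufnehmen', 1)]
    print(lexical_specification(pattern))  # 0.75
```

Real realizations vary in length and word order, so an actual model would need proper alignment and linguistic judgement rather than position-by-position comparison.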
Report on the Third International Conference „Grammatik und Korpora“ (Grammar and Corpora), Mannheim, 22-24 September 2009
(2009)
This article summarizes key aspects of the concept developed by the project ‘Usuelle Wortverbindungen’ (UWV, usual word combinations) for the corpus-based lexicographic description of word combinations in OWID. The focus of this subproject is the lexicographic description of the typical use of usual word combinations on the basis of a very large corpus of German. Corpus-analytic methods are used for a differentiated investigation of language use, and the results are presented in a user-friendly hypertext format. A further aim is to represent adequately, through a large number of authentic corpus examples, the linguistic diversity found in the corpora, particularly with respect to word combinations.
We present a corpus-driven approach to the study of multi-word expressions, which constitute a significant part of. As a data basis, we use collocation profiles computed from DeReKo (Deutsches Referenzkorpus), the largest available collection of written German, which contains approximately two billion word tokens and is located at the Institute for the German Language (IDS). We take a strongly usage-based approach to multi-word expressions, which we regard as conventionalised patterns in language use that manifest themselves in recurrent syntagmatic patterns of words. They are defined by their distinct function in language. To find multi-word expressions, we let ourselves be guided by corpus data and statistical evidence as far as possible, taking interpretative steps carefully and in a monitored fashion. We develop a procedure of interpretation that leads from the evidence of collocation profiles to a collection of recurrent word patterns and finally to multi-word expressions. When a collection of multi-word expressions is built up in this way, it becomes clear that the expressions can be defined at different levels of generalisation and are interrelated in various ways. This will be reflected in the documentation and presentation of the findings. We plan to add annotation that allows the multi-word expressions to be grouped by different features, and to add links between them that reflect their relationships, thus constructing a network of multi-word expressions.
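The planned network can be pictured as a graph in which MWEs and MWE patterns are nodes carrying feature annotations, and typed links connect related expressions. The sketch below is a hypothetical data structure for that idea only: the classes `MWE` and `MWENetwork`, the relation name `instance_of`, and the feature labels are invented for illustration and do not describe how the project actually stores its data.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MWE:
    form: str                                         # expression or pattern
    features: set[str] = field(default_factory=set)   # hypothetical labels


class MWENetwork:
    """Toy network: MWEs as nodes, typed directed links as edges."""

    def __init__(self) -> None:
        self.nodes: dict[str, MWE] = {}
        self.links: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add(self, mwe: MWE) -> None:
        self.nodes[mwe.form] = mwe

    def link(self, source: str, relation: str, target: str) -> None:
        """Add a typed link; both ends must already be in the network."""
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both MWEs must be added before linking them")
        self.links[source].add((relation, target))

    def group_by(self, feature: str) -> list[str]:
        """All MWEs annotated with `feature`, sorted alphabetically."""
        return sorted(f for f, m in self.nodes.items() if feature in m.features)

    def related(self, source: str, relation: Optional[str] = None) -> list[str]:
        """Targets linked from `source`, optionally filtered by relation type."""
        return sorted(t for r, t in self.links.get(source, set())
                      if relation is None or r == relation)


if __name__ == "__main__":
    net = MWENetwork()
    net.add(MWE("in # Hinsicht", {"pattern"}))
    net.add(MWE("in jeder Hinsicht", {"adverbial"}))
    net.add(MWE("in mancher Hinsicht", {"adverbial"}))
    net.link("in jeder Hinsicht", "instance_of", "in # Hinsicht")
    net.link("in mancher Hinsicht", "instance_of", "in # Hinsicht")
    print(net.group_by("adverbial"))
    print(net.related("in jeder Hinsicht", "instance_of"))
```

Keeping features on the nodes and relation types on the edges lets the same collection be grouped by annotation and navigated by relationship without duplicating entries.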