From 14 to 16 March, the 53rd annual conference of the Institut für Deutsche Sprache (IDS) took place at the Congress Center Rosengarten in Mannheim. This year's topic was the lexicon and its complexity and dynamics. The focus was on new perspectives on the lexicon and on lexicon research after the empirical turn, which has markedly changed the picture of the vocabulary and broadened the view of it. Lexicon theorists and lexicographers now work with quantitative corpus-linguistic methods, among others, and take into account findings and methods from neighbouring disciplines such as psycholinguistics, which also brings novel concepts into view. The inventory of lexical units is no longer limited to words but has been extended to include construction-like units and semi-abstract lexical patterns.
This contribution presents the newest version of our 'Wortverbindungsfelder' (fields of multi-word expressions), an experimental lexicographic resource that focuses on aspects of MWEs that are rarely addressed in traditional descriptions: contexts, patterns and interrelations. The MWE fields use data from a very large corpus of written German (over 6 billion word forms) and are created in a strictly corpus-based way. In addition to traditional lexicographic descriptions, they include quantitative corpus data which is structured in new ways in order to show the usage specifics. This way of looking at MWEs gives insight into the structure of language and is especially interesting for foreign language learners.
We present recognizers for four very different types of speech, thought and writing representation (STWR) for German texts. The implementation is based on deep learning with two different customized contextual embeddings, namely FLAIR embeddings and BERT embeddings. This paper gives an evaluation of our recognizers with a particular focus on the differences in performance we observed between those two embeddings. FLAIR performed best for direct STWR (F1=0.85), BERT for indirect (F1=0.76) and free indirect (F1=0.59) STWR. For reported STWR, the comparison was inconclusive, but BERT gave the best average results and best individual model (F1=0.60). Our best recognizers, our customized language embeddings and most of our test and training data are freely available and can be found via www.redewiedergabe.de or at github.com/redewiedergabe.
In this paper, we present our work in progress on automatically identifying free indirect representation (FI), a type of thought representation used in literary texts. With a deep learning approach using contextual string embeddings, we achieve F1 scores between 0.45 and 0.5 (sentence-based evaluation for the FI category) on two very different German corpora, a clear improvement on earlier attempts for this task. We show how consistently marked direct speech can help in this task. In our evaluation, we also consider human inter-annotator scores and thus address measures of certainty for this difficult phenomenon.
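A sentence-based F1 score for a single category, such as the one reported above, can be illustrated with a minimal sketch. It assumes one gold label and one predicted label per sentence. The function name `f1_for_label`, the label strings "FI" and "O", and the toy data are hypothetical and are not taken from the paper or its code.

```python
from typing import Sequence


def f1_for_label(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 for one category, with one gold and one predicted label per sentence."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of sentences")
    pairs = list(zip(gold, pred))
    # Count true positives, false positives and false negatives for `label`.
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No correct hit: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): five sentences, "FI" = free indirect, "O" = other.
gold_labels = ["FI", "O", "FI", "O", "FI"]
pred_labels = ["FI", "FI", "O", "O", "FI"]
fi_score = f1_for_label(gold_labels, pred_labels, "FI")
```

In the toy data there are two true positives, one false positive and one false negative, so precision and recall are both 2/3 and the F1 score is 2/3.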
In this paper we outline our corpus-driven approach to detecting, describing and presenting multi-word expressions (MWEs). Our goal is to treat MWEs in a way that gives credit to their flexible nature and their role in language use. The bases of our research are a very large corpus and a statistical method of collocation analysis. The rich empirical data is interpreted linguistically in a structured way which captures the interrelations, patterns and types of variances of MWEs. Several levels of abstraction build on each other: surface patterns, lexical realizations (LRs), MWEs and MWE patterns. Generalizations are made in a controlled way and in adherence to corpus evidence. The results are published online in a hypertext format.
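The abstract above does not name the statistical measure used for collocation analysis. As a rough illustration of the general idea only, the following sketch scores adjacent word pairs with pointwise mutual information (PMI), which is a stand-in chosen here and not necessarily the authors' method. The function name `pmi_bigrams`, the `min_count` threshold and the toy sentence are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def pmi_bigrams(tokens: List[str], min_count: int = 2) -> Dict[Tuple[str, str], float]:
    """Score adjacent word pairs by pointwise mutual information (log base 2)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores: Dict[Tuple[str, str], float] = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            # Skip rare pairs, whose PMI estimates are unreliable.
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy input (hypothetical); a real analysis would run on a very large corpus.
toy_tokens = "guten morgen sagt er und guten morgen sagt sie".split()
toy_scores = pmi_bigrams(toy_tokens)
```

A positive score means the pair occurs together more often than its word frequencies alone would predict. Candidates found this way would still need the linguistic interpretation that the abstract describes.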