We present SPLICR, the Web-based Sustainability Platform for Linguistic Corpora and Resources. The system is aimed at people who work in Linguistics or Computational Linguistics: a comprehensive database of metadata records can be explored in order to find language resources that could be appropriate for one’s specific research needs. SPLICR also provides a graphical interface that enables users to query and to visualise corpora. The project in which the system is developed aims at sustainably archiving the ca. 60 language resources that have been constructed in three collaborative research centres. Our project has two primary goals: (a) To process and to archive the resources sustainably so that they are still available to the research community in five, ten, or even 20 years’ time. (b) To enable researchers to query the resources both on the level of their metadata and on the level of linguistic annotations. In more general terms, our goal is to enable solutions that leverage the interoperability, reusability, and sustainability of heterogeneous collections of language resources.
Automatic recognition of speech, thought, and writing representation in German narrative texts
(2013)
This article presents the main results of a project, which explored ways to recognize and classify a narrative feature—speech, thought, and writing representation (ST&WR)—automatically, using surface information and methods of computational linguistics. The task was to detect and distinguish four types—direct, free indirect, indirect, and reported ST&WR—in a corpus of manually annotated German narrative texts. Rule-based as well as machine-learning methods were tested and compared. The results were best for recognizing direct ST&WR (best F1 score: 0.87), followed by indirect (0.71), reported (0.58), and finally free indirect ST&WR (0.40). The rule-based approach worked best for ST&WR types with clear patterns, like indirect and marked direct ST&WR, and often gave the most accurate results. Machine learning was most successful for types without clear indicators, like free indirect ST&WR, and proved more stable. When looking at the percentage of ST&WR in a text, the results of machine-learning methods always correlated best with the results of manual annotation. Creating a union or intersection of the results of the two approaches did not lead to striking improvements. A stricter definition of ST&WR, which excluded borderline cases, made the task harder and led to worse results for both approaches.
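The evaluation described in this abstract rests on F1 scores and on combining the outputs of two recognizers by union or intersection. The following is a minimal sketch of that arithmetic, not the authors' evaluation code. It assumes, purely for illustration, that each recognizer returns a set of sentence ids it considers ST&WR; the function name `f1_score` and all toy data are hypothetical.

```python
def f1_score(predicted: set, gold: set) -> float:
    """Harmonic mean of precision and recall over sets of item ids."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        # Covers empty inputs and the case of no overlap at all.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: ids of sentences marked as containing ST&WR.
gold = {1, 2, 3, 4, 5, 6}                # manual annotation
rule_based = {1, 2, 3, 7}                # output of a rule-based recognizer
machine_learning = {2, 3, 4, 5, 8, 9}    # output of a learned classifier

scores = {
    "rule_based": f1_score(rule_based, gold),
    "machine_learning": f1_score(machine_learning, gold),
    # Union: an item counts if either approach found it (favours recall).
    "union": f1_score(rule_based | machine_learning, gold),
    # Intersection: an item counts only if both found it (favours precision).
    "intersection": f1_score(rule_based & machine_learning, gold),
}

for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

In this toy setup the union trades precision for recall and the intersection does the opposite, which is why neither combination is guaranteed to beat the better single approach.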