The Czech National Corpus (CNC) is a long-term project striving for extensive and continuous mapping of the Czech language. This effort results mainly in the compilation and maintenance of a range of corpora, and in free public access to them, with the aim of offering diverse, representative, and high-quality data for empirical research, mainly in linguistics. Since 2012, the CNC has been officially recognized as a research infrastructure funded by the Czech Ministry of Education, Youth and Sports, which has prompted a recent shift towards user service-oriented operation of the project. All project-related resources are now integrated into the CNC research portal at http://www.korpus.cz/. Currently, the CNC has an established and growing user community of more than 4,500 active users in the Czech Republic and abroad, who submit almost 1,900 queries per day through one of the user interfaces. The paper discusses the main CNC objectives for each particular domain, aiming at an overview of the current situation supplemented by an outline of future plans.
This paper presents the Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK, the Research and Teaching Corpus of Spoken German) and the Datenbank für Gesprochenes Deutsch (DGD, the Database for Spoken German) as instruments for conversation-analytic work. After a general introduction to FOLK and DGD in the second section, the third section briefly outlines the methodological relations between corpus linguistics and conversation research, and the challenges that arise when these two approaches to authentic language material meet. The fourth section then illustrates, using the formula "ich sag mal" as an example, how a corpus- and database-driven analysis can contribute to the investigation of conversational phenomena.
The variation of the strong genitive marker of the singular noun has been treated in diverse accounts. Still, there is a consensus that it is to a large extent systematic but can be approached appropriately only if many heterogeneous factors are taken into account. Over thirty variables influencing this variation have been proposed. However, it is unclear how effective they are and, above all, how they interact. In this paper, the potential influencing variables are evaluated statistically in a machine learning approach and modelled in decision trees in order to predict the genitive marking variants. Working with decision trees based exclusively on statistically significant data enables us to determine which combination of factors is decisive in the choice of a marking variant for a given noun. Consequently, the variation factors can be assessed with respect to their explanatory power for corpus data and ranked hierarchically.