Web corpora - the best possible solution for tracking rare phenomena in underresourced languages: clitics in Bosnian, Croatian and Serbian
Complex linguistic phenomena, such as clitic climbing (CC) in Bosnian, Croatian and Serbian (BCS), are often described intuitively, only from the perspective of the main tendency. In this paper, we argue that web corpora currently offer the best source of empirical material for studying CC in BCS. They thus allow the most accurate description of this phenomenon, as less frequent constructions can be tracked only in big, well-annotated data sources. We compare the properties of web corpora for BCS with traditional sources and give examples of studies on CC based on web corpora. Furthermore, we discuss problems related to web corpora and suggest some improvements for the future.
Author: | Edyta Jurkiewicz-Rohrbacher, Zrinka Kolaković, Björn Hansen |
---|---|
URN: | urn:nbn:de:bsz:mh39-62667 |
Parent Title (English): | Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC-5+BigNLP) 2017 including the papers from the Web-as-Corpus (WAC-XI) guest section. Birmingham, 24 July 2017 |
Publisher: | Institut für Deutsche Sprache |
Place of publication: | Mannheim |
Editor: | Piotr Bański, Marc Kupietz, Harald Lüngen, Paul Rayson, Hanno Biber, Evelyn Breiteneder, Simon Clematide, John Mariani, Mark Stevenson, Theresa Sick |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2017 |
Date of Publication (online): | 2017/07/05 |
Publication state: | Published version |
Review state: | Peer-reviewed |
Tag: | Bosnian; Corpus linguistics; Croatian; Serbian; Web corpora; clitic climbing |
GND Keyword: | Bosnisch; Internet; Korpus <Linguistik>; Kroatisch; Morphem; Serbisch |
Page Number: | 7 |
First Page: | 49 |
Last Page: | 55 |
DDC classes: | 400 Language |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Grammar research |
Linguistics-Classification: | Corpus linguistics |
Conferences, Workshops: | CMLC-5 + BigNLP / 5th Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing |
Licence: | Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Germany (CC BY-NC-ND 3.0 DE) |