TY  - CHAP
U1  - Konferenzveröffentlichung
A1  - Jurkiewicz-Rohrbacher, Edyta
A1  - Kolaković, Zrinka
A1  - Hansen, Björn
ED  - Bański, Piotr
ED  - Kupietz, Marc
ED  - Lüngen, Harald
ED  - Rayson, Paul
ED  - Biber, Hanno
ED  - Breiteneder, Evelyn
ED  - Clematide, Simon
ED  - Mariani, John
ED  - Stevenson, Mark
ED  - Sick, Theresa
T1  - Web corpora - the best possible solution for tracking rare phenomena in underresourced languages: clitics in Bosnian, Croatian and Serbian
T2  - Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC-5+BigNLP) 2017 including the papers from the Web-as-Corpus (WAC-XI) guest section. Birmingham, 24 July 2017
N2  - Complex linguistic phenomena, such as Clitic Climbing in Bosnian, Croatian and Serbian, are often described intuitively, only from the perspective of the main tendency. In this paper, we argue that web corpora currently offer the best source of empirical material for studying Clitic Climbing in BCS. They thus allow the most accurate description of this phenomenon, as less frequent constructions can be tracked only in big, well-annotated data sources. We compare the properties of web corpora for BCS with traditional sources and give examples of studies on CC based on web corpora. Furthermore, we discuss problems related to web corpora and suggest some improvements for the future.
KW  - Korpus
KW  - Internet
KW  - Bosnisch
KW  - Serbisch
KW  - Kroatisch
KW  - clitic climbing
KW  - Morphem
KW  - Corpus linguistics
KW  - Web corpora
KW  - Bosnian
KW  - Serbian
KW  - Croatian
Y1  - 2017
U6  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-62667
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-62667
SP  - 49
EP  - 55
S1  - 7
PB  - Institut für Deutsche Sprache
CY  - Mannheim
ER  - 