TY - JOUR U1 - Zeitschriftenartikel, wissenschaftlich - begutachtet (reviewed) A1 - Lenders, Winfried A1 - Schmitz, Hans-Christian T1 - Die Elektronische Edition der Schriften Immanuel Kants JF - Kant-Studien N2 - Die Universität Bonn verfügt über ein elektronisches Korpus von Immanuel Kants gesammelten Schriften gemäß den Abteilungen 1–3 der Akademie-Ausgabe. Dieses Korpus bildet die Grundlage einer elektronischen Edition der Schriften Kants, auf die über die Webseite des ehemaligen Instituts für Kommunikationsforschung und Phonetik zugegriffen werden kann: http://www.ikp.uni-bonn.de/kant/. Im vorliegenden Artikel wird über den Umfang und den Zustand des Bonner Korpus und der elektronischen Edition berichtet. KW - Deutsch KW - Immanuel Kant KW - Kant-Korpus KW - elektronische Edition Y1 - 2007 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-8758 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-8758 UR - 10.1515/KANT.2007.011 SN - 1613-1134 SS - 1613-1134 N1 - Dieser Beitrag ist mit Zustimmung des Rechteinhabers aufgrund einer (DFG geförderten) Allianz- bzw. Nationallizenz frei zugänglich. VL - 98 IS - 2 SP - 223 EP - 235 S1 - 13 PB - de Gruyter CY - Berlin ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Fandrych, Christian A1 - Frick, Elena A1 - Hedeland, Hanna A1 - Iliash, Anna A1 - Jettka, Daniel A1 - Meißner, Cordula A1 - Schmidt, Thomas A1 - Wallner, Franziska A1 - Weigert, Kathrin A1 - Westpfahl, Swantje ED - Calzolari, Nicoletta ED - Choukri, Khalid ED - Declerck, Thierry ED - Goggi, Sara ED - Grobelnik, Marko ED - Maegaard, Bente ED - Mariani, Joseph ED - Mazo, Helene ED - Moreno, Asuncion ED - Odijk, Jan ED - Piperidis, Stelios T1 - User, who art thou? 
User profiling for oral corpus platforms T2 - Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia N2 - This contribution presents the background, design and results of a study of users of three oral corpus platforms in Germany. Roughly 5,000 registered users of the Database for Spoken German (DGD), the GeWiss corpus and the corpora of the Hamburg Centre for Language Corpora (HZSK) were asked to participate in a user survey. This quantitative approach was complemented by qualitative interviews with selected users. We briefly introduce the corpus resources involved in the study in section 2. Section 3 describes the methods employed in the user studies. Section 4 summarizes results of the studies focusing on selected key topics. Section 5 attempts a generalization of these results to larger contexts. KW - oral corpus platform KW - user survey KW - Deutsch KW - Korpus KW - Gesprochene Sprache KW - Benutzerforschung Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-50774 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-50774 SN - 978-2-9517408-9-1 SB - 978-2-9517408-9-1 SP - 280 EP - 287 PB - European Language Resources Association (ELRA) CY - Paris ER - TY - CHAP U1 - Buchbeitrag A1 - Steyer, Kathrin ED - Bresson, Daniel T1 - Kollokationen als zentrales Übersetzungsproblem – Vorschläge für eine Kollokationsdatenbank Deutsch-Französisch/Französisch-Deutsch auf der Basis paralleler und vergleichbarer Korpora T2 - Lexikologie und Lexikographie Deutsch-Französisch T3 - Cahiers d'Études Germaniques - 35 KW - Deutsch KW - Kollokation KW - Übersetzung KW - Französisch KW - Korpus Y1 - 1998 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-48894 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-48894 SP - 95 EP - 113 PB - Université Lumière CY - Aix-en-Provence ER - TY - CHAP U1 - Buchbeitrag A1 - Kubczak, Jacqueline A1 - Konopka, Marek ED - 
Šticha, František ED - Fried, Mirjam T1 - Grammatical Variation in Near-Standard German: a corpus-based project at the Institute for the German Language (IDS) in Mannheim T2 - Grammar & Corpora 2007. Selected contributions from the conference Grammar and Corpora, Sept. 25-27, 2007, Liblice KW - Deutsch KW - Sprachvariante KW - Grammatik KW - Korpus Y1 - 2008 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-48579 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-48579 SN - 978-80-200-1634-8 SB - 978-80-200-1634-8 SP - 251 EP - 260 PB - Academia CY - Prag ER - TY - JOUR U1 - Zeitschriftenartikel, wissenschaftlich - begutachtet (reviewed) A1 - Bubenhofer, Noah A1 - Haupt, Stefanie A1 - Schwinn, Horst T1 - A comparable Wikipedia corpus: from wiki syntax to POS tagged XML JF - [Arbeiten zur Mehrsprachigkeit / B] Arbeiten zur Mehrsprachigkeit = Working papers in multilingualism / Sonderforschungsbereich 538 Mehrsprachigkeit 538, Universität Hamburg N2 - To build a comparable Wikipedia corpus of German, French, Italian, Norwegian, Polish and Hungarian for contrastive grammar research, we used a set of XSLT stylesheets to transform the MediaWiki annotations to XML. Furthermore, the data has been annotated with word class information using different taggers. The outcome is a corpus with rich metadata and linguistic annotation that can be used for multilingual research in various linguistic topics. 
KW - Korpus KW - Wikipedia KW - Kontrastive Grammatik KW - Comparable Corpus KW - Multilingual Corpus KW - POS-Tagging KW - XSLT Y1 - 2011 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-51897 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-51897 SN - 0176-599X SS - 0176-599X IS - 96 SP - 141 EP - 144 PB - Universität Hamburg CY - Hamburg ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Belica, Cyril A1 - Keibel, Holger A1 - Kupietz, Marc A1 - Perkuhn, Rainer A1 - Vachková, Marie ED - Mahlberg, Michaela ED - González-Díaz, Victorina ED - Smith, Catherine T1 - Putting corpora into perspective. Rethinking synchronicity in corpus linguistics T2 - Proceedings of the Corpus Linguistics Conference 2009 N2 - Empirical synchronic language studies generally seek to investigate language phenomena for one point in time, even though this point in time is often not stated explicitly. Until today, surprisingly little research has addressed the implications of this time-dependency of synchronic research on the composition and analysis of data that are suitable for conducting such studies. Existing solutions and practices tend to be too general to meet the needs of all kinds of research questions. In this theoretical paper that is targeted at both corpus creators and corpus users, we propose to take a decidedly synchronic perspective on the relevant language data. Such a perspective may be realised either in terms of sampling criteria or in terms of analytical methods applied to the data. As a general approach for both realisations, we introduce and explore the FReD strategy (Frequency Relevance Decay) which models the relevance of language events from a synchronic perspective. This general strategy represents a whole family of synchronic perspectives that may be customised to meet the requirements imposed by the specific research questions and language domain under investigation. 
KW - Korpus KW - Forschungsmethode Y1 - 2010 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47393 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47393 UR - http://ucrel.lancs.ac.uk/publications/cl2009/ SP - 22 S1 - 22 PB - University of Liverpool CY - Liverpool ER - TY - CHAP U1 - Buchbeitrag A1 - Lüngen, Harald A1 - Hebborn, Mariana ED - Klawitter, Jana ED - Lobin, Henning ED - Schmidt, Torben T1 - Linguistische Annotationen für die Analyse von Gliederungsstrukturen wissenschaftlicher Texte T2 - Kulturwissenschaften Digital. Neue Forschungsfragen und Methoden KW - Korpus KW - Annotation KW - Ontologie Y1 - 2012 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47959 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47959 SN - 978-3-593-41287-0 SB - 978-3-593-41287-0 SP - 155 EP - 176 PB - Campus CY - Frankfurt am Main ER - TY - CHAP U1 - Buchbeitrag A1 - Keibel, Holger A1 - Belica, Cyril A1 - Kupietz, Marc A1 - Perkuhn, Rainer ED - Konopka, Marek ED - Kubczak, Jacqueline ED - Mair, Christian ED - Šticha, František ED - Waßner, Ulrich Hermann T1 - Approaching grammar: Detecting, conceptualizing and generalizing paradigmatic variation T2 - Grammatik und Korpora. Dritte Internationale Konferenz. Mannheim, 22. - 24.9.2009 N2 - This paper presents ongoing research which is embedded in an empirical-linguistic research program, set out to devise viable research strategies for developing an explanatory theory of grammar as a psychological and social phenomenon. As this phenomenon cannot be studied directly, the program attempts to approach it indirectly through its correlates in language corpora, which is justified by referring to the core tenets of Emergent Grammar. The guiding principle for identifying such corpus correlates of grammatical regularities is to imitate the psychological processes underlying the emergent nature of these regularities. 
While previous work in this program focused on syntagmatic structures, the current paper goes one step further by investigating schematic structures that involve paradigmatic variation. It introduces and explores a general strategy by which corpus correlates of such structures may be uncovered, and it further outlines how these correlates may be used to study the nature of the psychologically real schematic structures. T3 - Korpuslinguistik und interdisziplinäre Perspektiven auf Sprache - 1 KW - Korpus KW - Grammatik KW - Sprachvariante KW - Methode Y1 - 2011 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47783 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47783 SN - 978-3-8233-6648-5 SB - 978-3-8233-6648-5 SP - 329 EP - 355 PB - Narr CY - Tübingen ER - TY - CHAP U1 - Buchbeitrag A1 - Lüngen, Harald A1 - Keibel, Holger ED - Steffens, Doris ED - al-Wadi, Doris T1 - Zur Erstellung und Interpretation der Zeitverlaufsgrafiken T2 - Neuer Wortschatz. Neologismen im Deutschen 2001-2010. Band 2: kiten – Z KW - Deutsch KW - Neologismus KW - Korpus KW - Methode Y1 - 2013 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47870 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47870 SN - 978-3-937241-43-2 SB - 978-3-937241-43-2 SP - 561 EP - 567 PB - Institut für Deutsche Sprache CY - Mannheim ET - 1. Auflage ER - TY - CHAP U1 - Buchbeitrag A1 - Lüngen, Harald A1 - Keibel, Holger ED - Steffens, Doris ED - al-Wadi, Doris T1 - Zur Erstellung und Interpretation der Zeitverlaufsgrafiken T2 - Neuer Wortschatz. Neologismen im Deutschen 2001-2010. 
Band 2: kiten – Z KW - Deutsch KW - Neologismus KW - Korpus KW - Methode Y1 - 2014 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47888 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47888 SN - 978-3-937241-43-2 SB - 978-3-937241-43-2 SP - 561 EP - 567 PB - Institut für Deutsche Sprache CY - Mannheim ET - 2., durchgesehene Auflage ER - TY - CHAP U1 - Buchbeitrag A1 - Lüngen, Harald A1 - Keibel, Holger ED - Steffens, Doris ED - al-Wadi, Doris T1 - Zur Erstellung und Interpretation der Zeitverlaufsgrafiken T2 - Neuer Wortschatz. Neologismen im Deutschen 2001-2010. Band 2: kiten – Z KW - Deutsch KW - Neologismus KW - Korpus KW - Methode Y1 - 2015 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47893 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47893 SN - 978-3-937241-43-2 SB - 978-3-937241-43-2 SP - 561 EP - 567 PB - Institut für Deutsche Sprache CY - Mannheim ET - 3., durchgesehene Auflage ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Sperberg-McQueen, Christopher M. A1 - Schonefeld, Oliver A1 - Kupietz, Marc A1 - Lüngen, Harald A1 - Witt, Andreas T1 - Igel: Comparing document grammars using XQuery T2 - Proceedings of Balisage. The Markup Conference 2013 N2 - Igel is a small XQuery-based web application for examining a collection of document grammars; in particular, for comparing related document grammars to get a better overview of their differences and similarities. In its initial form, Igel reads only DTDs and provides only simple lists of constructs in them (elements, attributes, notations, parameter entities). Our continuing work is aimed at making Igel provide more sophisticated and useful information about document grammars and building the application into a useful tool for the analysis (and the maintenance!) 
of families of related document grammars. T3 - Balisage Series on Markup Technologies - 10 KW - Korpus KW - XML KW - XQuery Y1 - 2013 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47919 SN - 1947-2609 SS - 1947-2609 U6 - https://dx.doi.org/10.4242/BalisageVol10.Schonefeld01 DO - https://dx.doi.org/10.4242/BalisageVol10.Schonefeld01 SP - ungezählte Seiten S1 - 6 ER - TY - JOUR U1 - Zeitschriftenartikel, wissenschaftlich - begutachtet (reviewed) A1 - Schneider, Roman A1 - Storrer, Angelika A1 - Mehler, Alexander T1 - Editorial JF - Journal for Language Technology and Computational Linguistics KW - Korpus KW - Internet KW - Deutsch Y1 - 2013 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-48107 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-48107 UR - http://www.dwds.de/jlcl/index.php?modus=ausgaben&language= SN - 2190-6858 SS - 2190-6858 VL - 28 IS - 2 SP - III EP - IV PB - GSCL CY - Regensburg ER - TY - CHAP U1 - Buchbeitrag A1 - Hansen, Sandra A1 - Schneider, Roman ED - Gurevych, Iryna ED - Biemann, Chris ED - Zesch, Torsten T1 - Decision Tree-Based Evaluation of Genitive Classification – An Empirical Study on CMC and Text Corpora. Language Processing and Knowledge in the Web T2 - Language Processing and Knowledge in the Web. 25th International Conference, GSCL 2013, Darmstadt, Germany, September 25-27, 2013. Proceedings N2 - Contemporary studies on the characteristics of natural language benefit enormously from the increasing amount of linguistic corpora. Aside from text and speech corpora, corpora of computer-mediated communication (CMC) position themselves between orality and literacy, and beyond that provide insight into the impact of "new", mainly internet-based media on language behaviour. In this paper, we present an empirical attempt to work with annotated CMC corpora for the explanation of linguistic phenomena. 
In concrete terms, we implement machine learning algorithms to produce decision trees that reveal rules and tendencies about the use of genitive markers in German. T3 - Lecture Notes in Computer Science - 8105 KW - Corpus Linguistics KW - Computer-Mediated Communication KW - Machine Learning KW - Decision Trees KW - Grammar KW - Genitive Classification Y1 - 2013 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-48115 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-48115 SN - 978-3-642-40721-5 SB - 978-3-642-40721-5 N1 - The final publication is available at Springer via http://www.springer.com/de/book/9783642407215 SP - 83 EP - 88 PB - Springer CY - Berlin/Heidelberg ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Rehbein, Ines A1 - Ruppenhofer, Josef A1 - Sporleder, Caroline A1 - Pinkal, Manfred ED - Jancsary, Jeremy T1 - Adding nominal spice to SALSA – frame-semantic annotation of German nouns and verbs T2 - 11th Conference on Natural Language Processing (KONVENS). Empirical Methods in Natural Language Processing N2 - This paper presents Release 2.0 of the SALSA corpus, a German resource for lexical semantics. The new corpus release provides new annotations for German nouns, complementing the existing annotations of German verbs in Release 1.0. The corpus now includes around 24,000 sentences with more than 36,000 annotated instances. It was designed with an eye towards NLP applications such as semantic role labeling but will also be a useful resource for linguistic studies in lexical semantics. 
T3 - Schriftenreihe der Österreichischen Gesellschaft für Artificial Intelligence (ÖGAI) - 5 KW - SALSA KW - Deutsch KW - Korpus KW - Frame-Semantik Y1 - 2012 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-52542 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-52542 UR - https://dblp.uni-trier.de/db/conf/konvens/konvens2012.html SN - 3-85027-005-X SB - 3-85027-005-X SP - 89 EP - 97 PB - Eigenverlag ÖGAI ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Clematide, Simon A1 - Grindl, Stefan A1 - Klenner, Manfred A1 - Petrakis, Stefanos A1 - Remus, Robert A1 - Ruppenhofer, Josef A1 - Waltinger, Ulli A1 - Wiegand, Michael T1 - MLSA – A Multi-layered Reference Corpus for German Sentiment Analysis T2 - Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). Istanbul, Turkey N2 - In this paper, we describe MLSA, a publicly available multi-layered reference corpus for German-language sentiment analysis. The construction of the corpus is based on the manual annotation of 270 German-language sentences considering three different layers of granularity. The sentence-layer annotation, as the most coarse-grained annotation, focuses on aspects of objectivity, subjectivity and the overall polarity of the respective sentences. Layer 2 is concerned with polarity on the word- and phrase-level, annotating both subjective and factual language. The annotations on Layer 3 focus on the expression-level, denoting frames of private states such as objective and direct speech events. These three layers and their respective annotations are intended to be fully independent of each other. At the same time, exploring for and discovering interactions that may exist between different layers should also be possible. The reliability of the respective annotations was assessed using the average pairwise agreement and Fleiss’ multi-rater measures. 
We believe that MLSA is a beneficial resource for sentiment analysis research, algorithms and applications that focus on the German language. KW - MLSA KW - sentiment analysis KW - Deutsch KW - Korpus KW - Sentimentanalyse Y1 - 2012 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-52345 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-52345 UR - http://www.lrec-conf.org/proceedings/lrec2012/pdf/125_Paper.pdf SN - 978-2-9517408-7-7 SB - 978-2-9517408-7-7 SP - 3551 EP - 3556 PB - European Language Resources Association ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Cosma, Ruxandra A1 - Cristea, Dan A1 - Kupietz, Marc A1 - Tufiş, Dan A1 - Witt, Andreas ED - Calzolari, Nicoletta ED - Choukri, Khalid ED - Declerck, Thierry ED - Goggi, Sara ED - Grobelnik, Marko ED - Maegaard, Bente ED - Mariani, Joseph ED - Mazo, Helene ED - Moreno, Asuncion ED - Odijk, Jan ED - Piperidis, Stelios T1 - DRuKoLA – towards contrastive German-Romanian research based on comparable corpora T2 - Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia N2 - This paper introduces the recently started DRuKoLA-project that aims at providing mechanisms to flexibly draw virtual comparable corpora from the German Reference Corpus DeReKo and the Reference Corpus of Contemporary Romanian Language CoRoLa in order to use these virtual corpora as empirical basis for contrastive linguistic research. 
KW - Deutsch KW - Korpus KW - Rumänisch KW - Textlinguistik KW - Kontrastive Linguistik Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-52256 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-52256 SN - 978-2-9517408-9-1 SB - 978-2-9517408-9-1 SP - 28 EP - 32 PB - European Language Resources Association (ELRA) CY - Paris ER - TY - CHAP U1 - Buchbeitrag A1 - Kallmeyer, Werner A1 - Zifonun, Gisela ED - Kallmeyer, Werner ED - Zifonun, Gisela T1 - Vorwort T2 - Sprachkorpora – Datenmengen und Erkenntnisfortschritt T3 - Jahrbuch / Institut für Deutsche Sprache - 2006 KW - Korpus KW - Computerlinguistik Y1 - 2007 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-48737 SN - 978-3-11-019273-5 SB - 978-3-11-019273-5 U6 - https://dx.doi.org/10.1515/9783110439083-001 DO - https://dx.doi.org/10.1515/9783110439083-001 SP - VII EP - X PB - de Gruyter CY - Berlin/New York ER - TY - BOOK U1 - Buch A1 - Storjohann, Petra ED - Blühdorn, Hardarik ED - Elstermann, Mechthild ED - Klosa, Annette T1 - Deutsche Antonyme aus korpuslinguistischer Sicht – Muster und Funktionen T3 - OPAL - Online publizierte Arbeiten zur Linguistik - 3/2015 KW - Deutsch KW - Antonym KW - Semasiologie KW - Korpus Y1 - 2015 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-50644 SN - 1860-9422 SS - 1860-9422 U6 - https://dx.doi.org/doi:10.14618/opal_03-2015 DO - https://dx.doi.org/doi:10.14618/opal_03-2015 SP - 37 S1 - 37 PB - Institut für Deutsche Sprache CY - Mannheim ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Storjohann, Petra T1 - Corpus-driven vs. corpus-based approach to the study of relational patterns T2 - Proceedings of the Corpus Linguistics Conference 2005, Birmingham N2 - Contextual lexical relations, such as sense relations, have traditionally played an essential role in disambiguating word senses in lexicography, as they offer insights into the meaning and use of a word. 
However, the description of paradigmatic relations in particular is often restricted to a few types such as synonymy and antonymy. The limited description of various types of relations and the method of presenting these relations in existing German dictionaries are often problematic. Elexiko, the first German hypertext dictionary compiled exclusively on the basis of an electronic corpus, offers a new way of presenting sense relations, using a variety of approaches to extract the necessary data. In this paper, I will show how elexiko presents a differentiated system of paradigmatic relations including synonymy, various subtypes of incompatibility (such as antonymy, complementarity, converseness, reversiveness, etc.), and vertical structures (such as hyponymy and meronymy). Primary attention, however, will focus on the question of how data for a paradigmatic description is retrieved from the corpus. Whereas a corpus-driven approach is mainly used for various semantic information and a corpus-based method plays an important part in obtaining data for the grammatical description in elexiko, it will be argued that both the corpus-driven and the corpus-based approach can be complementary methods in gaining insights into sense relations. I will demonstrate which results can be obtained by each approach, and advantages and disadvantages of both procedures will be explored in more detail. As sense relations are context-dependent, it will also be demonstrated how a sense-bound presentation can be realised in an electronic reference work including a system of cross-referencing that illustrates lexical structures and the interrelatedness of words within the lexicon. Finally, I will show how accompanying examples from the corpus and additional lexicographic information help the user to understand contextual restrictions, so that s/he is able to use dictionary information more effectively. 
KW - Computerunterstützte Lexikographie KW - Online-Wörterbuch KW - Semasiologie KW - Wortfeld KW - Antonym KW - Synonym Y1 - 2005 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-50063 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-50063 UR - http://www.birmingham.ac.uk/research/activity/corpus/publications/conference-archives/2005-conf-e-journal.aspx SP - 20 S1 - 20 PB - University of Birmingham CY - Birmingham ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Westpfahl, Swantje A1 - Schmidt, Thomas ED - Calzolari, Nicoletta ED - Choukri, Khalid ED - Declerck, Thierry ED - Goggi, Sara ED - Grobelnik, Marko ED - Maegaard, Bente ED - Mariani, Joseph ED - Mazo, Helene ED - Moreno, Asuncion ED - Odijk, Jan ED - Piperidis, Stelios T1 - FOLK-Gold ― A gold standard for part-of-speech-tagging of spoken German T2 - Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia N2 - In this paper, we present a GOLD standard of part-of-speech tagged transcripts of spoken German. The GOLD standard data consists of four annotation layers – transcription (modified orthography), normalization (standard orthography), lemmatization and POS tags – all of which have undergone careful manual quality control. It comes with guidelines for the manual POS annotation of transcripts of German spoken data and an extended version of the STTS (Stuttgart Tübingen Tagset) which accounts for phenomena typically found in spontaneous spoken German. The GOLD standard was developed on the basis of the Research and Teaching Corpus of Spoken German, FOLK, and is, to our knowledge, the first such dataset based on a wide variety of spontaneous and authentic interaction types. It can be used as a basis for further development of language technology and corpus linguistic applications for German spoken language. 
KW - German spoken language KW - GOLD standard KW - Deutsch KW - Gesprochene Sprache KW - Korpus KW - Part-of-Speech-Tagging = POS Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-50786 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-50786 SN - 978-2-9517408-9-1 SB - 978-2-9517408-9-1 SP - 1493 EP - 1499 PB - European Language Resources Association (ELRA) CY - Paris ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Storjohann, Petra ED - Mahlberg, Michaela ED - González-Díaz, Victorina ED - Smith, Catherine T1 - Colligational patterns in a corpus and their lexicographic documentation T2 - Proceedings of the Corpus Linguistics Conference 2009, Liverpool N2 - This paper shows how corpora and related tools can be used to analyse and present significant colligational patterns lexicographically. In German, patterns such as das nötige Wissen vermitteln and sein Wissen unter Beweis stellen play a vital role when learning the language, as they exhibit relevant idiomatic usage and lexical and syntactic rules of combination. Each item has specific semantic and grammatical functions and particular preferences with respect to position and distribution. An analysis of adjectives, for example, identifies preferences in adverbial, attributive, or predicative functions. Traditionally, corpus analyses of syntagmatic constructions have not been conducted for lexicographic purposes. This paper shows how to utilise corpora to extract and examine typical syntagms and how the results of such an analysis are documented systematically in ELEXIKO, a large-scale corpus-based Internet reference work of German. It also demonstrates how this dictionary accounts for the lexical and grammatical interplay between units in a syntagm and how authentic corpus material and complementary prose-style usage notes are a useful guide to text production or reception. 
KW - Korpus KW - Computerunterstützte Lexikographie KW - Syntagma KW - Deutsch KW - eLexiko Y1 - 2010 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-49976 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-49976 UR - http://ucrel.lancs.ac.uk/publications/cl2009/ SP - 19 S1 - 19 PB - University of Liverpool CY - Liverpool ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Diewald, Nils A1 - Hanl, Michael A1 - Margaretha, Eliza A1 - Bingel, Joachim A1 - Kupietz, Marc A1 - Bański, Piotr A1 - Witt, Andreas ED - Calzolari, Nicoletta ED - Choukri, Khalid ED - Declerck, Thierry ED - Goggi, Sara ED - Grobelnik, Marko ED - Maegaard, Bente ED - Mariani, Joseph ED - Mazo, Helene ED - Moreno, Asuncion ED - Odijk, Jan ED - Piperidis, Stelios T1 - KorAP architecture – diving in the deep sea of corpus data T2 - Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia N2 - KorAP is a corpus search and analysis platform, developed at the Institute for the German Language (IDS). It supports very large corpora with multiple annotation layers, multiple query languages, and complex licensing scenarios. KorAP’s design aims to be scalable, flexible, and sustainable to serve the German Reference Corpus DEREKO for at least the next decade. To meet these requirements, we have adopted a highly modular microservice-based architecture. This paper outlines our approach: An architecture consisting of small components that are easy to extend, replace, and maintain. The components include a search backend, a user and corpus license management system, and a web-based user frontend. We also describe a general corpus query protocol used by all microservices for internal communications. KorAP is open source, licensed under BSD-2, and available on GitHub. 
KW - Korpusanalyseplattform (KorAP) KW - Institut für Deutsche Sprache KW - Textlinguistik KW - Korpus KW - microservices KW - large corpus data Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-50361 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-50361 SN - 978-2-9517408-9-1 SB - 978-2-9517408-9-1 SP - 3586 EP - 3591 PB - European Language Resources Association (ELRA) CY - Paris ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Bański, Piotr A1 - Frick, Elena A1 - Witt, Andreas ED - Calzolari, Nicoletta ED - Choukri, Khalid ED - Declerck, Thierry ED - Goggi, Sara ED - Grobelnik, Marko ED - Maegaard, Bente ED - Mariani, Joseph ED - Mazo, Helene ED - Moreno, Asuncion ED - Odijk, Jan ED - Piperidis, Stelios T1 - Corpus Query Lingua Franca (CQLF) T2 - Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia N2 - The present paper describes Corpus Query Lingua Franca (ISO CQLF), a specification designed at ISO Technical Committee 37 Subcommittee 4 “Language resource management” for the purpose of facilitating the comparison of properties of corpus query languages. We overview the motivation for this endeavour and present its aims and its general architecture. CQLF is intended as a multi-part specification; here, we concentrate on the basic metamodel that provides a frame that the other parts fit in. 
KW - Korpus KW - Abfragesprache Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-50405 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-50405 SN - 978-2-9517408-9-1 SB - 978-2-9517408-9-1 SP - 2804 EP - 2809 PB - European Language Resources Association (ELRA) CY - Paris ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Prévot, Laurent A1 - Gorisch, Jan A1 - Bertrand, Roxane ED - Calzolari, Nicoletta ED - Choukri, Khalid ED - Declerck, Thierry ED - Goggi, Sara ED - Grobelnik, Marko ED - Maegaard, Bente ED - Mariani, Joseph ED - Mazo, Helene ED - Moreno, Asuncion ED - Odijk, Jan ED - Piperidis, Stelios T1 - A CUP of CoFee: A Large Collection of Feedback Utterances Provided with Communicative Function Annotations T2 - Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia N2 - There have been several attempts to annotate communicative functions to utterances of verbal feedback in English previously. Here, we suggest an annotation scheme for verbal and non-verbal feedback utterances in French including the categories base, attitude, previous and visual. The data comprises conversations, maptasks and negotiations from which we extracted ca. 13,000 candidate feedback utterances and gestures. 12 students were recruited for the annotation campaign of ca. 9,500 instances. Each instance was annotated by between 2 and 7 raters. The evaluation of the annotation agreement resulted in an average best-pair kappa of 0.6. While the base category with the values acknowledgement, evaluation, answer, elicit and other achieves good agreement, this is not the case for the other main categories. The data sets, which also include automatic extractions of lexical, positional and acoustic features, are freely available and will further be used for machine learning classification experiments to analyse the form-function relationship of feedback. 
KW - Conversational Feedback KW - Communicative Functions KW - Annotator Agreement KW - Pragmatik KW - Gesprochene Sprache KW - Rückmeldung KW - Automatische Sprachanalyse KW - Annotation Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-50414 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-50414 SN - 978-2-9517408-9-1 SB - 978-2-9517408-9-1 SP - 3180 EP - 3185 PB - European Language Resources Association (ELRA) CY - Paris ER - TY - GEN U1 - Sonstiges A1 - Perkuhn, Rainer A1 - Belica, Cyril A1 - Kupietz, Marc A1 - Keibel, Holger A1 - Hennig, Sophie T1 - DeReWo: Korpusbasierte Wortformenliste. Technical Report IDS-KL-2009-02 KW - Korpus KW - Deutsch KW - Deutsches Referenzkorpus (DeReKo) KW - Institut für Deutsche Sprache Y1 - 2009 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-50313 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-50313 UR - http://www1.ids-mannheim.de/fileadmin/kl/derewo/derewo-v-100000t-2009-04-30-0.1.zip PB - Institut für Deutsche Sprache CY - Mannheim ER - TY - CHAP U1 - Buchbeitrag A1 - Hansen-Morath, Sandra A1 - Wolfer, Sascha ED - Konopka, Marek ED - Wöllstein, Angelika T1 - Standardisierte statistische Auswertungen von Korpusdaten im Projekt "Korpusgrammatik" (KoGra-R) T2 - Grammatische Variation. Empirische Zugänge und theoretische Modellierung N2 - Wir zeigen anhand dreier Beispielanalysen, wie das im IDS-Projekt „Korpusgrammatik“ entwickelte Auswertungstool KoGra-R in der quantitativlinguistischen Forschung zur Analyse von Frequenzdaten auf mehreren linguistischen Ebenen eingesetzt werden kann. Wir demonstrieren dies anhand regionaler Präferenzen bei der Selektion von Genitivallomorphen, der Variation von Relativpronomina sowie der Verwendung bestimmter anaphorischer Ausdrücke in Abhängigkeit davon, ob sich das Antezedens im gleichen Satz befindet oder nicht. 
Die in KoGra-R implementierten statistischen Tests sind für jede dieser Ebenen geeignet, um mindestens einen ersten statistisch abgesicherten Eindruck der Datenlage zu erlangen. T3 - Jahrbuch / Institut für Deutsche Sprache - 2016 KW - Korpus KW - Grammatik KW - Automatische Sprachanalyse KW - KoGra-R Y1 - 2017 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-59699 SN - 978-3-11-050115-5 SB - 978-3-11-050115-5 U6 - https://dx.doi.org/10.1515/9783110518214-021 DO - https://dx.doi.org/10.1515/9783110518214-021 SP - 345 EP - 356 PB - De Gruyter CY - Berlin [u.a.] ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Beißwenger, Michael A1 - Chanier, Thierry A1 - Chiari, Isabella A1 - Erjavec, Tomaž A1 - Fišer, Darja A1 - Herold, Axel A1 - Ljubešić, Nikola A1 - Lüngen, Harald A1 - Poudat, Céline A1 - Stemle, Egon W. A1 - Storrer, Angelika A1 - Wigham, Ciara ED - Borin, Lars T1 - Integrating corpora of computer-mediated communication into the language resources landscape: Initiatives and best practices from French, German, Italian and Slovenian projects T2 - Proceedings of the 5th CLARIN Annual Conference. Aix-en-Provence, France. 26–28 October, 2016 N2 - The paper presents best practices and results from projects in four countries dedicated to the creation of corpora of computer-mediated communication and social media interactions (CMC). Even though there are still many open issues related to building and annotating corpora of that type, there already exists a range of accessible solutions which have been tested in projects and which may serve as a starting point for a more precise discussion of how future standards for CMC corpora may (and should) be shaped like. 
KW - Computerunterstützte Kommunikation KW - Korpus KW - computer-mediated communication (CMC) KW - social media interaction Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-58053 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-58053 UR - https://www.clarin.eu/news/call-papers-clarin-annual-conference-2016 SP - 5 S1 - 5 PB - CLARIN CY - Utrecht ER - TY - JOUR U1 - Zeitschriftenartikel, wissenschaftlich - begutachtet (reviewed) A1 - Krome, Sabine T1 - Variantenschreibungen bei Fremdwörtern: Darstellung und Begründung. Empirische Schreibbeobachtungen auf der Grundlage korpusbasierter Lexikographie JF - Mitteilungen des Deutschen Germanistenverbandes KW - Fremdwort KW - Deutsch KW - Rechtschreibung KW - Korpus Y1 - 2011 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-58089 SN - 0418-9426 SS - 0418-9426 U6 - https://dx.doi.org/10.14220/mdge.2011.58.1.36 DO - https://dx.doi.org/10.14220/mdge.2011.58.1.36 VL - 58 IS - 1 SP - 36 EP - 50 PB - V&R unipress CY - Göttingen ER - TY - CHAP U1 - Buchbeitrag A1 - Kallmeyer, Werner ED - Kallmeyer, Werner ED - Zifonun, Gisela T1 - Möglichkeiten der maschinellen Unterstützung bei der Arbeit mit Interaktionskorpora T2 - Sprachkorpora – Datenmengen und Erkenntnisfortschritt N2 - In diesem Beitrag sollen anhand von Materialien aus Gesprächskorpora des IDS Schwierigkeiten und Möglichkeiten der maschinellen Recherche vorgeführt werden. Grundlage dafür sind Gesprächstranskripte, die in digitaler Form vorliegen und in einem System mit Rechercheprozeduren zugreifbar sind. Mit diesem Ziel wird auf Rechercheverfahren zurückgegriffen, die in den 1990er Jahren in einem Projekt SHRGF.S im IDS als Anwendung der COSMAS-Technologie auf Gesprächskorpora entwickelt wurden. 
Die hier gegebenen Recherchemöglichkeiten werden an einem Auswahlkorpus von Gesprächstranskripten mit einem Gesamtumfang von 87.629 laufenden Wörtern versuchsweise angewendet und in ihren Beschränkungen und ihrer Fruchtbarkeit für explorative Untersuchungen betrachtet. Damit soll ein Beitrag zur Klärung der Frage geleistet werden, welche Recherchemöglichkeiten aus einer gesprächsanalytischen Perspektive vorstellbar und erwünscht sind und insofern bei der weiteren korpustechnologischen Entwicklung berücksichtigt werden sollten. T3 - Jahrbuch / Institut für Deutsche Sprache - 2006 KW - Gesprochene Sprache KW - Korpus KW - Information Retrieval Y1 - 2007 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-56120 SN - 978-3-11-019273-5 SB - 978-3-11-019273-5 U6 - https://dx.doi.org/10.1515/9783110439083-012 DO - https://dx.doi.org/10.1515/9783110439083-012 SP - 203 EP - 234 PB - de Gruyter CY - Berlin, New York ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Fauth, Camille A1 - Bonneau, Anne A1 - Zimmerer, Frank A1 - Trouvain, Jürgen A1 - Andreeva, Bistra A1 - Colotte, Vincent A1 - Fohr, Dominique A1 - Jouvet, Denis A1 - Jügler, Jeanin A1 - Laprie, Yves A1 - Mella, Odile A1 - Möbius, Bernd ED - Calzolari, Nicoletta ED - Choukri, Khalid ED - Declerck, Thierry ED - Loftsson, Hrafn ED - Maegaard, Bente ED - Mariani, Joseph ED - Moreno, Asuncion ED - Odijk, Jan ED - Piperidis, Stelios T1 - Designing a Bilingual Speech Corpus for French and German Language Learners: a Two-Step Process T2 - Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). May 26-31, 2014. Harpa Concert Hall and Conference Center. Reykjavik, Iceland N2 - We present the design of a corpus of native and non-native speech for the language pair French-German, with a special emphasis on phonetic and prosodic aspects. To our knowledge there is no suitable corpus, in terms of size and coverage, currently available for the target language pair. 
To select the target L1-L2 interference phenomena we prepare a small preliminary corpus (corpus1), which is analyzed for coverage and cross-checked jointly by French and German experts. Based on this analysis, target phenomena on the phonetic and phonological level are selected on the basis of the expected degree of deviation from the native performance and the frequency of occurrence. 14 speakers performed both L2 (either French or German) and L1 material (either German or French). This allowed us to test recording duration, recording material, and the performance of our automatic aligner software. Then, we built corpus2 taking into account what we learned about corpus1. The aims are the same but we adapted speech material to avoid overly long recording sessions. 100 speakers will be recorded. The corpus (corpus1 and corpus2) will be prepared as a searchable database, available for the scientific community after completion of the project. KW - Deutsch KW - Französisch KW - Korpus KW - Gesprochene Sprache KW - Fremdsprachenlernen KW - speech corpus KW - phonetics KW - language learning KW - Phonetik KW - Prosodie Y1 - 2014 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-59154 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-59154 UR - http://www.lrec-conf.org/proceedings/lrec2014/index.html SN - 978-2-9517408-8-4 SB - 978-2-9517408-8-4 SP - 1477 EP - 1482 PB - European Language Resources Association CY - Paris ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Trouvain, Jürgen A1 - Laprie, Yves A1 - Möbius, Bernd A1 - Andreeva, Bistra A1 - Bonneau, Anne A1 - Colotte, Vincent A1 - Fauth, Camille A1 - Fohr, Dominique A1 - Jouvet, Denis A1 - Mella, Odile A1 - Jügler, Jeanin A1 - Zimmerer, Frank T1 - Designing a bilingual speech corpus for French and German language learners T2 - Proceedings of Corpora and Tools in Linguistics, Languages, and Speech. Strasbourg, France. 
3-5 Jul 2013 KW - speech corpus KW - French-German KW - phonetics KW - language learning KW - Deutsch KW - Französisch KW - Korpus KW - Fremdsprachenlernen KW - Phonetik Y1 - 2013 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-59164 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-59164 SP - 32 EP - 34 PB - Université de Strasbourg CY - Strasbourg ER - TY - CHAP U1 - Buchbeitrag A1 - Bodmer Mory, Franck A1 - Schmidt, Rudolf ED - Mehler, Alexander ED - Lobin, Henning T1 - Computertechnische Erschließung von Gesprächskorpora T2 - Automatische Textanalyse. Systeme und Methoden zur Annotation und Analyse natürlichsprachlicher Texte N2 - Um gesprochene Sprache leichter analysieren zu können, müssen zuvor die auf Audio- oder Videokassetten befindlichen Aufnahmen transkribiert werden. Dabei kommt der Darstellung von Synchronität des Gesprochenen z.B. in Partiturschreibweise und dem Annotieren von Situationen, Verhalten einzelner Diskursteilnehmer u.dgl. eine bedeutende Rolle zu. Die Vielfalt der transkribierten Details und Informationsebenen setzt ein differenziertes Kodierungsschema voraus. Des Weiteren besteht bei der Gesprächsanalyse der Wunsch, neben dem Auffinden bestimmter Stellen im Schriftmaterial (Transkript) auch deren akustisches Ereignis wiedergeben zu können, was die Synchronisation von Text und Aufnahme voraussetzt. Im Folgenden wird nach einer Einleitung, welche die Geschichte und Motive für die in diesem Papier beschriebenen Komponenten kurz darstellt, eine Zusammenfassung linguistischer Desiderate für die Erschließung von Gesprächskorpora präsentiert und im Anschluss daran ein Modell für Diskurstranskripte vorgestellt, das die technische Grundlage für die diskursanalytische Erschließung von Gesprächskorpora am Institut für Deutsche Sprache (IDS) durch den Computer bildet. 
Anschließend wird der technische Prozess der Korpuserstellung skizziert, gefolgt von der Beschreibung dreier dabei zum Einsatz kommenden Werkzeuge, des DIDA-Editors, des SPRAT-Alignment-Systems und des DMM-Konverters. Schließlich wird die Volltextdatenbank COSMAS II vorgestellt, mit der die Analyse in den resultierenden SGML-Diskurstranskripten durchgeführt wird. Im Mittelpunkt steht dabei die Fähigkeit von COSMAS II, mit Hilfe der aus der Diskursstruktur abgeleiteten Diskursmetrik eine breite Palette von Suchanfragen zu ermöglichen und sie mit Hilfe der grafischen Suchanfragekomponente als SGML-Suchanfragen zu formulieren. Abschließend wird kurz auf die geplante Weiterentwicklung eingegangen. KW - Textanalyse KW - Korpus KW - Transkription KW - Linguistische Informationswissenschaft Y1 - 2004 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-58825 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-58825 SN - 3-531-14181-3 SB - 3-531-14181-3 SP - 167 EP - 183 PB - VS Verlag CY - Wiesbaden ER - TY - CHAP U1 - Buchbeitrag A1 - Bodmer Mory, Franck T1 - Abfragekomponente von COSMAS-II T2 - LDV-Info 1996 T3 - LDV-Info - 8 KW - Korpus KW - Linguistische Informationswissenschaft KW - COSMAS-II (COSMAS 2) Y1 - 1996 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-58846 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-58846 SN - 3-922641-41-5 SB - 3-922641-41-5 IS - 8 SP - 112 EP - 122 PB - Institut für Deutsche Sprache CY - Mannheim ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Trouvain, Jürgen A1 - Bonneau, Anne A1 - Colotte, Vincent A1 - Fauth, Camille A1 - Fohr, Dominique A1 - Jouvet, Denis A1 - Jügler, Jeanin A1 - Laprie, Yves A1 - Mella, Odile A1 - Möbius, Bernd A1 - Zimmerer, Frank ED - Calzolari, Nicoletta ED - Choukri, Khalid ED - Declerck, Thierry ED - Goggi, Sara ED - Grobelnik, Marko ED - Maegaard, Bente ED - Mariani, Joseph ED - Mazo, Hélène ED - Moreno, Asunción ED - Odijk, Jan ED - Piperidis, 
Stelios T1 - The IFCASL Corpus of French and German Non-native and Native Read Speech T2 - Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). May 23-28, 2016. Portorož, Slovenia N2 - The IFCASL corpus is a French-German bilingual phonetic learner corpus designed, recorded and annotated in a project on individualized feedback in computer-assisted spoken language learning. The motivation for setting up this corpus was that there is no phonetically annotated and segmented corpus for this language pair of comparable size and coverage. In contrast to most learner corpora, the IFCASL corpus incorporates data for a language pair in both directions, i.e. in our case French learners of German, and German learners of French. In addition, the corpus is complemented by two sub-corpora of native speech by the same speakers. The corpus provides spoken data by about 100 speakers with comparable productions, annotated and segmented on the word and the phone level, with more than 50% manually corrected data. The paper reports on inter-annotator agreement and the optimization of the acoustic models for forced speech-text alignment in exercises for computer-assisted pronunciation training. Example studies based on the corpus data with a phonetic focus include topics such as the realization of /h/ and glottal stop, final devoicing of obstruents, vowel quantity and quality, pitch range, and tempo. 
KW - learner corpus KW - phonetics KW - French KW - German KW - non-native speech KW - Deutsch als Fremdsprache KW - Französisch KW - Korpus KW - Phonetik KW - Gesprochene Sprache KW - native speech KW - multilinguality KW - phonetic databases Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-59057 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-59057 UR - http://www.lrec-conf.org/proceedings/lrec2016/index.html SN - 978-2-9517408-9-1 SB - 978-2-9517408-9-1 SP - 1333 EP - 1338 PB - European Language Resources Association CY - Paris ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Graën, Johannes A1 - Batinić, Dolores A1 - Volk, Martin ED - Ruppenhofer, Josef ED - Faaß, Gertrud T1 - Cleaning the Europarl Corpus for Linguistic Applications T2 - Proceedings of the 12th Edition of the KONVENS Conference Vol. 1. Hildesheim, Germany. October 8 – 10, 2014 N2 - We discovered several recurring errors in the current version of the Europarl Corpus originating both from the web site of the European Parliament and the corpus compilation based thereon. The most frequent error was incompletely extracted metadata leaving non-textual fragments within the textual parts of the corpus files. This is, on average, the case for every second speaker change. We not only cleaned the Europarl Corpus by correcting several kinds of errors, but also aligned the speakers’ contributions of all available languages and compiled everything into a new XML-structured corpus. This facilitates a more sophisticated selection of data, e.g. querying the corpus for speeches by speakers of a particular political group or in particular language combinations. 
KW - corpus linguistics KW - Computerlinguistik KW - Korpus Y1 - 2014 U6 - http://nbn-resolving.de/urn/resolver.pl?http://nbn-resolving.de/urn:nbn:de:gbv:hil2-opus-2857 UN - http://nbn-resolving.de/urn/resolver.pl?http://nbn-resolving.de/urn:nbn:de:gbv:hil2-opus-2857 SN - 978-3-934105-46-1 SB - 978-3-934105-46-1 SP - 222 EP - 227 PB - Universitätsverlag Hildesheim CY - Hildesheim ER - TY - CHAP U1 - Buchbeitrag A1 - Batinić, Dolores A1 - Birzer, Sandra A1 - Zinsmeister, Heike ED - Dipper, Stefanie ED - Neubarth, Friedrich ED - Zinsmeister, Heike T1 - Creating an extensible, levelled study corpus of Russian T2 - Proceedings of the 13th Conference on Natural Language Processing (KONVENS) Bochum, Germany September 19–21, 2016 N2 - In this paper, we present first results of training a classifier for discriminating Russian texts into different levels of difficulty. For the classification we considered both surface-oriented features adopted from readability assessments and more linguistically informed, positional features to classify texts into two levels of difficulty. This text classification is the main focus of our Levelled Study Corpus of Russian (LeStCoR), in which we aim to build a corpus adapted for language learning purposes – selecting simpler texts for beginner second language learners and more complex texts for advanced learners. The most discriminative feature in our pilot study was a lexical feature that approximates accessibility of the vocabulary by the second language learner in terms of the proportion of familiar words in the texts. The best feature setting achieved an accuracy of 0.91 on a pilot corpus of 209 texts. 
T3 - Bochumer Linguistische Arbeitsberichte - 16 KW - Russisch KW - Korpus KW - Levelled Study Corpus of Russian (LeStCoR) Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-59235 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-59235 UR - https://www.linguistics.rub.de/bla/ SN - 2190-0949 SS - 2190-0949 SP - 38 EP - 43 PB - Ruhr-Universität Bochum CY - Bochum ER - TY - CHAP U1 - Buchbeitrag A1 - Münzberg, Franziska ED - Konopka, Marek ED - Kubczak, Jacqueline ED - Mair, Christian ED - Šticha, František ED - Waßner, Ulrich Hermann T1 - Korpusrecherche in der Dudenredaktion: Ein Werkstattbericht T2 - Grammatik und Korpora 2009. Dritte Internationale Konferenz. Mannheim, 22.-24.09.2009 T2 - Grammar & Corpora 2009. Third International Conference. Mannheim, 22.-24.09.2009 N2 - Thema des Beitrags ist der Einsatz des Dudenkorpus in der Zusammenarbeit von Grammatikautoren und Dudenredaktion. Das annotierte Korpus und die Recherchemöglichkeiten, die es bietet, werden anhand aktueller Beispiele aus der Werkstatt einer Dudenredakteurin beschrieben. Einen Schwerpunkt bildet neben einfachen Vergleichen zwischen zwei oder drei morphologischen Varianten die komplexere Frage, ob temporales wo (der Zeitpunkt, wo; jetzt, wo) in der Dudengrammatik weiterhin als standardsprachlich bezeichnet werden soll. Zugleich wird versucht, die Attraktivität alternativer Konstruktionen (der Zeitpunkt, zu dem; jetzt, da) für Schreibende und Lesende zu messen. Diese ‘Alternativen’ verhalten sich jedoch keineswegs wie die eingangs erwähnten morphologischen Varianten zueinander – zu unterschiedlich sind semantische und syntaktische Leistungen, zu unterschiedlich die Restriktionen, die für ihre Verwendung im Satz gelten, zu unterschiedlich sind schließlich die untersuchten Texte, aus denen die mittels Hochrechnung ausgewerteten über 30 000 Sätze stammen. Zur Diskussion steht, welche Konsequenzen in einer Grammatik für ein breites Publikum zu ziehen sind. 
Diese Frage wird für die ‘Wortgrammatik’ anders beantwortet als für die ‘Regelgrammatik’. T3 - Korpuslinguistik und interdisziplinäre Perspektiven auf Sprache - 1 KW - Grammatik KW - Korpus KW - Tempus KW - Sprachvariante Y1 - 2011 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-54663 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-54663 SN - 978-3-8233-6648-5 SB - 978-3-8233-6648-5 SP - 181 EP - 197 PB - Narr CY - Tübingen ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Fankhauser, Peter A1 - Knappen, Jörg A1 - Teich, Elke ED - Eder, Maciej ED - Rybicki, Jan T1 - Topical Diversification Over Time In The Royal Society Corpus T2 - Digital Humanities 2016. Conference Abstracts. Jagiellonian University and Pedagogical University, Kraków 11–16 July 2016 KW - topic models KW - historical corpora KW - history of science KW - information theory KW - Korpus KW - Textlinguistik KW - Sprachgeschichte KW - Informationstheorie Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-54745 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-54745 UR - http://dh2016.adho.org/abstracts/322 SN - 978-83-942760-3-4 SB - 978-83-942760-3-4 SP - 10 S1 - 10 PB - Jagiellonian University; Pedagogical University CY - Kraków ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Rehbein, Ines A1 - van Genabith, Josef ED - De Smedt, Koenraad ED - Hajič, Jan ED - Kübler, Sandra T1 - Why is it so difficult to compare treebanks? TIGER and TüBa-D/Z revisited T2 - The Sixth International Workshop on Treebanks and Linguistic Theories (TLT ‘07). Bergen, Norway. December 7–8, 2007 N2 - This paper is a contribution to the ongoing discussion on treebank annotation schemes and their impact on PCFG parsing results. We provide a thorough comparison of two German treebanks: the TIGER treebank and the TüBa-D/Z. 
We use simple statistics on sentence length and vocabulary size, and more refined methods such as perplexity and its correlation with PCFG parsing results, as well as a Principal Components Analysis. Finally we present a qualitative evaluation of a set of 100 sentences from the TüBa-D/Z, manually annotated in the TIGER as well as in the TüBa-D/Z annotation scheme, and show that even the existence of a parallel subcorpus does not support a straightforward and easy comparison of both annotation schemes. T3 - NEALT Proceedings Series - 1 KW - Korpus KW - Syntaktische Analyse KW - Annotation KW - treebanks Y1 - 2007 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-57822 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-57822 UR - http://doras.dcu.ie/15264/ SN - 1736-6305 SS - 1736-6305 SP - 115 EP - 126 PB - Northern European Association for Language Technology CY - Tartu ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Rehbein, Ines A1 - van Genabith, Josef ED - Butt, Miriam ED - King, Holoway T1 - Automatic acquisition of LFG resources for German - as good as it gets T2 - Proceedings of the 14th International Lexical Functional Grammar Conference (LFG 2009). Cambridge, United Kingdom. July 13 - 16, 2009 N2 - We present data-driven methods for the acquisition of LFG resources from two German treebanks. We discuss problems specific to semi-free word order languages as well as problems arising from the data structures determined by the design of the different treebanks. We compare two ways of encoding semi-free word order, as done in the two German treebanks, and argue that the design of the TiGer treebank is more adequate for the acquisition of LFG resources. Furthermore, we describe an architecture for LFG grammar acquisition for German, based on the two German treebanks, and compare our results with a hand-crafted German LFG grammar. 
KW - Lexikalisch funktionale Grammatik KW - Korpus KW - Deutsch KW - Lexical functional grammar KW - German KW - Machine translating Y1 - 2009 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-57504 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-57504 SP - 21 S1 - 21 PB - CSLI Publications CY - Stanford ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Kübler, Sandra A1 - Maier, Wolfgang A1 - Rehbein, Ines A1 - Versley, Yannick ED - Calzolari, Nicoletta ED - Choukri, Khalid ED - Maegaard, Bente ED - Mariani, Joseph ED - Odijk, Jan ED - Piperidis, Stelios ED - Tapias, Daniel T1 - How to Compare Treebanks T2 - Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC-2008), May, 28-30, 2008. Marrakech, Morocco N2 - Recent years have seen an increasing interest in developing standards for linguistic annotation, with a focus on the interoperability of the resources. This effort, however, requires a profound knowledge of the advantages and disadvantages of linguistic annotation schemes in order to avoid importing the flaws and weaknesses of existing encoding schemes into the new standards. This paper addresses the question how to compare syntactically annotated corpora and gain insights into the usefulness of specific design decisions. We present an exhaustive evaluation of two German treebanks with crucially different encoding schemes. We evaluate three different parsers trained on the two treebanks and compare results using EVALB, the Leaf-Ancestor metric, and a dependency-based evaluation. Furthermore, we present TePaCoC, a new testsuite for the evaluation of parsers on complex German grammatical constructions. 
The testsuite provides a well thought-out error classification, which enables us to compare parser output for parsers trained on treebanks with different encoding schemes and provides interesting insights into the impact of treebank annotation schemes on specific constructions like PP attachment or non-constituent coordination. KW - Korpus KW - Syntaktische Analyse KW - Evaluation methodologies KW - Parsing Systems KW - Syntax Y1 - 2008 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-57520 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-57520 UR - http://www.lrec-conf.org/proceedings/lrec2008/ SN - 2-9517408-4-0 SB - 2-9517408-4-0 SP - 2322 EP - 2329 PB - European Language Resources Association CY - Paris ER - TY - THES U1 - Dissertation / Habilitation A1 - Rehbein, Ines T1 - Treebank-Based Grammar Acquisition for German N2 - Manual development of deep linguistic resources is time-consuming and costly and therefore often described as a bottleneck for traditional rule-based NLP. In my PhD thesis I present a treebank-based method for the automatic acquisition of LFG resources for German. The method automatically creates deep and rich linguistic presentations from labelled data (treebanks) and can be applied to large data sets. My research is based on and substantially extends previous work on automatically acquiring wide-coverage, deep, constraint-based grammatical resources from the English Penn-II treebank (Cahill et al.,2002; Burke et al., 2004; Cahill, 2004). Best results for English show a dependency f-score of 82.73% (Cahill et al., 2008) against the PARC 700 dependency bank, outperforming the best hand-crafted grammar of Kaplan et al. (2004). Preliminary work has been carried out to test the approach on languages other than English, providing proof of concept for the applicability of the method (Cahill et al., 2003; Cahill, 2004; Cahill et al., 2005). 
While first results have been promising, a number of important research questions have been raised. The original approach presented first in Cahill et al. (2002) is strongly tailored to English and the data structures provided by the Penn-II treebank (Marcus et al., 1993). English is configurational and rather poor in inflectional forms. German, by contrast, features semi-free word order and a much richer morphology. Furthermore, treebanks for German differ considerably from the Penn-II treebank as regards data structures and encoding schemes underlying the grammar acquisition task. In my thesis I examine the impact of language-specific properties of German as well as linguistically motivated treebank design decisions on PCFG parsing and LFG grammar acquisition. I present experiments investigating the influence of treebank design on PCFG parsing and show which type of representations are useful for the PCFG and LFG grammar acquisition tasks. Furthermore, I present a novel approach to cross-treebank comparison, measuring the effect of controlled error insertion on treebank trees and parser output from different treebanks. I complement the cross-treebank comparison by providing a human evaluation using TePaCoC, a new testsuite for testing parser performance on complex grammatical constructions. Manual evaluation on TePaCoC data provides new insights on the impact of flat vs. hierarchical annotation schemes on data-driven parsing. I present treebank-based LFG acquisition methodologies for two German treebanks. An extensive evaluation along different dimensions complements the investigation and provides valuable insights for the future development of treebanks. 
KW - German KW - treebanks KW - lexical-functional grammar KW - LFG KW - parsing KW - PCFG KW - grammar acquisition KW - Korpus KW - Syntaktische Analyse Y2 - 2009 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:hebis:30:3-330238 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:hebis:30:3-330238 N1 - This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License: http://doras.dcu.ie/licenses/ccancnd3_0/ SP - 249 S1 - 249 PB - Dublin City University CY - Dublin ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Rehbein, Ines A1 - van Genabith, Josef ED - Nivre, Joakim ED - Kaalep, Heiki-Jaan ED - Muischnek, Kadri ED - Koit, Mare T1 - Evaluating Evaluation Measures T2 - Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA-2007). University of Tartu, Tartu. May 24-26, 2007 N2 - This paper presents a thorough examination of the validity of three evaluation measures on parser output. We assess parser performance of an unlexicalised probabilistic parser trained on two German treebanks with different annotation schemes and evaluate parsing results using the PARSEVAL metric, the Leaf-Ancestor metric and a dependency-based evaluation. We reject the claim that the TüBa-D/Z annotation scheme is more adequate than the TIGER scheme for PCFG parsing and show that PARSEVAL should not be used to compare parser performance for parsers trained on treebanks with different annotation schemes. An analysis of specific error types indicates that the dependency-based evaluation is most appropriate to reflect parse quality. 
KW - Korpus KW - Syntaktische Analyse KW - Deutsch Y1 - 2007 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-57543 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-57543 SN - 978-9985-4-0513-0 SB - 978-9985-4-0513-0 SP - 372 EP - 379 PB - University of Tartu CY - Tartu ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Rehbein, Ines A1 - van Genabith, Josef T1 - Treebank Annotation Schemes and Parser Evaluation for German T2 - Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Prague, Czech Republic. June 28-30, 2007 N2 - Recent studies focussed on the question whether less-configurational languages like German are harder to parse than English, or whether the lower parsing scores are an artefact of treebank encoding schemes and data structures, as claimed by Kübler et al. (2006). This claim is based on the assumption that PARSEVAL metrics fully reflect parse quality across treebank encoding schemes. In this paper we present new experiments to test this claim. We use the PARSEVAL metric, the Leaf-Ancestor metric as well as a dependency-based evaluation, and present novel approaches measuring the effect of controlled error insertion on treebank trees and parser output. We also provide extensive post-parsing cross-treebank conversion. The results of the experiments show that, contrary to Kübler et al. (2006), the question whether or not German is harder to parse than English remains undecided. 
KW - Korpus KW - Syntaktische Analyse KW - parser evaluation KW - Annotation Y1 - 2007 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-57551 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-57551 UR - http://www.aclweb.org/anthology/D/D07/D07-1066 SP - 630 EP - 639 PB - Association for Computational Linguistics CY - Stroudsburg, PA ER - TY - CHAP U1 - Buchbeitrag A1 - Trawiński, Beata ED - Ziková, Markéta ED - Dočekal, Mojmír T1 - AND-Type versus WITH-Type Conjunctions: Towards a Corpus-Based Study T2 - Slavic Languages in Formal Grammar. Proceedings of FDSL 8.5, Brno 2010 T3 - Linguistik international - 26 KW - Korpus KW - Polnisch KW - Slawistik KW - Grammatik Y1 - 2012 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-58209 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-58209 SN - 978-3-631-63609-1 SB - 978-3-631-63609-1 SP - 221 EP - 232 PB - Lang CY - Frankfurt am Main/Berlin/Bern/Bruxelles/New York/Oxford/Wien ER - TY - CHAP U1 - Buchbeitrag A1 - Herberg, Dieter ED - Teubert, Wolfgang T1 - Neues im Wortgebrauch der Wendezeit. Zur Arbeit mit dem IDS-Wendekorpus T2 - Neologie und Korpus N2 - This contribution refers to a lexicological project at the Institut für deutsche Sprache (Mannheim) and aims to give an insight into the prerequisites and goals of this undertaking as well as into the working methods of the project staff. Particular attention is paid to aspects of corpus and computer use in the individual stages of the work.
T3 - Studien zur deutschen Sprache - 11 KW - Wiedervereinigung KW - Geschichte 1989-1990 KW - Wortschatz KW - Korpus KW - Forschungsprojekt Y1 - 1998 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-58246 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-58246 SN - 3-8233-5141-9 SB - 3-8233-5141-9 SP - 43 EP - 61 PB - Narr CY - Tübingen ER - TY - JOUR U1 - Zeitschriftenartikel, wissenschaftlich - begutachtet (reviewed) A1 - Kallmeyer, Werner T1 - Wo bleibt der Kontext? Zur computerunterstützten Arbeit mit ethnographischen Korpora JF - Zeitschrift für Literaturwissenschaft und Linguistik KW - Gesprochene Sprache KW - Korpus KW - Ethnolinguistik Y1 - 1993 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-56584 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-56584 SN - 0049-8653 SS - 0049-8653 VL - 23 SP - 88 EP - 103 PB - Vandenhoeck & Ruprecht CY - Göttingen ER - TY - CHAP U1 - Buchbeitrag A1 - Krome, Sabine ED - Kratochvílová, Iva ED - Wolf, Norbert Richard T1 - Digitale Datenflut: Chancen und Tücken eines Textkorpus zur deutschen Gegenwartssprache. Anforderungsprofil, Methoden und Instrumentarien zur Beobachtung des aktuellen Sprach- und Schreibgebrauchs T2 - Grundlagen einer sprachwissenschaftlichen Quellenkunde T3 - Studien zur deutschen Sprache - 66 KW - Korpus KW - Wörterbuch KW - Wahrig, Gerhard Y1 - 2013 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-57426 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-57426 SN - 978-3-8233-6836-6 SB - 978-3-8233-6836-6 SP - 49 EP - 66 PB - Narr CY - Tübingen ER - TY - CHAP U1 - Buchbeitrag A1 - Krome, Sabine ED - Kratochvílová, Iva ED - Wolf, Norbert Richard T1 - Die deutsche Gegenwartssprache im Fokus korpusbasierter Lexikographie. Korpora als Grundlage moderner allgemeinsprachlicher Wörterbücher am Beispiel des WAHRIG Textkorpus digital T2 - Kompendium Korpuslinguistik. 
Eine Bestandsaufnahme aus deutsch-tschechischer Perspektive T3 - Germanistische Bibliothek - 38 KW - Lexikographie KW - Korpus KW - Wörterbuch KW - Wahrig, Gerhard Y1 - 2010 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-57435 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-57435 SN - 978-3-8253-5793-1 SB - 978-3-8253-5793-1 SP - 117 EP - 134 PB - Winter CY - Heidelberg ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Geumann, Anja T1 - Towards a new level of annotation detail of multilingual speech corpora T2 - Proceedings of the 8th International Conference of Spoken Language Processing, Interspeech, Jeju, South Korea, 2004 N2 - The aim of this paper is to highlight the actual need for corpora that have been annotated based on acoustic information. The acoustic information should be coded in features or properties and is needed to inform further processing systems, i.e. to present a basis for a speech recognition system using linguistic information. Feature annotation of existing corpora in combination with segmental annotation can provide a powerful training material for speech recognition systems, but will as well challenge the further processing of features to segments and syllables. We present here the theoretical preliminaries for our multilingual feature extraction system, that we are currently working on. KW - Automatische Spracherkennung KW - Phonetik KW - Annotation KW - Korpus KW - Gesprochene Sprache Y1 - 2004 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-57020 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-57020 UR - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.385.5197&rep=rep1&type=pdf SP - 1096 EP - 1099 ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Fischer, Peter M. A1 - Diewald, Nils A1 - Kupietz, Marc A1 - Witt, Andreas T1 - Aufbau einer Korpusinfrastruktur für die Beobachtung des Schreibgebrauchs T2 - DHd 2016. Modellierung - Vernetzung - Visualisierung. 
Die Digital Humanities als fächerübergreifendes Forschungsparadigma. Konferenzabstracts. Universität Leipzig 7. bis 12. März 2016 KW - Korpus KW - Computerlinguistik KW - Rechtschreibung Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-56811 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-56811 UR - http://dhd2016.de/boa.pdf SN - 978-3-941379-05-3 SB - 978-3-941379-05-3 SP - 310 EP - 312 PB - Nisaba CY - Duisburg ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Schmidt, Thomas T1 - Datenbank für Gesprochenes Deutsch (DGD) T2 - DHd 2016. Modellierung - Vernetzung - Visualisierung. Die Digital Humanities als fächerübergreifendes Forschungsparadigma. Konferenzabstracts. Universität Leipzig 7. bis 12. März 2016 KW - Korpus KW - Datenbank KW - Gesprochene Sprache Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-56837 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-56837 UR - http://dhd2016.de/boa.pdf SN - 978-3-941379-05-3 SB - 978-3-941379-05-3 SP - 364 EP - 365 PB - Nisaba CY - Duisburg ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Fankhauser, Peter T1 - Kuration und Exploration des Korpus "Diskurs in der Weimarer Republik" T2 - DHd 2016. Modellierung - Vernetzung - Visualisierung. Die Digital Humanities als fächerübergreifendes Forschungsparadigma. Konferenzabstracts. Universität Leipzig 7. bis 12. 
März 2016 KW - Langzeitarchivierung KW - Korpus Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-56870 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-56870 UR - http://dhd2016.de/boa.pdf SN - 978-3-941379-05-3 SB - 978-3-941379-05-3 SP - 306 EP - 308 PB - Nisaba CY - Duisburg ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Trawiński, Beata ED - Arsenijevic, Boban ED - Baldwin, Timothy ED - Trawiński, Beata T1 - A Quantitative Approach to Preposition-Pronoun Contraction in Polish T2 - Proceedings of the Third ACL-SIGSEM Workshop on Prepositions, 3 April, 2006, Trento, Italy N2 - This paper presents the current results of an ongoing research project on corpus distribution of prepositions and pronouns within Polish preposition-pronoun contractions. The goal of the project is to provide a quantitative description of Polish preposition-pronoun contractions taking into consideration morphosyntactic properties of their components. It is expected that the results will provide a basis for a revision of the traditionally assumed inflectional paradigms of Polish pronouns and, thus, for a possible remodeling of these paradigms. The results of corpus-based investigations of the distribution of prepositions within preposition-pronoun contractions can be used for grammar-theoretical and lexicographic purposes. KW - Polnisch KW - Präposition KW - Pronomen KW - Korpus KW - Sprachstatistik Y1 - 2006 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-52889 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-52889 UR - http://dl.acm.org/citation.cfm?id=1621431&picked=prox SP - 17 EP - 22 S1 - 17 PB - Association for Computational Linguistics CY - Stroudsburg ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Trawiński, Beata T1 - Using Corpus Statistics in the Modeling of Linguistic Paradigms T2 - International Conference on Linguistic Evidence. 
Empirical, Theoretical and Computational Perspectives N2 - This paper presents how corpus statistics can be used to verify complex inflectional paradigms. This will be demonstrated using a set of traditionally assumed inflectional paradigms of third person personal pronouns in Polish. KW - Korpus KW - Sprachstatistik KW - Polnisch KW - Pronomen Y1 - 2006 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-52909 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-52909 SP - 195 EP - 197 S1 - 3 PB - University of Tübingen CY - Tübingen ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Schäfer, Roland A1 - Bildhauer, Felix T1 - Automatic Classification by Topic Domain for Meta Data Generation, Web Corpus Evaluation, and Corpus Comparison T2 - Proceedings of the 10th Web as Corpus Workshop (WAC-X) and the EmpiriST Shared Task, Berlin,Germany, August 7-12, 2016 N2 - In this paper, we describe preliminary results from an ongoing experiment wherein we classify two large unstructured text corpora—a web corpus and a newspaper corpus—by topic domain (or subject area). Our primary goal is to develop a method that allows for the reliable annotation of large crawled web corpora with meta data required by many corpus linguists. We are especially interested in designing an annotation scheme whose categories are both intuitively interpretable by linguists and firmly rooted in the distribution of lexical material in the documents. Since we use data from a web corpus and a more traditional corpus, we also contribute to the important field of corpus comparison and corpus evaluation. Technically, we use (unsupervised) topic modeling to automatically induce topic distributions over gold standard corpora that were manually annotated for 13 coarse-grained topic domains. In a second step, we apply supervised machine learning to learn the manually annotated topic domains using the previously induced topics as features. 
We achieve around 70% accuracy in 10-fold cross validations. An analysis of the errors clearly indicates, however, that a revised classification scheme and larger gold standard corpora will likely lead to a substantial increase in accuracy. KW - Textklassifikation KW - Topikmodellierung KW - Korpusvergleich KW - Korpus KW - Textlinguistik KW - Annotation Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-52979 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-52979 SP - 1 EP - 6 PB - Association for Computational Linguistics CY - Berlin ER - TY - BOOK U1 - Buch ED - Bański, Piotr ED - Kupietz, Marc ED - Lüngen, Harald ED - Witt, Andreas ED - Barbaresi, Adrien ED - Biber, Hanno ED - Breiteneder, Evelyn ED - Clematide, Simon T1 - 4th Workshop on Challenges in the Management of Large Corpora. (May 28th 2016, Portorož; part of the LREC-2016 workshop structure) / LREC 2016, CMLC-4. KW - Korpus Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-55493 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-55493 UR - http://www.lrec-conf.org/proceedings/lrec2016/workshops/LREC2016Workshop-CMLC_Proceedings.pdf SP - 39 S1 - 39 CY - Portorož, Slovenia ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Steyer, Kathrin A1 - Hein, Katrin ED - Margalitadze, Tinatin ED - Meladze, George T1 - Nach Belieben kombinieren? Korpusbasierte Beschreibung präpositionaler Mehrworteinheiten im Sprachvergleich T2 - Proceedings of the XVII EURALEX International Congress: Lexicography and Linguistic Diversity, 6-10 September, 2016, Tbilisi, Georgia N2 - Preposition-noun combinations with a recurrent zero article in adverbial use (e.g. nach Belieben, auf Knopfdruck, ohne Ende or bei Nacht) are a type that multiword research has so far rather neglected.
They are the object of the ongoing research project „Präpositionale Wortverbindungen kontrastiv“ (participating institutions: IDS Mannheim, University of Santiago de Compostela, University of Trnava), into which our talk offers an insight. We outline how such word combinations, as well as more abstract prepositional word-combination patterns of the type [in + SUBX-Zeit(en)] (e.g. in Echtzeit, in Krisenzeiten), can be investigated on a corpus basis and described lexicographically from a contrastive perspective (German, Spanish, Slovak). Of particular interest, especially for foreign-language learners, are the semantic-functional restrictions to which such entities are subject. Based on the theoretical and empirical assumptions of the model „Usuelle Wortverbindungen“ developed at the IDS (cf. Steyer 2013), the project first identifies collocation and cotext patterns for the binary German multiword units inductively in very large corpora; these are then compared systematically with Spanish and Slovak. Methodologically, we draw in all three languages on, among other things, co-occurrence profiles of the word combinations and slot analyses of defined search patterns. One aim of the project is the development of a novel prototype for a multilingual presentation of the subject of investigation (especially for foreign-language learners).
KW - korpusbasierte Phraseologie KW - Kollokationsforschung KW - Corpus Pattern Analysis KW - Gebrauchsbasiertheit KW - Schnittstelle Konstruktionsgrammatik – Phraseologie KW - Äquivalenztheorien KW - Deutsch KW - Phraseologismus KW - Sprachgebrauch Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-55557 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-55557 UR - http://euralex2016.tsu.ge/publication.html SN - 978-9941-13-542-2 SB - 978-9941-13-542-2 SP - 402 EP - 408 PB - Ivane Javakhishvili Tbilisi State University CY - Tbilisi ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Rehbein, Ines A1 - Schalowski, Sören A1 - Wiese, Heike ED - Calzolari, Nicoletta ED - Choukri, Khalid ED - Declerck, Thierry ED - Loftsson, Hrafn ED - Maegaard, Bente ED - Mariani, Joseph ED - Moreno, Asuncion ED - Odijk, Jan ED - Piperidis, Stelios T1 - The KiezDeutsch Korpus (KiDKo) Release 1.0 T2 - Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). May 26-31, 2014. Harpa Concert Hall and Conference Center. Reykjavik, Iceland N2 - This paper presents the first release of the KiezDeutsch Korpus (KiDKo), a new language resource with multiparty spoken dialogues of Kiezdeutsch, a newly emerging language variety spoken by adolescents from multi-ethnic urban areas in Germany. The first release of the corpus includes the transcriptions of the data as well as a normalisation layer and part-of-speech annotations. In the paper, we describe the main features of the new resource and then focus on automatic POS tagging of informal spoken language. Our tagger achieves an accuracy of nearly 97% on KiDKo. While we did not succeed in further improving the tagger using ensemble tagging, we present our approach to using the tagger ensembles for identifying error patterns in the automatically tagged data. 
KW - spoken language corpora KW - urban youth language KW - Kiezdeutsch KW - Gesprochene Sprache KW - Stadtmundart KW - Jugendsprache KW - Multikulturelle Gesellschaft KW - Korpus Y1 - 2014 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-55999 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-55999 UR - www.lrec-conf.org/proceedings/lrec2014/index.html SN - 978-2-9517408-8-4 SB - 978-2-9517408-8-4 SP - 3927 EP - 3934 PB - European Language Resources Association CY - Paris ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Rehbein, Ines A1 - Schalowski, Sören ED - Jancsary, Jeremy T1 - Extending the STTS for the Annotation of Spoken Language T2 - Proceedings of the 11th Edition of the Conference on Natural Language Processing (KONVENS). Vienna, September 19-21, 2012. N2 - This paper presents an extension to the Stuttgart-Tübingen TagSet, the standard part-of-speech tag set for German, for the annotation of spoken language. The additional tags deal with hesitations, backchannel signals, interruptions, onomatopoeia and uninterpretable material. They allow one to capture phenomena specific to spoken language while, at the same time, preserving inter-operability with already existing corpora of written language. 
T3 - Schriftenreihe der Österreichischen Gesellschaft für Artificial Intelligence (ÖGAI) - 5 KW - Korpus KW - Gesprochene Sprache KW - Annotation KW - Automatische Sprachanalyse KW - Interoperabilität Y1 - 2012 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-56026 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-56026 UR - http://www.oegai.at/konvens2012/proceedings.shtml SN - 3-85027-005-X SB - 3-85027-005-X SP - 238 EP - 242 PB - Eigenverlag ÖGAI CY - Wien ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Beißwenger, Michael A1 - Ehrhardt, Eric A1 - Herold, Axel A1 - Lüngen, Harald A1 - Storrer, Angelika ED - Resch, Claudia ED - Hannesschläger, Vanessa ED - Wissik, Tanja T1 - Converting and Representing Social Media Corpora into TEI: Schema and best practices from CLARIN-D T2 - TEI Conference and Members' Meeting 2016. Book of Abstracts N2 - The paper presents results from a curation project within CLARIN-D, in which an existing 1M word corpus of German chat communication has been integrated into the DEREKO and DWDS corpus infrastructures of the CLARIN-D centres at the Institute for the German Language (IDS, Mannheim) and at the Berlin-Brandenburg Academy of Sciences (BBAW, Berlin). The focus is on the solutions developed for converting and representing the corpus in a TEI format.
KW - Deutsch KW - Chatten KW - Korpus Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-55736 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-55736 UR - http://tei2016.acdh.oeaw.ac.at/sites/default/files/TEIconf2016_BookOfAbstracts.pdf SN - 978-3-200-04689-4 SB - 978-3-200-04689-4 SP - 39 EP - 41 PB - Austrian Centre for Digital Humanities, Austrian Academy of Sciences CY - Wien ER - TY - CHAP U1 - Buchbeitrag A1 - Lüngen, Harald A1 - Beißwenger, Michael A1 - Ehrhardt, Eric A1 - Herold, Axel A1 - Storrer, Angelika ED - Dipper, Stefanie ED - Neubarth, Friedrich ED - Zinsmeister, Heike T1 - Integrating corpora of computer-mediated communication in CLARIN-D: Results from the curation project ChatCorpus2CLARIN T2 - Proceedings of the 13th Conference on Natural Language Processing (KONVENS) N2 - We introduce our pipeline to integrate CMC and SM corpora into the CLARIN-D corpus infrastructure. The pipeline was developed by transforming an existing CMC corpus, the Dortmund Chat Corpus, into a resource conforming to current technical and legal standards. We describe how the resource has been prepared and restructured in terms of TEI encoding, linguistic annotations, and anonymisation. The output is a CLARIN-conformant resource integrated in the CLARIN-D research infrastructure. T3 - Bochumer Linguistische Arbeitsberichte - 16 KW - Deutsch KW - Chatten KW - Korpus KW - Text Encoding Initiative (TEI) Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-55743 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-55743 UR - https://www.linguistics.ruhr-uni-bochum.de/bla/ SN - 2190-0949 SS - 2190-0949 SP - 156 EP - 164 PB - Sprachwissenschaftliches Institut, Ruhr-Universität Bochum CY - Bochum ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Hartmann, Jutta M. 
A1 - Sauter, Corinna A1 - Schole, Gesa A1 - Wagner, Wiltrud A1 - Gietz, Peter A1 - Winkler, Susanne T1 - TInCAP – ein interdisziplinäres Korpus zu Ambiguitätsphänomenen T2 - DHd 2016. Modellierung - Vernetzung - Visualisierung. Die Digital Humanities als fächerübergreifendes Forschungsparadigma. Konferenzabstracts. Universität Leipzig 7. bis 12. März 2016 KW - Ambiguität KW - Korpus Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-55764 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-55764 UR - http://dhd2016.de/ SN - 978-3-941379-05-3 SB - 978-3-941379-05-3 SP - 322 EP - 323 PB - Nisaba CY - Duisburg ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Beißwenger, Michael A1 - Herold, Axel A1 - Lüngen, Harald A1 - Storrer, Angelika T1 - Das Dortmunder Chat-Korpus in CLARIN-D: Modellierung und Mehrwerte T2 - DHd 2016. Modellierung - Vernetzung - Visualisierung. Die Digital Humanities als fächerübergreifendes Forschungsparadigma. Konferenzabstracts KW - Deutsch KW - Chatten KW - Korpus KW - CLARIN-D Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-55788 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-55788 UR - http://dhd2016.de/boa.pdf SN - 978-3-941379-05-3 SB - 978-3-941379-05-3 SP - 274 EP - 277 PB - Nisaba CY - Duisburg ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Beißwenger, Michael A1 - Ehrhardt, Eric A1 - Herold, Axel A1 - Lüngen, Harald A1 - Storrer, Angelika ED - Fišer, Darja ED - Beißwenger, Michael T1 - (Best) Practices for Annotating and Representing CMC and Social Media Corpora in CLARIN-D T2 - Proceedings of the 4th Conference on CMC and Social Media Corpora for the Humanities N2 - The paper reports the results of the curation project ChatCorpus2CLARIN.
The goal of the project was to develop a workflow and resources for the integration of an existing chat corpus into the CLARIN-D research infrastructure for language resources and tools in the Humanities and the Social Sciences (http://clarin-d.de). The paper presents an overview of the resources and practices developed in the project, describes the added value of the resource after its integration and discusses, as an outlook, to what extent these practices can be considered best practices which may be useful for the annotation and representation of other CMC and social media corpora. KW - CMC corpora KW - TEI encoding KW - tagging KW - corpus infrastructures KW - legal issues KW - Korpus KW - Chatten KW - Deutsch KW - Text Encoding Initiative (TEI) Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-55810 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-55810 UR - http://nl.ijs.si/janes/wp-content/uploads/2016/09/CMC-conference-proceedings-2016.pdf SN - 978-961-237-859-2 SB - 978-961-237-859-2 SP - 7 EP - 11 PB - Academic Publishing Division of the Faculty of Arts of the University of Ljubljana CY - Ljubljana ER - TY - CHAP U1 - Buchbeitrag A1 - Hein, Katrin A1 - Bubenhofer, Noah ED - Ziem, Alexander ED - Lasch, Alexander T1 - Korpuslinguistik konstruktionsgrammatisch. Diskursspezifische n-Gramme zwischen statistischer Signifikanz und semantisch-pragmatischem Mehrwert T2 - Konstruktionsgrammatik IV. Konstruktionen und Konventionen als kognitive Routinen N2 - In a corpus-pragmatic view of language use, so-called language-use patterns (Sprachgebrauchsmuster) that are typical of particular sections of language are computed in a data-driven way. Such patterns can be interpreted, for example, in terms of discourse analysis; a construction-grammar perspective on them, however, is still relatively unexplored.
Two examples show how language-use patterns can be computed by calculating typical n-grams (based on word forms and, in a more complex variant, on word forms combined with part-of-speech categories): the first example examines typical formulation patterns in letters to the editor, the second those from a political discourse (the Wulff affair). The contribution then aims to interpret these patterns, following the usage-based approach of construction grammar (KxG), as constructions that obey socio-pragmatic conditions of use. T3 - Stauffenburg Linguistik - 76 KW - Diskursanalyse KW - Konstruktionsgrammatik KW - Korpus KW - Sprachgebrauchsmuster Y1 - 2015 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-44525 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-44525 SN - 978-3-86057-121-7 SB - 978-3-86057-121-7 SP - 179 EP - 206 PB - Stauffenburg CY - Tübingen ER - TY - RPRT U1 - Arbeitspapier A1 - Perkuhn, Rainer A1 - Keibel, Holger ED - Minegishi, Makoto ED - Kawaguchi, Yuji T1 - A brief tutorial on using collocations for uncovering and contrasting meaning potentials of lexical items T2 - Working Papers in Corpus-based Linguistics and Language Education No. 3 N2 - This introductory tutorial describes a strictly corpus-driven approach for uncovering indications for aspects of use of lexical items. These aspects include ‘(lexical) meaning’ in a very broad sense and involve different dimensions, they are established in and emerge from respective discourses. Using data-driven mathematical-statistical methods with minimal (linguistic) premises, a word’s usage spectrum is summarized as a collocation profile. Self-organizing methods are applied to visualize the complex similarity structure spanned by these profiles. These visualizations point to the typical aspects of a word’s use, and to the common and distinctive aspects of any two words.
KW - Korpus KW - Kollokation KW - Methode Y1 - 2009 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47141 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47141 SP - 77 EP - 91 PB - Tokyo University of Foreign Studies CY - Tokyo ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Perkuhn, Rainer T1 - Systematic Exploration of Collocation Profiles T2 - Proceedings of the 4th Corpus Linguistics Conference (CL 2007) N2 - The central issue in corpus-driven linguistics is the detection and description of patterns in language usage. The features that constitute the notion of a pattern can be computed to a certain extent by statistical (collocation) methods, but a crucial part of the notion may vary depending on applications and users. Thus, typically, any computed collocation cluster will have to be interpreted hermeneutically. Often it might be captured by a generalized, more abstract pattern. We present a generic process model that supports the recognition, interpretation, and expression of the patterns inside and of the relations between clusters. By this, clusters can be merged virtually according to any notion of a 'pattern', and their relations can be exploited for different applications. KW - Korpus KW - Kollokation KW - Distribution Y1 - 2007 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47156 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47156 SP - 12 S1 - 12 PB - University of Birmingham CY - Birmingham ER - TY - BOOK U1 - Buch A1 - Bubenhofer, Noah A1 - Konopka, Marek A1 - Schneider, Roman T1 - Präliminarien einer Korpusgrammatik N2 - The corpus-linguistic approach of the project »Korpusgrammatik« opens up new perspectives on our linguistic reality in general and on grammatical regularities in particular.
This volume explains how the standard can be investigated with corpus-linguistic means, how the project corpora are structured and made accessible in a corpus database, how an automatic query system can be used to tackle the variability of language and even make it measurable, and finally where the limits of quantitative corpus analyses lie. Pilot studies indicate how the approach broadens our grammatical horizons and advances grammaticography. T3 - Korpuslinguistik und interdisziplinäre Perspektiven auf Sprache - 4 KW - Deutsch KW - Standardsprache KW - Grammatik Y1 - 2014 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-49345 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-49345 SN - 978-3-8233-6701-7 SB - 978-3-8233-6701-7 SP - 245 S1 - 245 PB - Narr CY - Tübingen ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Kupietz, Marc A1 - Schonefeld, Oliver A1 - Witt, Andreas ED - Arranz, Victoria ED - Eerten, Laura van T1 - The German Reference Corpus: New developments building on almost 50 years of experience T2 - Language Resources: From Storyboard to Sustainability and LR Lifecycle Management, Workshop held at the seventh conference on International Language Resources and Evaluation (LREC). Malta, May 2010 N2 - This paper describes the efforts in the field of sustainability of the Institut für Deutsche Sprache (IDS) in Mannheim with respect to DEREKO (Deutsches Referenzkorpus), the Archive of General Reference Corpora of Contemporary Written German. With focus on re-usability and sustainability, we discuss its history and our future plans. We describe legal challenges related to the creation of a large and sustainable resource; sketch out the pipeline used to convert raw texts to the final corpus format and outline migration plans to TEI P5.
Because the current version of the corpus management and query system is being pushed to its limits, we discuss the requirements for a new version which will be able to handle current and future DEREKO releases. Furthermore, we outline the institute’s plans in the field of digital preservation. KW - Korpus KW - Langzeitarchivierung KW - Institut für Deutsche Sprache KW - Deutsches Referenzkorpus (DeReKo) Y1 - 2010 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45002 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45002 UR - http://lrec-conf.org/proceedings/lrec2010/index.html SP - 39 EP - 43 PB - European Language Resources Association (ELRA) CY - Paris ER - TY - JOUR U1 - Zeitschriftenartikel, wissenschaftlich - begutachtet (reviewed) A1 - Rehm, Georg A1 - Schonefeld, Oliver A1 - Witt, Andreas A1 - Hinrichs, Erhard A1 - Reis, Marga T1 - Sustainability of annotated resources in linguistics: A web-platform for exploring, querying, and distributing linguistic corpora and other resources JF - Literary and Linguistic Computing N2 - We report on finished work in a project that is concerned with providing methods, tools, best practice guidelines, and solutions for sustainable linguistic resources. The article discusses several general aspects of sustainability and introduces an approach to normalizing corpus data and metadata records. Moreover, the architecture of the sustainability platform implemented by the authors is described. KW - Korpus KW - Sprachdaten KW - Langzeitarchivierung KW - Digital Humanities Y1 - 2009 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45029 SN - 1477-4615 SS - 1477-4615 U6 - https://dx.doi.org/10.1093/llc/fqp003 DO - https://dx.doi.org/10.1093/llc/fqp003 N1 -
This publication is with permission of the rights owner freely accessible due to an Alliance licence and a national licence (funded by the DFG, German Research Foundation) respectively. VL - 24 IS - 2 SP - 193 EP - 210 PB - Oxford University Press CY - Oxford ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Witt, Andreas T1 - Multiple hierarchies: new aspects of an old solution T2 - Proceedings of Extreme Markup Languages 2004 N2 - Overlap in markup occurs where some markup structures do not nest, such as where the structural division of the text into lists, sections, etc., differs from the syntactic division of the text into sentences and phrases. The Multiple Annotation solution to this problem (redundant encoding in multiple forms) has many advantages: it is based on XML, the modeling of alternative annotations is possible, each level can be viewed separately, and new levels can be added at any time. But it has the significant disadvantage of independence of the separate files. These multiply annotated files can be regarded as an interrelated unit, with the text serving as the implicit link. Two representations of the information contained in the multiple files (one in Prolog and one in XML) can be programmatically derived and used together for editing, for inference, or for unification of the multiply annotated documents. 
KW - Concurrent Markup/Overlap KW - Korpus KW - Auszeichnungssprache Y1 - 2004 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45373 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45373 UR - http://conferences.idealliance.org/extreme/dates.html#2004 SP - 19 S1 - 19 PB - Extreme Markup Languages Conference CY - Montreal ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Sasaki, Felix A1 - Witt, Andreas ED - Lino, Maria Teresa ED - Xavier, Maria Francisca ED - Ferreira, Fátima ED - Costa, Rute ED - Silva, Raquel ED - Pereira, Carla ED - Carvalho, Filipa ED - Lopes, Milene ED - Catarino, Mónica ED - Barros, Sérgio T1 - Co-reference in Japanese Task-oriented Dialogues: A Contribution to the Development of Language-specific and Language-general Annotation Schemes and Resources T2 - Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC-2004). Lissabon, Portugal N2 - This paper describes a corpus of Japanese task-oriented dialogues, i.e. its data, annotations, analysis methodology and preliminary results for the modeling of co-referential phenomena. Current corpus based approaches to co-reference concentrate on textual data from English or other European languages. Hence, the emerging language-general models of co-reference miss input from dialogue data of non-European languages. We aim to fill this gap and contribute to a model of co-reference on various language-specific and language-general levels. 
KW - Annotation KW - Data Architecture KW - Co-Reference Y1 - 2004 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45416 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45416 UR - http://www.lrec-conf.org/proceedings/lrec2004/ SN - 2-9517408-1-6 SB - 2-9517408-1-6 SP - 655 EP - 658 PB - European Language Resources Association (ELRA) CY - Paris ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Sasaki, Felix A1 - Witt, Andreas A1 - Gibbon, Dafydd A1 - Trippel, Thorsten ED - Lino, Maria Teresa ED - Xavier, Maria Francisca ED - Ferreira, Fátima ED - Costa, Rute ED - Silva, Raquel ED - Pereira, Carla ED - Carvalho, Filipa ED - Lopes, Milene ED - Catarino, Mónica ED - Barros, Sérgio T1 - Concept-based queries: Combining and Reusing Linguistic Corpus Formats and Query Languages T2 - Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC-2004). Lissabon, Portugal N2 - This paper proposes a methodology for querying linguistic data represented in different corpus formats. Examples of the need for queries over such heterogeneous resources are the corpus-based analysis of multimodal phenomena like the interaction of gestures and prosodic features, or syntax-related phenomena like information structure which exceed the expressive power of a tree-centered corpus format. Query languages (QLs) currently under development are strongly connected to corpus formats, like the NITE Object Model (NOM, Carletta et al., 2003) or the Meta-Annotation Infrastructure for ATLAS (MAIA, Laprun and Fiscus, 2002). The parallel development of linguistic query languages and corpus formats is due to the fact that general purpose query languages like XQuery (Boag et al., 2003) do not fulfill the changing needs of linguistically motivated queries, e.g. to give access to (non-)hierarchically organized, theory and language dependent annotations of multimodal signals and/or text. 
This leads to the problem that existing corpus formats and query languages are hard to reuse. They have to be redeveloped and reimplemented for unforeseen tasks, at considerable cost in time and money. This paper describes an approach for overcoming these problems and a sample application. KW - Query Languages KW - Data Formats KW - Ontology Y1 - 2004 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45434 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45434 UR - http://www.lrec-conf.org/proceedings/lrec2004/ SN - 2-9517408-1-6 SB - 2-9517408-1-6 SP - 259 EP - 262 PB - European Language Resources Association (ELRA) CY - Paris ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Sasaki, Felix A1 - Witt, Andreas A1 - Metzing, Dieter ED - Nivre, Joakim ED - Hinrichs, Erhard T1 - Declarations of Relations, Differences and Transformations between Theory-specific Treebanks: A New Methodology T2 - TLT 2003: Proceedings of the Second Workshop on Treebanks and Linguistic Theories, 14-15 November 2003, Växjö, Sweden N2 - This paper deals with the problem of how to interrelate theory-specific treebanks and how to transform one treebank format to another. Currently, two approaches to achieve these goals can be differentiated. The first creates a mapping algorithm between treebank formats. Categories of a source format are transformed into a target format via a given set of general or language-specific mapping rules. The second relates treebanks via a transformation to a general model of linguistic categories, for example based on the EAGLES recommendations for syntactic annotations of corpora, or relying on the HPSG framework. This paper proposes a new methodology as a solution for these desiderata. 
T3 - Series mathematical modelling in physics, engineering, and cognitive sciences - 9 KW - Korpus KW - Annotation KW - Methode KW - Treebank Y1 - 2003 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45440 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45440 SN - 978-9176363942 SB - 978-9176363942 SP - 141 EP - 152 PB - Växjö University Press CY - Växjö ER - TY - CHAP U1 - Buchbeitrag A1 - Sasaki, Felix A1 - Witt, Andreas ED - Lobin, Henning ED - Lemnitzer, Lothar T1 - Linguistische Korpora T2 - Texttechnologie. Perspektiven und Anwendungen T3 - Stauffenburg Handbücher - - KW - Korpus Y1 - 2004 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45454 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45454 SN - 3-86057-287-3 SB - 3-86057-287-3 SP - 195 EP - 216 PB - Stauffenburg CY - Tübingen ER - TY - CHAP U1 - Buchbeitrag A1 - Bański, Piotr A1 - Frick, Elena A1 - Hanl, Michael A1 - Kupietz, Marc A1 - Schnober, Carsten A1 - Witt, Andreas ED - Hardie, Andrew ED - Love, Robbie T1 - Robust corpus architecture: a new look at virtual collections and data access T2 - Corpus linguistics 2013. Abstract book KW - Korpus KW - Korpusanalyseplattform (KorAP) KW - Textkorpus Y1 - 2013 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-44855 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-44855 UR - http://ucrel.lancs.ac.uk/cl2013/ SP - 23 EP - 25 PB - UCREL CY - Lancaster ER - TY - CHAP U1 - Buchbeitrag A1 - Belica, Cyril A1 - Kupietz, Marc A1 - Witt, Andreas A1 - Lüngen, Harald ED - Konopka, Marek ED - Kubczak, Jacqueline ED - Mair, Christian ED - Šticha, František ED - Waßner, Ulrich Hermann T1 - The Morphosyntactic Annotation of DeReKo: Interpretation, Opportunities, and Pitfalls T2 - Grammatik und Korpora 2009. Dritte Internationale Konferenz. Mannheim, 22.-24.9.2009 T2 - Grammar & Corpora 2009. Third International Conference. 
Mannheim, 22.-24.9.2009 N2 - The paper discusses from various angles the morphosyntactic annotation of DeReKo, the Archive of General Reference Corpora of Contemporary Written German at the Institut für Deutsche Sprache (IDS), Mannheim. The paper is divided into two parts. The first part covers the practical and technical aspects of this endeavor. We present results from a recent evaluation of tools for the annotation of German text resources that have been applied to DeReKo. These tools include commercial products, especially Xerox' Finite State Tools and the Machinese products developed by the Finnish company Connexor Oy, as well as software for which academic licenses are available free of charge for academic institutions, e.g. Helmut Schmid's Tree Tagger. The second part focuses on the linguistic interpretability of the corpus annotations and more general methodological considerations concerning scientifically sound empirical linguistic research. The main challenge here is that unlike the texts themselves, the morphosyntactic annotations of DeReKo do not have the status of observed data; instead they constitute a theory and implementation-dependent interpretation. In addition, because of the enormous size of DeReKo, a systematic manual verification of the automatic annotations is not feasible. In consequence, the expected degree of inaccuracy is very high, particularly wherever linguistically challenging phenomena, such as lexical or grammatical variation, are concerned. Given these facts, a researcher using the annotations blindly will run the risk of not actually studying the language but rather the annotation tool or the theory behind it. The paper gives an overview of possible pitfalls and ways to circumvent them and discusses the opportunities offered by using annotations in corpus-based and corpus-driven grammatical research against the background of a scientifically sound methodology. 
T3 - Korpuslinguistik und interdisziplinäre Perspektiven auf Sprache - 1 KW - Korpus KW - Annotation KW - Schriftsprache KW - Deutsches Referenzkorpus (DeReKo) KW - Institut für Deutsche Sprache Y1 - 2009 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-44890 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-44890 SN - 978-3-8233-6648-5 SB - 978-3-8233-6648-5 SP - 451 EP - 469 PB - Narr CY - Tübingen ER - TY - JOUR U1 - Zeitschriftenartikel, wissenschaftlich - begutachtet (reviewed) A1 - Lehmberg, Timm A1 - Rehm, Georg A1 - Witt, Andreas A1 - Zimmermann, Felix T1 - Digital Text Collections, Linguistic Research Data, and Mashups: Notes on the Legal Situation JF - Library Trends N2 - Comprehensive data repositories are an essential part of practically all research carried out in the digital humanities nowadays. For example, library science, literary studies, and computational and corpus linguistics strongly depend on online archives that are highly sustainable and that contain not only digitized texts but also audio and video data as well as additional information such as metadata and arbitrary annotations. Current Web technologies, especially those that are related to what is commonly referred to as the Web 2.0, provide a number of novel functions such as multiuser editing or the inclusion of third-party content and applications that are also highly attractive for research applications in the areas mentioned above. Hand in hand with this development goes a high degree of legal uncertainty. The special nature of the data entails that, in quite a few cases, there are multiple holders of personal rights (mostly copyright) to different layers of data that often have different origins. This article discusses the legal problems of multiple authorships in private, commercial, and research environments. We also introduce significant differences between European and U.S. 
law with regard to the handling of this kind of data for scientific purposes. KW - Urheberrecht KW - Datenschutz KW - Digitale Sprachressourcen Y1 - 2008 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45095 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45095 SN - 0024-2594 SS - 0024-2594 VL - 57 IS - 1 SP - 52 EP - 71 PB - Johns Hopkins University Press CY - Baltimore ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Birkenhake, Benjamin A1 - Witt, Andreas ED - Opas-Hänninen, Lisa Lena ED - Jokelainen, Mikko ED - Juuso, Ilkka ED - Seppänen, Tapio T1 - The German Hamlets: An Advanced Text Technological Application T2 - Digital Humanities 2008. Book of Abstracts KW - Shakespeare, William KW - Hamlet KW - Elektronische Publikation KW - Korpus Y1 - 2008 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45113 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45113 UR - http://www.ekl.oulu.fi/dh2008 SN - 978-951-42-8838-8 SB - 978-951-42-8838-8 SP - 233 EP - 234 PB - European Language Resources Association (ELRA) CY - Paris ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Witt, Andreas A1 - Lüngen, Harald A1 - Gibbon, Dafydd T1 - Enhancing speech corpus resources with multiple lexical tag layers T2 - Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC-2000). Athen, Griechenland N2 - We describe a general two-stage procedure for re-using a custom corpus for spoken language system development involving a transformation from character-based markup to XML, and DSSSL stylesheet-driven XML markup enhancement with multiple lexical tag trees. The procedure was used to generate a fully tagged corpus; alternatively with greater economy of computing resources, it can be employed as a parametrised ‘tagging on demand’ filter. 
The implementation will shortly be released as a public resource together with the corpus (German spoken dialogue, about 500k word form tokens) and lexicon (about 75k word form types). KW - DSSSL KW - Morphology KW - Speech Corpora KW - Speech Lexica KW - Text Technology KW - XML Y1 - 2000 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45517 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45517 UR - http://lrec-conf.org/proceedings/lrec2000/ SP - 5 S1 - 5 PB - European Language Resources Association (ELRA) CY - Paris ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Sasaki, Felix A1 - Wegener, Claudia A1 - Witt, Andreas A1 - Metzing, Dieter A1 - Pönninghaus, Jens T1 - Co-reference annotation and resources: a multilingual corpus of typologically diverse languages T2 - Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC-2002). Las Palmas, Gran Canaria N2 - This article introduces a dialogue corpus containing data from two typologically different languages, Japanese and Kilivila. The corpus is annotated in accordance with language specific annotation schemes for co-referential and similar relations. The article describes the corpus data, the properties of language specific co-reference in the two languages and a methodology for its annotation. Examples from the corpus show how this methodology is used in the workflow of the annotation process. 
KW - Coreference KW - Multilingual corpus KW - Multiple annotations KW - Interrelated document grammars Y1 - 2002 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45529 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45529 UR - http://www.lrec-conf.org/proceedings/lrec2002/ SP - 1225 EP - 1230 PB - European Language Resources Association (ELRA) CY - Paris ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Rehm, Georg A1 - Witt, Andreas A1 - Hinrichs, Erhard A1 - Lehmberg, Timm A1 - Chiarcos, Christian A1 - Zimmermann, Felix A1 - Zinsmeister, Heike A1 - Dellert, Johannes ED - Schmidt, Sara ED - Siemens, Ray ED - Kumar, Amit ED - Unsworth, John T1 - Digital Text Resources for the Humanities – Legal Issues T2 - Digital Humanities 2007. Conference Abstracts KW - Digital Humanities KW - Rechtsfrage Y1 - 2007 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45133 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45133 UR - http://www.digitalhumanities.org/dh2007/ SN - 0-87845-125-0 SB - 0-87845-125-0 SP - 161 EP - 162 PB - University of Illinois CY - Urbana-Champaign ET - Second Edition ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Rehm, Georg A1 - Witt, Andreas A1 - Zinsmeister, Heike A1 - Dellert, Johannes ED - Schmidt, Sara ED - Siemens, Ray ED - Kumar, Amit ED - Unsworth, John T1 - Corpus Masking: Legally Bypassing Licensing Restrictions for the Free Distribution of Text Collections T2 - Digital Humanities 2007. 
Conference Abstracts KW - Korpus KW - Auszeichnungssprache KW - Annotation Y1 - 2007 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45145 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45145 UR - http://www.digitalhumanities.org/dh2007/ SN - 0-87845-125-0 SB - 0-87845-125-0 SP - 166 EP - 170 PB - University of Illinois CY - Urbana-Champaign ET - Second Edition ER - TY - CHAP U1 - Buchbeitrag A1 - Lehmberg, Timm A1 - Chiarcos, Christian A1 - Rehm, Georg A1 - Witt, Andreas ED - Rehm, Georg ED - Witt, Andreas ED - Lemnitzer, Lothar T1 - Rechtsfragen bei der Nutzung und Weitergabe linguistischer Daten T2 - Datenstrukturen für linguistische Ressourcen und ihre Anwendungen. Proceedings of the Biennial GLDV Conference 2007 T2 - Data Structures for Linguistic Resources and Applications. Proceedings of the Biennial GLDV Conference 2007 N2 - Grundlage dieses Artikels ist das Verbundprojekt „Nachhaltigkeit linguistischer Daten“ der drei Sonderforschungsbereiche 441, 538 und 632, dessen Ziel es ist, Lösungen für die nachhaltige Verfügbarkeit der an den SFBs vorhandenen Korpora zu entwickeln. Ein zentraler Aspekt betrifft die Klärung der Rechtslage für die Nutzung und Weitergabe linguistischer Ressourcen, die durch das Urheber- sowie das Datenschutzrecht geschützt sind. Eine als indifferent wahrgenommene rechtliche Situation wird in der Praxis oft als das entscheidende Hindernis für die Weitergabe linguistischer Daten angeführt. Tatsächlich jedoch sind Nutzung und Weitergabe von Daten zu wissenschaftlichen Zwecken normativ geregelt. Problematisch ist oftmals die Einordnung der speziellen linguistischen Daten als Schutzgegenstand sowie die Tatsache, dass an linguistische Daten und Datensammlungen aufgrund ihrer komplexen und vielschichtigen Beschaffenheit durchaus mehrere Urheber Rechte besitzen können, die sich auf verschiedene Inhalte beziehen. 
Der Beitrag gibt einen Überblick über das geltende Recht sowie die juristischen und natürlichen Personen, die potentiell Rechte an linguistisch aufbereiteten Datenkollektionen besitzen. Es ist nicht Gegenstand dieses Artikels, rechtsverbindliche Aussagen zu treffen, die auf eine Nutzung und Weitergabe jedweder Daten angewandt werden. Der Artikel orientiert sich in seiner Struktur und thematischen Tiefe bewusst nicht an einem juristischen Publikum, sondern beschreibt die Problematik aus geisteswissenschaftlicher Perspektive. Zusammen mit einem Überblick über das vom Umgang mit linguistischen Datensammlungen betroffene Recht, das Urheberrechtsgesetz (Abschnitt 1) und das Bundesdatenschutzgesetz (Abschnitt 2), wird in den jeweiligen Abschnitten auch eine Klassifikation der Daten aus juristischer Sicht vorgenommen. Anschließend werden Lösungsansätze vorgestellt, die im Rahmen des o. g. Verbundprojektes erarbeitet werden (Abschnitt 3). KW - Urheberrecht KW - Datenschutz KW - Digitale Sprachressourcen Y1 - 2007 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45153 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45153 SN - 978-3-8233-6314-9 SB - 978-3-8233-6314-9 SP - 93 EP - 102 PB - Narr CY - Tübingen ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Lehmberg, Timm A1 - Chiarcos, Christian A1 - Hinrichs, Erhard A1 - Rehm, Georg A1 - Witt, Andreas ED - Schmidt, Sara ED - Siemens, Ray ED - Kumar, Amit ED - Unsworth, John T1 - Collecting Legally Relevant Metadata by Means of a Decision-Tree-Based Questionnaire System T2 - Digital Humanities 2007. 
Conference Abstracts KW - Korpus KW - Metadaten Y1 - 2007 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45163 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45163 UR - http://www.digitalhumanities.org/dh2007/ SN - 0-87845-125-0 SB - 0-87845-125-0 SP - 164 EP - 166 PB - University of Illinois CY - Urbana-Champaign ET - Second Edition ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Wörner, Kai A1 - Witt, Andreas A1 - Rehm, Georg A1 - Dipper, Stefanie T1 - Modelling Linguistic Data Structures T2 - Proceedings of Extreme Markup Languages 2006 N2 - Linguistic corpora have been annotated by means of SGML-based markup languages for almost 20 years. We can, very roughly, differentiate between three distinct evolutionary stages of markup technologies. (1) Originally, single SGML tree-based document instances were deemed sufficient for the representation of linguistic structures. (2) Linguists began to realize that alternatives and extensions to the traditional model are needed. Formalisms such as, for example, NITE were proposed: the NITE Object Model (NOM) consists of multi-rooted trees. (3) We are now on the threshold of the third evolutionary stage: even NITE's very flexible approach is not suited for all linguistic purposes. As some structures, such as these, cannot be modeled by multi-rooted trees, an even more flexible approach is needed in order to provide a generic annotation format that is able to represent genuinely arbitrary linguistic data structures. 
KW - Trees/Graphs KW - Modeling KW - Markup Languages Y1 - 2006 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45173 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45173 UR - http://conferences.idealliance.org/extreme/dates.html#2006 SP - 13 S1 - 13 PB - Extreme Markup Languages Conference CY - Montreal ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Schonefeld, Oliver A1 - Witt, Andreas T1 - Towards validation of concurrent markup T2 - Proceedings of Extreme Markup Languages 2006 N2 - XCONCUR allows for the annotations of multiple concurrent hierarchies, but lacks cross-layer validation. This paper explores the requirements for a constraint-based approach for such a validation process. KW - Modeling KW - Validating KW - Concurrent Markup/Overlap Y1 - 2006 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45206 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45206 UR - http://conferences.idealliance.org/extreme/dates.html#2006 SP - 11 S1 - 11 PB - Extreme Markup Languages Conference CY - Montreal ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Goecke, Daniela A1 - Witt, Andreas ED - Hinrichs, Erhard ED - Ide, Nancy ED - Palmer, Martha ED - Pustejovsky, James T1 - Exploiting logical document structure for anaphora resolution T2 - Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC-2006) N2 - The aim of the paper is twofold. Firstly, an approach is presented how to select the correct antecedent for an anaphoric element according to the kind of text segments in which both of them occur. Basically, information on logical text structure (e.g. chapters, sections, paragraphs) is used in order to select the antecedent life span of a linguistic expression, i.e. some linguistic expressions are more likely to be chosen as an antecedent throughout the whole text than others. 
In addition, an appropriate search scope for an anaphora expressed by an expression can be defined according to the document structuring elements that include the linguistic expression. Corpus investigations give rise to the supposition that logical text structure influences the search scope of candidates for antecedents. Second, a solution is presented how to integrate the resources used for anaphora resolution. In this approach, multi-layered XML annotation is used in order to make a set of resources accessible for the anaphora resolution system. KW - Anapher KW - Korpus KW - Textanalyse Y1 - 2006 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45214 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-45214 UR - http://www.lrec-conf.org/proceedings/lrec2006/ SP - 1077 EP - 1080 PB - European Language Resources Association (ELRA) CY - Paris ER - TY - CHAP U1 - Buchbeitrag A1 - Perkuhn, Rainer ED - Kämper, Heidrun ED - Eichinger, Ludwig M. T1 - "Corpus-driven": Systematische Auswertung automatisch ermittelter sprachlicher Muster T2 - Sprach-Perspektiven. Germanistische Linguistik und das Institut für Deutsche Sprache T3 - Studien zur deutschen Sprache - 40 KW - Korpus KW - Kollokation KW - Distribution KW - Forschungsmethode KW - Deutsches Referenzkorpus (DeReKo) Y1 - 2007 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47774 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-47774 UR - 978-3-8233-6295-1 SP - 465 EP - 491 PB - Narr CY - Tübingen ER - TY - JOUR U1 - Zeitschriftenartikel, wissenschaftlich - begutachtet (reviewed) A1 - Konopka, Marek T1 - Niedrigfrequente grammatische Phänomene als sprachliche Zweifelsfälle JF - Korpus – Grammatika – Axiologie N2 - Some grammatical phenomena that only seldom appear in the corpora of written language often coincide with speakers' uncertainty about a given form's grammatical status. 
Such display of uncertainty is often subject to prescriptive criticism, which pays little attention to actual usage. However, thorough and discriminating corpus analyses can help in a proper description of various low-frequency phenomena and in situating them more adequately in the grammatical system, against the background of different contexts, communicative situations, and language varieties. To exemplify this potential, this study examines three linguistic phenomena in German, using a corpus-based approach: the dative singular ending -e, the construction aus aller Herren Länder, which lacks the dative plural ending -n, and the non-standard preterite form frug. The results can be seen as a contribution to a more precise grammatical description on the one hand and, on the other, as a basis for an improved, more usage-oriented approach in providing practical advice to language users. N2 - Některé gramatické jevy, které se v korpusech psaných textů vyskytují jen zřídka, vzbuzují u mnoha uživatelů jazyka pochyby. Tyto případy jsou často předmětem normativní jazykové kritiky, která se o úzus mnoho nestará. Avšak pečlivé a diferencované korpusové analýzy mohou pomoci popsat i mnohé nízkofrekvenční jevy a adekvátněji je začlenit do systému na pozadí různých kontextů, komunikativních situací a variet. K exemplifikaci tohoto stanoviska byly korpusově zkoumány tři jevy německého jazyka: koncovka -e v dativu sg., konstrukce typu aus aller Herren Länder, v níž chybí plurálová koncovka -n, a nestandardní préteritální forma frug. Na výsledky tohoto výzkumu lze pohlížet na jedné straně jako na příspěvek k preciznějšímu gramatickému popisu a na straně druhé jako na východisko pro kvalitnější poradenskou praxi, orientovanou na reálný úzus. 
KW - corpus analysis KW - grammatical description KW - low-frequency linguistic phenomena KW - Deutsch KW - Sprachschwierigkeit KW - Korpus KW - Distribution Y1 - 2010 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-49529 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-49529 SN - 1804-137X SS - 1804-137X VL - 2010 IS - 2 SP - 24 EP - 44 PB - Univerzita CY - Hradec Králové ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Storjohann, Petra ED - Hardie, Andrew ED - Love, Robbie T1 - Lexical, corpus-methodological and lexicographic approaches to paronyms T2 - Proceedings of the 7th International Corpus Linguistics Conference. Abstract Book in Lancaster CL2013 KW - Paronym KW - Korpus KW - Deutsch Y1 - 2013 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-49537 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-49537 UR - http://ucrel.lancs.ac.uk/cl2013/doc/CL2013-ABSTRACT-BOOK.pdf SP - 275 EP - 277 PB - UCREL CY - Lancaster ER - TY - BOOK U1 - Buch A1 - Steyer, Kathrin A1 - Brunner, Annelen T1 - Das UWV-Analysemodell : eine korpusgesteuerte Methode zur linguistischen Systematisierung von Wortverbindungen N2 - Die im Folgenden dargestellte korpusgesteuerte Methode "UWV-Analysemodell" wurde auf der Basis der Forschungen zu usuellen Wortverbindungen (UWV) (vgl. Steyer 2000, 2003, 2004, Steyer/Lauer 2007, Brunner/Steyer 2007, Steyer 2008, Steyer demn.) und zahlreicher, exhaustiver Analysen in den letzten Jahren entwickelt. Ziel war ein empirisches Vorgehensmodell, das es ermöglicht, die Differenziertheit und Vernetztheit von Wortverbindungen auf verschiedenen Abstraktionsebenen ausgehend von Kookkurrenzdaten angemessen darzustellen. 
Daher ging es in dieser Arbeitsphase nicht darum, usuelle Wortverbindungen des Deutschen möglichst umfassend und in großer Menge zu inventarisieren, sondern die "innere Natur" von Wortverbindungen zwischen Varianz und Invarianz mit unterschiedlichen Graden an lexikalischer Spezifiziertheit sowie ihre wechselseitigen Verbindungen im Detail zu erfassen und zu beschreiben. T3 - OPAL - Online publizierte Arbeiten zur Linguistik - 2009,1 KW - Wortverbindung KW - Syntax KW - Online-Publikation KW - Deutsch Y1 - 2009 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-456 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-456 UR - http://pub.ids-mannheim.de/laufend/opal/pdf/opal2009-1.pdf SN - 1860-9422 SS - 1860-9422 VL - 2009 SP - 41 S1 - 41 PB - Institut für Deutsche Sprache CY - Mannheim ER - TY - CHAP U1 - Buchbeitrag A1 - Lüngen, Harald A1 - Puskás, Csilla A1 - Bärenfänger, Maja A1 - Hilbert, Mirco A1 - Lobin, Henning ED - Pahikkala, Tapio ED - Pyysalo, Sampo ED - Ginter, Filip ED - Salakoski, Tapio T1 - Discourse segmentation of German written texts T2 - Advances in natural language processing. 5th International Conference on NLP FinTAL 2006 Turku, Finland, August 23-25 N2 - Discourse segmentation is the division of a text into minimal discourse segments, which form the leaves in the trees that are used to represent discourse structures. A definition of elementary discourse segments in German is provided by adapting widely used segmentation principles for English minimal units, while considering punctuation, morphology, syntax, and aspects of the logical document structure of a complex text type, namely scientific articles. The algorithm and implementation of a discourse segmenter based on these principles is presented, as well as an evaluation of test runs. 
KW - Computerlinguistik KW - Diskursanalyse KW - Automatische Sprachanalyse KW - Computational linguistics KW - Discourse annotation KW - Tag KW - Annotation KW - Discourse analysis Y1 - 2006 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-23 SN - 978-3-540-37334-6 SB - 978-3-540-37334-6 U6 - https://dx.doi.org/10.1007/11816508_26 DO - https://dx.doi.org/10.1007/11816508_26 N1 - The final publication is available at Springer via http://dx.doi.org/10.1007/11816508_26 SP - 245 EP - 256 S1 - 12 PB - Springer-Verlag CY - Berlin [u.a.] ER - TY - JOUR U1 - Zeitschriftenartikel, wissenschaftlich - begutachtet (reviewed) A1 - Stührenberg, Maik T1 - The TEI and Current Standards for Structuring Linguistic Data: An Overview JF - Journal of the Text Encoding Initiative N2 - The TEI has served for many years as a mature annotation format for corpora of different types, including linguistically annotated data. Although it is based on the consensus of a large community, it does not have the legal status of a standard. During the last decade, efforts have been undertaken to develop definitive de jure standards for linguistic data that not only act as a normative basis for the exchange of language corpora but also address recent advancements in technology, such as web-based standards, and the use of large and multiply annotated corpora. In this article we will provide an overview of the process of international standardization and discuss some of the international standards currently being developed under the auspices of ISO/TC 37, a technical committee called “Terminology and other Language and Content Resources”. After that the relationship between the TEI Guidelines and these specifications, according to their formal model, notation format, and annotation model, will be discussed. The conclusion of the paper provides recommendations for dealing with language corpora. 
KW - Computerlinguistik KW - Korpuslinguistik KW - Standardisierung KW - ISO/TC 37/SC 4 Y1 - 2012 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-2330 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-2330 UR - http://jtei.revues.org/523 IS - 3 SP - 1 EP - 14 S1 - 14 ER - TY - JOUR U1 - Zeitschriftenartikel, wissenschaftlich - begutachtet (reviewed) A1 - Dickgießer, Sylvia T1 - Metadatenschemata in der Datenbank für Gesprochenes Deutsch (DGD 2.0). Unter Mitarbeit von Joachim Gasch KW - Deutsch KW - Gesprochenes Deutsch KW - IDS-Korpora KW - Metadaten KW - Metadatenschemata Y1 - 2011 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-5828 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-5828 SP - 1 EP - 70 S1 - 70 PB - Institut für Deutsche Sprache CY - Mannheim ER - TY - CHAP U1 - Buchbeitrag A1 - Eichinger, Ludwig M. ED - Kallmeyer, Werner ED - Zifonun, Gisela T1 - Linguisten brauchen Korpora und Korpora Linguisten : Wege zu wohl dokumentierten und verlässlichen Aussagen über Sprache T2 - Sprachkorpora : Datenmengen und Erkenntnisfortschritt KW - Deutsch KW - Jahrestagung IDS KW - Sprachkorpus KW - Lexik Y1 - 2007 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-3660 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-3660 SN - 3-11-019273 SB - 3-11-019273 SP - 1 EP - 8 S1 - 8 PB - de Gruyter CY - Berlin ; New York ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Belica, Cyril ED - Abel, Andrea ED - Zanin, Renata T1 - Semantische Nähe als Ähnlichkeit von Kookkurrenzprofilen T2 - Korpora in Lehre und Forschung N2 - Der Beitrag betrachtet lexikalisch-semantische Relationen aus einer emergentistischen Perspektive vor dem Hintergrund eines korpusgeleiteten empirisch-linguistischen Ansatzes. 
Er skizziert, wie eine systematische Erfassung und Auswertung des Kookkurrenzverhaltens von Lexemen – die Analyse der Ähnlichkeit von Kookkurrenzprofilen mit Hilfe von selbstorganisierenden lexikalischen Merkmalskarten und ihre im Diskurs verankerte Interpretation – wichtige Einblicke in die Struktur verschiedenartiger Verwendungsaspekte dieser Lexeme einschließlich ihrer semantischen Nähe ermöglichen. Die vorgestellte Methodik wird dabei – über die explorativ-analytischen Zielsetzungen hinaus – als eine abduktive, auf Theoriebildung zielende Generalisierungsstrategie im postulierten Lexikon-Syntax-Kontinuum verstanden. Zum Schluss werden die Anwendungsmöglichkeiten einiger Komponenten dieser Methodik in der Lexikografie, Lexikologie und Didaktik diskutiert. KW - Korpus KW - semantische Analyse KW - Kookkurrenzanalyse KW - Forschungsmethode Y1 - 2011 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-28361 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-28361 UR - http://www.unibz.it/it/library/Documents/bupress/publications/fulltext/9788860460400.pdf#page=156 SN - 978-88-6046-040-0 SB - 978-88-6046-040-0 SP - 155 EP - 178 PB - Bozen University Press CY - Bozen ER - TY - CHAP U1 - Konferenzveröffentlichung A1 - Kupietz, Marc A1 - Belica, Cyril A1 - Keibel, Holger A1 - Witt, Andreas T1 - The German reference corpus DeReKo : a primordial sample for linguistic research T2 - Proceedings of the 7th International Conference on Language Resources and Evaluation : Workshops & Tutorials May 17-18, May 22-23, Main Conference May 19-21, Valletta N2 - This paper describes DeReKo (Deutsches Referenzkorpus), the Archive of General Reference Corpora of Contemporary Written German at the Institut für Deutsche Sprache (IDS) in Mannheim, and the rationale behind its development. 
We discuss its design, its legal background, how to access it, available metadata, linguistic annotation layers, underlying standards, ongoing developments, and aspects of using the archive for empirical linguistic research. The focus of the paper is on the advantages of DeReKo’s design as a primordial sample from which virtual corpora can be drawn for the specific purposes of individual studies. Both concepts, primordial sample and virtual corpus, are explained and illustrated in detail. Furthermore, we describe in more detail how DeReKo deals with the fact that all its texts are subject to third parties’ intellectual property rights, and how it deals with the issue of replicability, which is particularly challenging given DeReKo’s dynamic growth and the possibility to construct from it an open number of virtual corpora. KW - Deutsch KW - Textkorpus KW - Korpus KW - Deutsches Referenzkorpus (DeReKo) KW - Institut für Deutsche Sprache Y1 - 2010 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-28379 UN - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-28379 UR - http://www.lrec-conf.org/proceedings/lrec2010/pdf/414_Paper.pdf SN - 2-9517408-6-7 SB - 2-9517408-6-7 SP - 1848 EP - 1854 PB - ELRA CY - Paris ER -