OPUS 4 | Search

51 search hits

The Karl Eberhards Corpus of spontaneously spoken southern German in dialogues - audio and articulatory recordings (2016)

Arnold, Denis ; Tomaschek, Fabian

The current paper presents a corpus containing 35 dialogues of spontaneously spoken southern German, including half an hour of articulography for 13 of the speakers. Speakers were seated in separate recording chambers, mimicking a telephone call, and recorded on individual audio channels. The corpus provides manually corrected word boundaries and automatically aligned segment boundaries. Annotations are provided in the Praat format. In addition to the audio recordings, the speakers filled out a detailed questionnaire assessing, among other things, their audio-visual consumption habits.

Corpus Query Lingua Franca (CQLF) (2016)

Bański, Piotr ; Frick, Elena ; Witt, Andreas

The present paper describes Corpus Query Lingua Franca (ISO CQLF), a specification designed at ISO Technical Committee 37 Subcommittee 4 “Language resource management” for the purpose of facilitating the comparison of properties of corpus query languages. We overview the motivation for this endeavour and present its aims and its general architecture. CQLF is intended as a multi-part specification; here, we concentrate on the basic metamodel that provides a frame that the other parts fit in.

Integrating corpora of computer-mediated communication into the language resources landscape: Initiatives and best practices from French, German, Italian and Slovenian projects (2016)

Beißwenger, Michael ; Chanier, Thierry ; Chiari, Isabella ; Erjavec, Tomaž ; Fišer, Darja ; Herold, Axel ; Ljubešić, Nikola ; Lüngen, Harald ; Poudat, Céline ; Stemle, Egon W. ; Storrer, Angelika ; Wigham, Ciara

The paper presents best practices and results from projects in four countries dedicated to the creation of corpora of computer-mediated communication and social media interactions (CMC). Even though there are still many open issues related to building and annotating corpora of that type, there already exists a range of accessible solutions which have been tested in projects and which may serve as a starting point for a more precise discussion of how future standards for CMC corpora may (and should) be shaped.

Converting and Representing Social Media Corpora into TEI: Schema and best practices from CLARIN-D (2016)

Beißwenger, Michael ; Ehrhardt, Eric ; Herold, Axel ; Lüngen, Harald ; Storrer, Angelika

The paper presents results from a curation project within CLARIN-D, in which an existing 1M word corpus of German chat communication has been integrated into the DeReKo and DWDS corpus infrastructures of the CLARIN-D centres at the Institute for the German Language (IDS, Mannheim) and at the Berlin-Brandenburg Academy of Sciences (BBAW, Berlin). The focus is on the solutions developed for converting and representing the corpus in a TEI format.

(Best) Practices for Annotating and Representing CMC and Social Media Corpora in CLARIN-D (2016)

Beißwenger, Michael ; Ehrhardt, Eric ; Herold, Axel ; Lüngen, Harald ; Storrer, Angelika

The paper reports the results of the curation project ChatCorpus2CLARIN. The goal of the project was to develop a workflow and resources for the integration of an existing chat corpus into the CLARIN-D research infrastructure for language resources and tools in the Humanities and the Social Sciences (http://clarin-d.de). The paper presents an overview of the resources and practices developed in the project, describes the added value of the resource after its integration and discusses, as an outlook, to what extent these practices can be considered best practices which may be useful for the annotation and representation of other CMC and social media corpora.

Das Dortmunder Chat-Korpus in CLARIN-D: Modellierung und Mehrwerte [The Dortmund Chat Corpus in CLARIN-D: Modelling and Added Value] (2016)

Beißwenger, Michael ; Herold, Axel ; Lüngen, Harald ; Storrer, Angelika

A comparison between morphological complexity measures: typological data vs. language corpora (2016)

Bentz, Christian ; Soldatova, Tatjana ; Koplenig, Alexander ; Samardžić, Tanja

Language complexity is an intriguing phenomenon argued to play an important role in both language learning and processing. The need to compare languages with regard to their complexity resulted in a multitude of approaches and methods, ranging from accounts targeting specific structural features to global quantification of variation more generally. In this paper, we investigate the degree to which morphological complexity measures are mutually correlated in a sample of more than 500 languages of 101 language families. We use human expert judgements from the World Atlas of Language Structures (WALS), and compare them to four quantitative measures automatically calculated from language corpora. These consist of three previously defined corpus-derived measures, which are all monolingual, and one new measure based on automatic word-alignment across pairs of languages. We find strong correlations between all the measures, illustrating that both expert judgements and automated approaches converge to similar complexity ratings, and can be used interchangeably.

Proceedings of the Semantics of African, Asian and Austronesian Languages (TripleA) 2 (2016)

Bowler, Margit ; Hsieh, I-Ta Chris ; Shen, Zheng ; Korat, Omer ; Tran, Thuan

TripleA is a workshop series founded by linguists from the University of Tübingen and the University of Potsdam. Its aim is to provide a forum for semanticists doing fieldwork on understudied languages, and its focus is on languages from Africa, Asia, Australia and Oceania. The second TripleA workshop was held at the University of Potsdam, June 3-5, 2015.

DRuKoLA – towards contrastive German-Romanian research based on comparable corpora (2016)

Cosma, Ruxandra ; Cristea, Dan ; Kupietz, Marc ; Tufiş, Dan ; Witt, Andreas

This paper introduces the recently started DRuKoLA project, which aims to provide mechanisms for flexibly drawing virtual comparable corpora from the German Reference Corpus DeReKo and the Reference Corpus of Contemporary Romanian Language CoRoLa, so that these virtual corpora can serve as an empirical basis for contrastive linguistic research.
