410 Linguistics
In this paper, we describe preliminary results from an ongoing experiment in which we classify two large unstructured text corpora (a web corpus and a newspaper corpus) by topic domain (or subject area). Our primary goal is to develop a method that allows for the reliable annotation of large crawled web corpora with the metadata required by many corpus linguists. We are especially interested in designing an annotation scheme whose categories are both intuitively interpretable by linguists and firmly rooted in the distribution of lexical material in the documents. Since we use data from a web corpus and a more traditional corpus, we also contribute to the important field of corpus comparison and corpus evaluation. Technically, we use (unsupervised) topic modeling to automatically induce topic distributions over gold standard corpora that were manually annotated for 13 coarse-grained topic domains. In a second step, we apply supervised machine learning to learn the manually annotated topic domains using the previously induced topics as features. We achieve around 70% accuracy in 10-fold cross-validation. An analysis of the errors clearly indicates, however, that a revised classification scheme and larger gold standard corpora will likely lead to a substantial increase in accuracy.
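The second step described in this abstract (learning annotated topic domains from previously induced topic distributions, evaluated by 10-fold cross-validation) can be illustrated with a minimal sketch. The abstract does not name the topic model or the learner, so everything here is an assumption: the topic distributions are taken as already computed, a simple nearest-centroid classifier stands in for whatever supervised method the authors used, and the function names and the synthetic two-domain data are hypothetical.

```python
import random
from collections import defaultdict


def k_fold_indices(n_items, k, seed=0):
    """Shuffle item indices and deal them into k roughly equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_centroids(features, labels):
    """Average the topic distributions of all documents sharing a label."""
    by_label = defaultdict(list)
    for vec, label in zip(features, labels):
        by_label[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in by_label.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))


def cross_validate(features, labels, k=10, seed=0):
    """Hold out each fold once, train on the rest, return overall accuracy."""
    correct = 0
    for held_out in k_fold_indices(len(features), k, seed):
        held = set(held_out)
        train = [i for i in range(len(features)) if i not in held]
        centroids = train_centroids(
            [features[i] for i in train], [labels[i] for i in train]
        )
        correct += sum(
            predict(centroids, features[i]) == labels[i] for i in held_out
        )
    return correct / len(features)


def toy_topic_features(n_per_domain=20, seed=1):
    """Synthetic stand-in for induced topic distributions (not real data)."""
    rng = random.Random(seed)
    # Hypothetical domains with an assumed typical mix of three topics.
    profiles = {"politics": [0.7, 0.2, 0.1], "sports": [0.1, 0.2, 0.7]}
    features, labels = [], []
    for domain, profile in profiles.items():
        for _ in range(n_per_domain):
            noisy = [max(p + rng.uniform(-0.1, 0.1), 0.01) for p in profile]
            total = sum(noisy)
            features.append([x / total for x in noisy])  # renormalise to 1
            labels.append(domain)
    return features, labels


if __name__ == "__main__":
    X, y = toy_topic_features()
    print(f"10-fold accuracy: {cross_validate(X, y, k=10):.2f}")
```

The toy domains are deliberately well separated, so the accuracy printed here says nothing about the roughly 70% reported for the 13 real topic domains. In practice the feature vectors would come from a topic model fitted to the corpora.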
Linguistische Zugänge zu Konflikten in europäischen Sprachräumen. Korpus - Pragmatik - kontrovers
(2016)
Conflicts accompany social life in our society: from the garden fence to the political arena, from everyday life to questions of transnational juridification in the European Union, we encounter disputes everywhere, every day. Conflicts and language are closely intertwined. On the one hand, language itself is negotiated in language; on the other, language is the medium par excellence of quarrelling and reconciliation. Conflicts are conveyed primarily through language, i.e. conflicts over language(s) mirror sociocultural struggles over knowledge and power.
The volume offers a comprehensive insight into the controversial discussion and further development of current linguistic research on the study of conflicts. Especially in times of societal crisis, linguistic approaches can help to analyse conflicts as socio-symbolic patterns of action and to describe their communicative contexts.
On the basis of a corpus of legal texts consisting of judicial decisions and jurisprudential papers on so-called assisted suicide from 1977 to 2011, agonal centres are determined within the paradigm of corpus-based pragma-semiotic text analysis. Agonal centres are defined as action-guiding concepts that are in conflict with each other concerning the general acceptance of event interpretations, options for actions, claims of validity, contextual knowledge and values. These action-guiding concepts are derived with the help of quantitative and qualitative methods. Discourse linguistic interpretations are thus rendered more objective with the help of semi-automatic methods; furthermore, specific features of the discourse and approaches to interpretation can be derived from (un)expected linguistic significances of occurrence, distribution, frequency etc. at the linguistic surface. Finally, these agonal centres specific to the language of law are compared to agonal centres which are determined on the basis of a media corpus on the same issue. This provides a comparative insight into the constitution of a seemingly identical fact in everyday and special language, which demonstrates the sociopolitical relevance of analysing the constitution of reality as instructed by language.
This contribution presents the background, design and results of a study of users of three oral corpus platforms in Germany. Roughly 5,000 registered users of the Database for Spoken German (DGD), the GeWiss corpus and the corpora of the Hamburg Centre for Language Corpora (HZSK) were asked to participate in a user survey. This quantitative approach was complemented by qualitative interviews with selected users. We briefly introduce the corpus resources involved in the study in section 2. Section 3 describes the methods employed in the user studies. Section 4 summarizes results of the studies, focusing on selected key topics. Section 5 attempts a generalization of these results to larger contexts.
Tagset und Richtlinie für das PoSTagging von Sprachdaten aus Genres internetbasierter Kommunikation
(2015)
Sense relations
(2016)
In order to develop its full potential, global communication needs linguistic support systems such as Machine Translation (MT). In the past decade, free online MT tools have become available to the general public, and the quality of their output is increasing. However, the use of such tools may entail various legal implications, especially as far as processing of personal data is concerned. This is even more evident if we take into account that their business model is largely based on providing translation in exchange for data, which can subsequently be used to improve the translation model, but also for commercial purposes. The purpose of this paper is to examine how free online MT tools fit in the European data protection framework, harmonised by the EU Data Protection Directive. The perspectives of both the user and the MT service provider are taken into account.