The possibilities for re-using and archiving spoken and written corpora are affected by personality rights (called the right of publicity in some legal traditions), copyright law, and data protection and privacy laws. These recommendations cover the legal aspects that should be considered when creating corpora, so that the corpora can be archived and re-used as widely as possible in compliance with current law.
The information compiled here is intended as a guideline for researchers who plan to create corpora or who are involved in evaluating such measures. It is not exhaustive and should not be regarded as legal advice. Researchers should consult their institution's legal department and management before making legally relevant decisions. Where possible, further legal expertise should be sought as early as the project planning phase.
The following remarks on masking are based on experience gained in preparing the data of the Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK) for publication in the Datenbank für Gesprochenes Deutsch (DGD). They are meant to give other researchers and research projects practical help with masking recordings, but they naturally cannot cover the full range of case-by-case decisions and obligations that researchers face.
They offer both general guidance on masking audio and video recordings and practical tips on carrying out the masking with the transcription editor FOLKER.
The processes described here, which are divided among several people in larger projects, can of course be carried out by a single person in smaller projects run by individual researchers.
Unknown words are a challenge for any NLP task, including sentiment analysis. Here, we evaluate the extent to which the sentiment polarity of complex words can be predicted from their morphological make-up. We do this for German because it has very productive processes of derivation and compounding, and because many German hapax words, which are likely to bear sentiment, are morphologically complex. We present results of supervised classification experiments on new datasets with morphological parses and polarity annotations.
We present a major step towards the creation of the first high-coverage lexicon of polarity shifters. In this work, we bootstrap a lexicon of verbs by exploiting various linguistic features. Polarity shifters, such as ‘abandon’, are similar to negations (e.g. ‘not’) in that they move the polarity of a phrase towards its inverse, as in ‘abandon all hope’. While there exist lists of negation words, creating comprehensive lists of polarity shifters is far more challenging due to their sheer number. On a sample of manually annotated verbs we examine a variety of linguistic features for this task. Then we build a supervised classifier to increase coverage. We show that this approach drastically reduces the annotation effort while ensuring a high-precision lexicon. We also show that our acquired knowledge of verbal polarity shifters improves phrase-level sentiment analysis.
We use a convolutional neural network to perform authorship identification on a very homogeneous dataset of scientific publications. In order to investigate the effect of domain biases, we obscure words below a certain frequency threshold, retaining only their POS-tags. This procedure improves test performance due to better generalization on unseen data. Using our method, we are able to predict the authors of scientific publications in the same discipline at levels well above chance.
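The masking step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mask_rare_words`, the `min_count` parameter, the lower-casing of words for counting, and the toy POS tags are all assumptions made for illustration.

```python
from collections import Counter

def mask_rare_words(tagged_docs, min_count=2):
    """Replace every word whose corpus frequency is below min_count
    with its POS tag (hypothetical helper, not from the paper)."""
    # Count case-insensitive word frequencies over the whole corpus.
    counts = Counter(word.lower() for doc in tagged_docs for word, _ in doc)
    return [
        [word if counts[word.lower()] >= min_count else tag for word, tag in doc]
        for doc in tagged_docs
    ]

# Toy corpus of (word, POS tag) pairs; the tags are illustrative.
docs = [
    [("We", "PRON"), ("train", "VERB"), ("a", "DET"), ("parser", "NOUN")],
    [("We", "PRON"), ("evaluate", "VERB"), ("a", "DET"), ("tagger", "NOUN")],
]
masked = mask_rare_words(docs, min_count=2)
```

In this toy corpus the topic-specific words ("train", "parser", "evaluate", "tagger") each occur once and are replaced by their tags, while the frequent words are kept. That is the intuition behind reducing domain bias by masking rare words.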
We propose a new type of subword embedding designed to provide more information about unknown compounds, a major source of out-of-vocabulary (OOV) words in German. We present an extrinsic evaluation in which we use the compound embeddings as input to a neural dependency parser and compare the results with those obtained with other types of embeddings. Our evaluation shows that adding compound embeddings yields a significant improvement of 2% LAS over using word embeddings when no POS information is available. When POS embeddings are added to the input, however, the effect levels out. This suggests that the problem in parsing German is not the missing information about the semantics of unknown words but the lack of morphological information about them. To augment our evaluation, we also test the new embeddings in a language modelling task that requires both syntactic and semantic information.
Universal Dependency (UD) annotations, despite their usefulness for cross-lingual tasks and semantic applications, are not optimised for statistical parsing. In this paper, we ask what exactly causes the decrease in parsing accuracy when a parser is trained on UD-style annotations, and whether the effect is similarly strong for all languages. We conduct a series of experiments in which we systematically modify individual annotation decisions taken in the UD scheme and show that this increases accuracy for most, but not all, languages. We show that the encoding in the UD scheme, in particular the decision to encode content words as heads, causes an increase in dependency length for nearly all treebanks and an increase in arc direction entropy for many languages, and we evaluate the effect this has on parsing accuracy.
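The two measures named in this abstract can be sketched for a single sentence as follows. This is a simplified illustration, not the paper's exact definitions: the entropy here is computed over all arcs at once, whereas the paper may condition on relation labels. The function names and the head-index encoding (1-based heads, 0 for the root) are assumptions.

```python
import math

def mean_dependency_length(heads):
    """Average distance between each non-root token and its head.
    heads[i-1] is the 1-based head index of token i; 0 marks the root."""
    lengths = [abs(i - h) for i, h in enumerate(heads, start=1) if h != 0]
    return sum(lengths) / len(lengths) if lengths else 0.0

def arc_direction_entropy(heads):
    """Binary entropy (in bits) of the head-left vs. head-right distribution."""
    arcs = [(i, h) for i, h in enumerate(heads, start=1) if h != 0]
    if not arcs:
        return 0.0
    p_right = sum(1 for i, h in arcs if h > i) / len(arcs)
    return -sum(q * math.log2(q) for q in (p_right, 1 - p_right) if q > 0)

# "sat on the mat": tokens 1=sat, 2=on, 3=the, 4=mat
content_head = [0, 4, 4, 1]   # UD-style: "mat" heads "on" and attaches to "sat"
function_head = [0, 1, 4, 2]  # alternative: "on" attaches to "sat" and heads "mat"

ud_length = mean_dependency_length(content_head)
alt_length = mean_dependency_length(function_head)
```

For this toy sentence the content-head analysis gives a mean dependency length of 2.0, against roughly 1.33 for the function-head analysis. This matches the direction of the effect the abstract reports for treebanks.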
To improve grammatical function labelling for German, we augment the labelling component of a neural dependency parser with a decision history. We present different ways to encode the history, using different LSTM architectures, and show that our models yield significant improvements, resulting in a LAS for German that is close to the best result from the SPMRL 2014 shared task (without the reranker).
We introduce a method for error detection in automatically annotated text, aimed at supporting the creation of high-quality language resources at affordable cost. Our method combines an unsupervised generative model with human supervision from active learning. We test our approach on in-domain and out-of-domain data in two languages, in AL simulations and in a real world setting. For all settings, the results show that our method is able to detect annotation errors with high precision and high recall.
Germany faces enormous challenges in the coming years. With the refugee migration of just under 1.5 million people between 2014 and 2017 alone, major integration tasks lie ahead in almost every area of society, particularly in the education and employment sectors. For this reason too, Steven Vertovec, director of the Max Planck Institute for the Study of Religious and Ethnic Diversity, describes the refugee migration of 2015 as Germany's “second Wende” (Vertovec 2015), one that will change the country lastingly. In his assessment, the social transformations will be so profound that the phrase “since the refugee crisis” will become as common an expression as “since the Wende”.
Vocational qualification programmes such as “GASTRO” in the Rhine-Neckar region are very important efforts in this context with regard to the structural integration of refugee migrants. In the context of society as a whole, they are indispensable components of the new welcoming culture (Willkommenskultur) that Germany has been trying to establish since the 2010s. As a country of immigration, Germany can use such initiatives to recruit specifically for fields of work and occupational groups that lack new entrants. They offer newly arrived migrants the chance to find their bearings in the local world of work and possibly to explore occupational fields that were previously unknown to them or known only in a different form.