Refine
Year of publication
- 2014 (462) (remove)
Document Type
- Part of a Book (207)
- Article (141)
- Conference Proceeding (52)
- Book (35)
- Part of Periodical (12)
- Working Paper (7)
- Other (6)
- Preprint (2)
Keywords
- Deutsch (149)
- Korpus <Linguistik> (50)
- Institut für Deutsche Sprache <Mannheim> (36)
- Linguistik (29)
- Germanistik (25)
- Computerunterstützte Lexikographie (23)
- Wörterbuch (19)
- Gesprochene Sprache (18)
- Institut für Deutsche Sprache (18)
- Konversationsanalyse (16)
Publicationstate
- Veröffentlichungsversion (173)
- Zweitveröffentlichung (23)
- Postprint (11)
Reviewstate
- (Verlags)-Lektorat (140)
- Peer-Review (64)
- Verlags-Lektorat (7)
- Peer-review (6)
- Review-Status-unbekannt (2)
- (Verlags)Lektorat (1)
- (Verlags-)Lektorat (1)
- Peer-Revied (1)
- Preprint (1)
Publisher
- Institut für Deutsche Sprache (98)
- De Gruyter (88)
- de Gruyter (36)
- Stauffenburg (12)
- European Language Resources Association (ELRA) (11)
- Lang (10)
- Benjamins (6)
- Springer (6)
- Winter (6)
- Cambridge Scholars Publ. (5)
This article presents preliminary results indicating that speakers have a different pitch range when they speak a foreign language compared to the pitch variation that occurs when they speak their native language. To this end, a learner corpus with French and German speakers was analyzed. Results suggest that speakers indeed produce a smaller pitch range in the respective L2. This is true for both groups of native speakers. A possible explanation for this finding is that speakers are less confident in their productions, therefore, they concentrate more on segments and words and subsequently refrain from realizing pitch range more native-like. For language teaching, the results suggest that learners should be trained extensively on the more pronounced use of pitch in the foreign language.
Das Konzept,Textgrammatik' wird einer kritischen Prüfung unterzogen. Die Hypothese, für die argumentiert wird, ist, dass eine strikte Auslegung im Sinne der Annahme, Texte hätten eine spezifische Grammatik, wie Sätze eine spezifische Grammatik haben, nicht aufrecht erhalten werden kann. Grundlegende Eigenschaften, nämlich die Existenz eines hierarchisch aufgebauten Regelsystems, eine spezifische Form von Gegliedertheit und Formbezogenheit, sind anders als auf Satzebene beim Text nicht gegeben. Exemplarisch werden die Phänomene Anaphorik sowie, ausführlicher, Erscheinungsformen der Ellipse bzw. aus dem elliptischen Formenkreis diskutiert. Das Fazit ist: ,Textgrammatik‘ sollte - wenn überhaupt gebraucht - nur als Verweis auf die Textsensibilität der Satzgrammatik dienen.
Der Blick zurück nach vorn
(2014)
Topologisches Satzmodell
(2014)
We start by trying to answer a question that has already been asked by de Schryver et al. (2006): Do dictionary users (frequently) look up words that are frequent in a corpus. Contrary to their results, our results that are based on the analysis of log files from two different online dictionaries indicate that users indeed look up frequent words frequently. When combining frequency information from the Mannheim German Reference Corpus and information about the number of visits in the Digital Dictionary of the German Language as well as the German language edition of Wiktionary, a clear connection between corpus and look-up frequencies can be observed. In a follow-up study, we show that another important factor for the look-up frequency of a word is its temporal social relevance. To make this effect visible, we propose a de-trending method where we control both frequency effects and overall look-up trends.
Gegenstand des Aufsatzes sind Sätze mit so genannten inneren Objekten, das sind Akkusativobjekte, die im Wesentlichen intransitive Verben gelegentlich zu sich nehmen. Sie weisen die Besonderheit auf, dass das Objektsnomen und das Verb morphologisch, etymologisch und/oder semantisch miteinander verwandt sind. Aufgrund von Form- und vor allem Bedeutungsunterschieden lassen sich in beiden Sprachen verschiedene Gruppen von inneren Objekten ausmachen, die genauer beschrieben und unter sprachvergleichenden Gesichtspunkten betrachtet werden. Dazu werden u.a. die syntaktischen Eigenschaften von Sätzen mit inneren Objekten herangezogen. Einige auffallende sprachbezogene Unterschiede werden beschrieben, beispielsweise ist im Rumänischen bei einigen Verben ein präpositionaler Anschluss möglich, wo im Deutschen das innere Objekt ausschließlich im Akkusativ stehen kann. Sätze mit inneren Objekten können als ein Typ von Argumentstrukturmustern betrachtet werden. In diesem Sinne sind sie Form-Bedeutungs-Paare, deren Beziehungen untereinander innerhalb eines Konzepts von Familienähnlichkeiten dargestellt werden, wie man sie auch innerhalb anderer Cluster von Argumentstrukturmustern beobachten kann.
Ablautreihe
(2014)
Ablaut
(2014)
Automatic Food Categorization from Large Unlabeled Corpora and Its Impact on Relation Extraction
(2014)
We present a weakly-supervised induction method to assign semantic information to food items. We consider two tasks of categorizations being food-type classification and the distinction of whether a food item is composite or not. The categorizations are induced by a graph-based algorithm applied on a large unlabeled domain-specific corpus. We show that the usage of a domain-specific corpus is vital. We do not only outperform a manually designed open-domain ontology but also prove the usefulness of these categorizations in relation extraction, outperforming state-of-the-art features that include syntactic information and Brown clustering.
We examine the task of separating types from brands in the food domain. Framing the problem as a ranking task, we convert simple textual features extracted from a domain-specific corpus into a ranker without the need of labeled training data. Such method should rank brands (e.g. sprite) higher than types (e.g. lemonade). Apart from that, we also exploit knowledge induced by semi-supervised graph-based clustering for two different purposes. On the one hand, we produce an auxiliary categorization of food items according to the Food Guide Pyramid, and assume that a food item is a type when it belongs to a category unlikely to contain brands. On the other hand, we directly model the task of brand detection using seeds provided by the output of the textual ranking features. We also harness Wikipedia articles as an additional knowledge source.
We report on the two systems we built for Task 1 of the German Sentiment Analysis Shared Task, the task on Source, Subjective Expression and Target Extraction from Political Speeches (STEPS). The first system is a rule-based system relying on a predicate lexicon specifying extraction rules for verbs, nouns and adjectives, while the second is a translation-based system that has been obtained with the help of the (English) MPQA corpus.
Part-of-speech tagging (POS-tagging) of spoken data requires different means of annotation than POS-tagging of written and edited texts. In order to capture the features of German spoken language, a distinct tagset is needed to respond to the kinds of elements which only occur in speech. In order to create such a coherent tagset the most prominent phenomena of spoken language need to be analyzed, especially with respect to how they differ from written language. First evaluations have shown that the most prominent cause (over 50%) of errors in the existing automatized POS-tagging of transcripts of spoken German with the Stuttgart Tübingen Tagset (STTS) and the treetagger was the inaccurate interpretation of speech particles. One reason for this is that this class of words is virtually absent from the current STTS. This paper proposes a recategorization of the STTS in the field of speech particles based on distributional factors rather than semantics. The ultimate aim is to create a comprehensive reference corpus of spoken German data for the global research community. It is imperative that all phenomena are reliably recorded in future part-of-speech tag labels.