Refine
Year of publication
- 2014 (462)
Document Type
- Part of a Book (207)
- Article (141)
- Conference Proceeding (52)
- Book (35)
- Part of Periodical (12)
- Working Paper (7)
- Other (6)
- Preprint (2)
Keywords
- German (149)
- Corpus <linguistics> (50)
- Institut für Deutsche Sprache <Mannheim> (36)
- Linguistics (29)
- German studies (25)
- Computer-assisted lexicography (23)
- Dictionary (19)
- Spoken language (18)
- Institut für Deutsche Sprache (18)
- Conversation analysis (16)
Publication state
- Published version (173)
- Secondary publication (23)
- Postprint (11)
Review state
- Publisher's editorial review (140)
- Peer review (64)
- Publisher's editorial review (7)
- Peer review (6)
- Review status unknown (2)
- Publisher's editorial review (1)
- Publisher's editorial review (1)
- Peer review (1)
- Preprint (1)
Publisher
- Institut für Deutsche Sprache (98)
- De Gruyter (88)
- de Gruyter (36)
- Stauffenburg (12)
- European Language Resources Association (ELRA) (11)
- Lang (10)
- Benjamins (6)
- Springer (6)
- Winter (6)
- Cambridge Scholars Publ. (5)
"Badeölgrüne Buchten", "kükengelbes Haar" und "tomatenrote Tomaten" - Vergleiche mit Farbadjektiven
(2014)
Speakers’ dialogical orientation to the particular others they talk to is implemented by practices of recipient design. One such practice is the use of negation as a means to constrain the partner's interpretations of the speaker’s actions. The paper situates this use of negation within the larger context of other recipient-designed uses of negation, which negate assumptions the speaker makes about what the addressee holds to be true (second-order assumptions) or about what the addressee assumes the speaker holds to be true (third-order assumptions). The study focuses on the ways in which speakers use negation to disclaim interpretations of their turns that partners have displayed or might arrive at. Special emphasis is given to positionally sensitive uses of negation, which may occur before, after or inserted between the nucleus actions whose interpretation the negation constrains. The interactional motivations and rhetorical potential of the practice are pointed out, partly depending on the position of the negation vis-à-vis the nucleus action. The analysis shows that the concept of ‘recipient design’ requires distinctions that prior research has not focused on.
50 Jahre IDS
(2014)
So far, there have been few descriptions of how to create structures capable of storing lexicographic data, ISO 24613:2008 being one of the most recent. Another is by Spohr (2012), who designs a multifunctional lexical resource able to store data from different types of dictionaries in a user-oriented way. Technically, his design is based on a hierarchical XML/OWL (eXtensible Markup Language/Web Ontology Language) representation model. This article takes another route and describes a model based on entities and the relations between them; MySQL, a relational database management system queried with SQL (Structured Query Language), stores data in tables together with definitions of the relations between them. The model was developed in the context of the project "Scientific eLexicography for Africa", and the lexicographic database to be built from it will be implemented with MySQL. The principles of the ISO model and of Spohr's model are adhered to, with one major difference in the implementation strategy: we place not the lemma but the sense description at the centre of attention, so that all other elements, including the lemma, depend on the sense description. This article also describes the lexicographic data sets contained in the database and how they were collected from different sources. As our aim is to compile several prototypical internet dictionaries (a monolingual Northern Sotho dictionary, a bilingual learners' Xhosa–English dictionary and a bilingual Zulu–English dictionary), we describe the microstructural elements each of them requires and the principles we follow when designing different ways of accessing them. We plan to make the model and the (empty) database, together with all graphical user interfaces developed, freely available by mid-2015.
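To make the sense-centred design concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module as a stand-in for MySQL. The table and column names (`sense`, `lemma`, `example`, `sense_id`) are hypothetical, not the project's actual schema; the sketch only shows lemmas and examples holding foreign keys to a sense, so that the sense description, not the lemma, is the entity everything else depends on.

```python
import sqlite3

# Hypothetical sense-centred schema: lemma and example rows both point to a sense.
SCHEMA = """
CREATE TABLE sense (
    id          INTEGER PRIMARY KEY,
    dictionary  TEXT NOT NULL,  -- which prototype dictionary the sense belongs to
    description TEXT NOT NULL
);
CREATE TABLE lemma (
    id       INTEGER PRIMARY KEY,
    sense_id INTEGER NOT NULL REFERENCES sense(id),
    form     TEXT NOT NULL
);
CREATE TABLE example (
    id       INTEGER PRIMARY KEY,
    sense_id INTEGER NOT NULL REFERENCES sense(id),
    sentence TEXT NOT NULL
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign-key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def lemmas_for_sense(conn: sqlite3.Connection, sense_id: int) -> list:
    """Return all lemma forms attached to one sense, sorted alphabetically."""
    rows = conn.execute(
        "SELECT form FROM lemma WHERE sense_id = ? ORDER BY form", (sense_id,)
    )
    return [form for (form,) in rows]


# Usage with placeholder data (not real dictionary content).
conn = build_db()
conn.execute("INSERT INTO sense VALUES (1, 'zu-en', 'placeholder sense description')")
conn.executemany(
    "INSERT INTO lemma (sense_id, form) VALUES (?, ?)",
    [(1, "form-b"), (1, "form-a")],
)
print(lemmas_for_sense(conn, 1))  # ['form-a', 'form-b']
```

Because foreign keys are enforced, a lemma cannot be stored without an existing sense, which mirrors the dependency on the sense description. SQLite needs `PRAGMA foreign_keys = ON` for this; MySQL's InnoDB engine enforces foreign keys without such a switch.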
Large classes at universities (> 1600 students) create their own challenges for teaching and learning. Audience feedback is lacking, and fine-tuning lectures, courses and exam preparation to individual needs is very difficult. At RWTH Aachen University, a course concept and a knowledge map learning tool were developed and evaluated to support individual students in preparing for exams in information science through theme-based exercises. The tool was grounded in the notion of self-regulated learning, with the goal of enabling students to learn independently.
Ablaut
(2014)
Ablautreihe
(2014)
We present an approach to one aspect of managing complex access scenarios for large and heterogeneous corpora: handling user queries that, intentionally or because of the complexity of the queried resource, target texts or annotations outside the given user’s permissions. We first outline the overall architecture of the corpus analysis platform KorAP, paying some attention to the way it handles multiple query languages by implementing ISO CQLF (Corpus Query Lingua Franca), which in turn is a crucial component for the functionality discussed here. Next, we look at query rewriting as used by KorAP and zoom in on one kind of this procedure, namely the rewriting of queries that is forced by data access restrictions.
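The access-driven rewrite can be illustrated with a small Python sketch. It is not KorAP's implementation: the `Query` class, its fields and `rewrite_for_access` are hypothetical, and a query is reduced to a pattern plus a set of corpus identifiers. The rewrite intersects the requested corpora with those the user may access and records what it changed, so the user can be told why results are missing.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Query:
    """Hypothetical, simplified query: a search pattern plus the corpora it targets."""
    pattern: str
    corpora: Optional[Set[str]] = None                  # None means "all corpora"
    rewrites: List[str] = field(default_factory=list)   # log of rewrites applied


def rewrite_for_access(query: Query, accessible: Set[str]) -> Query:
    """Return a copy of `query` restricted to the corpora the user may access."""
    requested = query.corpora
    allowed = set(accessible) if requested is None else requested & accessible
    rewrites = list(query.rewrites)
    if requested is None:
        # Unrestricted query: add an explicit restriction to accessible corpora.
        rewrites.append(f"added corpus restriction: {sorted(allowed)}")
    elif allowed != requested:
        # Some requested corpora are off limits: drop them and record it.
        rewrites.append(f"removed inaccessible corpora: {sorted(requested - allowed)}")
    return Query(query.pattern, allowed, rewrites)


q = Query("[orth=Haus]", {"corpus-A", "corpus-B"})
r = rewrite_for_access(q, {"corpus-A"})
print(r.corpora, r.rewrites)
```

Returning a new query together with a log of rewrites, rather than silently narrowing the search, is one way to keep forced rewrites transparent to the user; how KorAP reports them is not stated in this abstract.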
This paper introduces the Aix Map Task corpus, a corpus of audio and video recordings of task-oriented dialogues. It was modelled after the original HCRC Map Task corpus. The lexical material was designed for the analysis of speech and prosody, as described in Astésano et al. (2007). The design of the lexical material, the protocol and some basic quantitative features of the existing corpus are presented. The corpus was collected under two communicative conditions: an audio-only condition and a face-to-face condition. The recordings took place in a sound-attenuated booth and a studio respectively, with headset microphones (and, in the face-to-face condition, two video cameras). The recordings have been segmented into inter-pausal units (IPUs) and transcribed using conventions that record both the actual productions and the canonical forms of what was said. The corpus is publicly available online.
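Segmentation into inter-pausal units can be sketched as merging speech intervals whose separating pause is shorter than a threshold. The 200 ms default below is an assumption (a common choice in IPU-based work); the abstract does not state the threshold used for the Aix Map Task corpus, and `segment_ipus` is a hypothetical helper, not part of the tools used there.

```python
def segment_ipus(intervals, min_pause=0.2):
    """Merge (start, end) speech intervals, in seconds, into inter-pausal units.

    Consecutive intervals separated by a pause shorter than `min_pause` are
    merged; a pause of at least `min_pause` starts a new IPU.
    """
    ipus = []
    for start, end in sorted(intervals):
        if ipus and start - ipus[-1][1] < min_pause:
            ipus[-1][1] = max(ipus[-1][1], end)   # short pause: extend current IPU
        else:
            ipus.append([start, end])             # long pause (or first interval)
    return [tuple(ipu) for ipu in ipus]


print(segment_ipus([(0.0, 0.5), (0.6, 1.0), (1.5, 2.0)]))
# [(0.0, 1.0), (1.5, 2.0)]
```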
This contribution presents an XML Schema for annotating a high-level narratological category: speech, thought and writing representation (ST&WR). It focuses on two aspects: first, the original Schema is presented as an example of the challenge of encoding a narrative feature in a structured and flexible way; second, ways of adapting this Schema to TEI are considered in order to make it usable for other TEI-based projects.
This contribution examines the co-construction of turns at talk in interaction, more specifically the way in which a co-participant's completion of an utterance is subsequently received by the speaker whose turn was completed. Despite the considerable interest that conversation analysis and interactional linguistics have shown in co-enunciation, the first speaker's evaluation of this practice has not been analysed in depth. In what follows, we focus in particular on the interactional practices that allow participants to validate a co-construction. This work stems from the ANR project SPIM (« L’imitation dans la parole », imitation in speech), in the context of which we examined the function of other-repetition (repeating all or part of another speaker's utterance, as opposed to self-repetition) in sequences in which a turn at talk is co-constructed.
Anaphora
(2014)
Annotating Spoken Language
(2014)
We continue the study of the reproducibility of Propp’s annotations from Bod et al. (2012). We present four experiments in which test subjects were taught Propp’s annotation system; we conclude that Propp’s system requires a significant amount of training but that, with sufficient time investment, it can be learned reliably for simple tales.
Antonomasie
(2014)
Assonanz
(2014)
Asyndeton
(2014)
Attizismus
(2014)
Automatic Food Categorization from Large Unlabeled Corpora and Its Impact on Relation Extraction
(2014)
We present a weakly supervised induction method to assign semantic information to food items. We consider two categorization tasks: food-type classification and distinguishing whether a food item is composite or not. The categorizations are induced by a graph-based algorithm applied to a large unlabeled domain-specific corpus. We show that the use of a domain-specific corpus is vital. We not only outperform a manually designed open-domain ontology but also demonstrate the usefulness of these categorizations in relation extraction, outperforming state-of-the-art features that include syntactic information and Brown clustering.
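As an illustration of a graph-based induction step, the Python sketch below runs simple label propagation: a few seed food items carry a category, and each unlabelled item adopts the category with the largest summed edge weight among its labelled neighbours. This is a generic stand-in, not necessarily the algorithm used in the paper; the graph, its weights (which in practice might come from co-occurrence in the unlabelled corpus) and `propagate_labels` are all hypothetical.

```python
from collections import Counter


def propagate_labels(edges, seeds, iterations=10):
    """Iterative label propagation on an undirected, weighted similarity graph.

    edges: dict mapping (u, v) -> weight; seeds: dict node -> fixed label.
    Each unlabelled node takes the label with the highest summed edge weight
    among its labelled neighbours; seed labels never change.
    """
    neighbours = {}
    for (u, v), w in edges.items():
        neighbours.setdefault(u, []).append((v, w))
        neighbours.setdefault(v, []).append((u, w))
    labels = dict(seeds)
    for _ in range(iterations):
        updated = dict(labels)
        for node, nbrs in neighbours.items():
            if node in seeds:
                continue
            votes = Counter()
            for nbr, w in nbrs:
                if nbr in labels:
                    votes[labels[nbr]] += w
            if votes:
                updated[node] = votes.most_common(1)[0][0]
        if updated == labels:   # converged: no label changed in this round
            break
        labels = updated
    return labels


edges = {("apple", "pear"): 2.0, ("pear", "plum"): 1.0,
         ("carrot", "leek"): 1.5, ("plum", "leek"): 0.2}
labels = propagate_labels(edges, {"apple": "fruit", "carrot": "vegetable"})
print(labels["plum"], labels["leek"])  # fruit vegetable
```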
Feminine forms of job titles attract great interest in many countries. However, it is still unknown how they shape stereotypical impressions of warmth and competence among female and male listeners. In an experiment with fictitious job titles, men perceived women described with feminine job titles as significantly less warm and marginally less competent than women described with masculine job titles, which in turn led to a lower willingness to employ them. No such effects were observed among women.
Barbarismus
(2014)
In this paper, we present the concept and the results of two studies addressing (potential) users of monolingual German online dictionaries, such as www.elexiko.de. Using elexiko as an example, these studies aimed to collect empirical data on possible extensions of the content of monolingual online dictionaries, e.g. the search function; to evaluate how users understand the terminology of the user interface; to find out which types of information users expect in each specific lexicographic module; and to investigate general questions regarding the function and reception of examples illustrating the use of a word. The design and distribution of the surveys are comparable to those of the studies described in chapters 5-8 of this volume. We also explain how the data obtained in our studies were used to further improve the elexiko dictionary.
Bezugsnomen
(2014)
Brachylogie
(2014)
Wikipedia is a valuable resource, useful as a linguistic corpus or as a dataset for many kinds of research. We built corpora from Wikipedia articles and talk pages in the I5 format, a TEI customisation used in the German Reference Corpus (Deutsches Referenzkorpus, DeReKo). Our approach is a two-stage conversion combining parsing with the Sweble parser and transformation with XSLT stylesheets. The conversion approach successfully generates rich and valid corpora regardless of the language. We also introduce a method to segment user contributions on talk pages into postings.
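Segmenting talk pages into postings can be approximated by splitting at user signatures. The sketch below assumes that each posting ends with a German-Wikipedia-style timestamp such as `12:34, 5. Jan. 2014 (CET)`; the regular expression and `split_postings` are illustrative assumptions, not the method described in the paper, and real talk pages contain unsigned or irregularly signed contributions that a heuristic like this will mis-segment.

```python
import re
from typing import List

# Assumed signature timestamp, e.g. "12:34, 5. Jan. 2014 (CET)". Real talk
# pages vary, so this is a heuristic, not a complete signature grammar.
TIMESTAMP = re.compile(r"\d{1,2}:\d{2}, \d{1,2}\. \w+\.? \d{4} \((?:CET|CEST)\)")


def split_postings(talk_text: str) -> List[str]:
    """Split talk-page wikitext into postings, ending each at a signature timestamp."""
    postings, start = [], 0
    for match in TIMESTAMP.finditer(talk_text):
        posting = talk_text[start:match.end()].strip()
        if posting:
            postings.append(posting)
        start = match.end()
    rest = talk_text[start:].strip()
    if rest:
        postings.append(rest)  # unsigned trailing text kept as its own posting
    return postings


sample = ("This section is too long. --[[Benutzer:A|A]] 12:34, 5. Jan. 2014 (CET)\n"
          ":I agree. --[[Benutzer:B|B]] 13:01, 5. Jan. 2014 (CET)\n"
          ":Still open?")
print(len(split_postings(sample)))  # 3
```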
Through migration, large numbers of German-speaking settlers arrived in Pennsylvania between roughly 1700 and 1750. Pennsylvania German (PG) developed as a distinct variety through levelling processes among the L1 varieties of these migrants, who came mainly from the southwestern regions of the German-speaking area. Pennsylvania German is still spoken today by specific religious groups (primarily Amish and Mennonite groups), for many of whose members it is an identity marker. My paper focuses on those Pennsylvania Germans who do not belong to these religious groups but share the same migration history. Being closer to the cultural values of mainstream American society, they became integrated into it, and during the 20th century their use of Pennsylvania German steadily declined. Over roughly the past three decades, this heritage language has undergone a revival, including language courses offered at community colleges, public libraries, etc., where ethnic Pennsylvania Germans wish to (re-)learn the language of their grandparents. This study analyses written Pennsylvania German data from four points in time between the 1860s and the 1990s. The linguistic analyses reveal differences between the data sets that point to a diachronic change in the language contact situation of Pennsylvania German speakers. The paper considers sociolinguistic and extralinguistic factors that influence the role of PG and make its speakers heritage speakers much in the sense of recent immigrant heritage speakers, albeit with a delay of 200 years.
Chiasmus
(2014)
Ciceronianismus
(2014)
We discovered several recurring errors in the current version of the Europarl Corpus, originating both from the website of the European Parliament and from the corpus compilation based on it. The most frequent error was incompletely extracted metadata, which left non-textual fragments within the textual parts of the corpus files; on average, this affects every second speaker change. We not only cleaned the Europarl Corpus by correcting several kinds of errors but also aligned the speakers’ contributions across all available languages and compiled everything into a new XML-structured corpus. This facilitates more sophisticated data selection, e.g. querying the corpus for speeches by speakers of a particular political group or in particular language combinations.
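Aligning speaker contributions across languages can be sketched by keying each turn on its speaker ID and on how many times that speaker has already spoken in the session, then keeping the turns found in every language. The data layout (per-language lists of `(speaker_id, text)` pairs) and `align_speakers` are hypothetical; the paper's own alignment procedure is not described in this abstract.

```python
from collections import Counter


def align_speakers(turns_by_lang):
    """Align speaker turns across languages.

    turns_by_lang: dict mapping a language code to a list of (speaker_id, text)
    pairs in session order. A turn is keyed by (speaker_id, n), where n counts
    that speaker's earlier turns; only keys present in every language are kept,
    in the turn order of the alphabetically first language.
    """
    if not turns_by_lang:
        return []
    keyed = {}
    for lang, turns in turns_by_lang.items():
        seen = Counter()
        keyed[lang] = {}
        for speaker, text in turns:
            keyed[lang][(speaker, seen[speaker])] = text
            seen[speaker] += 1
    langs = sorted(keyed)
    aligned = []
    for key in keyed[langs[0]]:              # dicts keep insertion (= turn) order
        if all(key in keyed[lang] for lang in langs):
            aligned.append({"speaker": key[0],
                            "texts": {lang: keyed[lang][key] for lang in langs}})
    return aligned


session = {
    "en": [("p1", "Hello."), ("p2", "Thank you."), ("p1", "One more point.")],
    "de": [("p1", "Hallo."), ("p2", "Danke.")],
}
print(len(align_speakers(session)))  # 2
```

Keying on (speaker, occurrence) instead of raw position means that one missing or extra turn in a single language does not shift every later alignment.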
Communication of stereotypes in the classroom: biased language use of German and Turkish adolescents
(2014)
Little is known about the linguistic transmission and maintenance of mutual stereotypes in interethnic contexts. This field study therefore investigated the linguistic expectancy bias (LEB) and the linguistic intergroup bias (LIB) among German and Turkish adolescents (aged 13 to 20) in the school context. The LEB refers to the general tendency to describe stereotype-consistent behaviour more abstractly than stereotype-inconsistent behaviour. The LIB is the tendency to use language abstraction in ways that protect the in-group. Results revealed an unmoderated LEB, whereas the LIB occurred only when foreigners were in the numerical majority, when the classroom composition was perceived as a learning disadvantage, or when interethnic conflict was frequent. These findings provide first evidence for both the LEB and the LIB in an interethnic classroom setting.
We compare several corpus-based and lexicon-based methods for the scalar ordering of adjectives. Among them, we examine for the first time a low-resource approach based on distinctive-collexeme analysis that requires only a small predefined set of adverbial modifiers. While previous work on adjective intensity mostly assumes one single scale for all adjectives, we group adjectives into different scales, which is more faithful to human perception. We also apply the methods to both polar and non-polar adjectives, showing that not all methods are equally suitable for both types of adjectives.
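The idea of ordering adjectives by the modifiers they prefer can be sketched as follows. For each adjective, the code builds the 2×2 table used in distinctive-collexeme analysis (this adjective vs. all others, with modifier set A vs. modifier set B) but scores it with a smoothed log odds ratio instead of the Fisher exact test that distinctive-collexeme analysis usually employs. The modifier sets, the counts in the usage example and the function names are invented for illustration, and the sketch orders a single scale only.

```python
import math


def log_odds_scores(counts_a, counts_b):
    """Association of each adjective with modifier set A vs. modifier set B.

    counts_a / counts_b: dict adjective -> frequency with the A / B modifiers.
    For each adjective, build the distinctive-collexeme 2x2 table and return
    its log odds ratio, adding 0.5 to every cell to avoid division by zero.
    """
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    scores = {}
    for adj in set(counts_a) | set(counts_b):
        a = counts_a.get(adj, 0) + 0.5             # this adjective with A
        b = counts_b.get(adj, 0) + 0.5             # this adjective with B
        c = total_a - counts_a.get(adj, 0) + 0.5   # other adjectives with A
        d = total_b - counts_b.get(adj, 0) + 0.5   # other adjectives with B
        scores[adj] = math.log((a * d) / (b * c))
    return scores


def order_on_scale(counts_a, counts_b):
    """Sort adjectives from most B-leaning (weaker) to most A-leaning (stronger)."""
    scores = log_odds_scores(counts_a, counts_b)
    return sorted(scores, key=scores.get)


absolutely = {"good": 2, "great": 10, "brilliant": 30}   # invented counts
very = {"good": 50, "great": 20, "brilliant": 2}
print(order_on_scale(absolutely, very))  # ['good', 'great', 'brilliant']
```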
This study presents the results of a large-scale comparison of various measures of pitch range and pitch variation in two Slavic (Bulgarian and Polish) and two Germanic (German and British English) languages. The productions of twenty-two speakers per language (eleven male and eleven female) in two different tasks (read passages and number sets) are compared. Significant differences between the language groups are found: German and English speakers use lower pitch maxima, a narrower pitch span and generally less variable pitch than Bulgarian and Polish speakers. These findings support the hypothesis that linguistic communities tend to be characterized by particular pitch profiles.
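Two of the kinds of measures mentioned, pitch span and pitch variability, are commonly expressed in semitones, where the span between a minimum and a maximum f0 is 12 · log2(f_max / f_min). The sketch below computes the pitch maximum, the span and the standard deviation in semitones for one speaker's f0 track. The choice of these particular measures, the 100 Hz reference and the convention that 0 marks unvoiced frames are assumptions, not the study's stated procedure.

```python
import math
import statistics


def hz_to_semitones(f0, ref=100.0):
    """Convert a frequency in Hz to semitones relative to `ref` Hz."""
    return 12.0 * math.log2(f0 / ref)


def pitch_profile(f0_values):
    """Pitch maximum, span and variability for one speaker's f0 samples (Hz).

    Assumes 0 marks unvoiced frames and that at least two voiced samples exist.
    The reference only shifts all semitone values, so it does not affect the SD.
    """
    voiced = [f for f in f0_values if f > 0]
    lo, hi = min(voiced), max(voiced)
    semitones = [hz_to_semitones(f) for f in voiced]
    return {
        "max_hz": hi,
        "span_st": 12.0 * math.log2(hi / lo),
        "sd_st": statistics.stdev(semitones),
    }


profile = pitch_profile([100.0, 0.0, 200.0])
print(round(profile["span_st"], 2))  # 12.0
```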
In recent minimalist work, it has been argued that C-agreement provides conclusive support for the following theoretical hypotheses (cf. Carstens 2003; van Koppen 2005; Haegeman & van Koppen 2012): (i) C hosts a separate set of phi-features, a parametric choice possibly linked to the V2 property; (ii) feature checking/valuation is accomplished under (closest) c-command (i.e. by the operation Agree, cf. Chomsky 2000 and subsequent work). This paper reviews the significance of C-agreement for syntactic theory and argues that certain systematic asymmetries between regular verbal agreement and complementizer agreement suggest that the latter does not result from operations that are part of narrow syntax. The case is based on the observation that, at least in some Germanic varieties (most notably Bavarian), the realization of inflectional features in the C-domain is sensitive to adjacency effects and to deletion of the finite verb in right node raising and comparatives. The fact that C may not carry inflection when the finite verb has been elided is taken to suggest that complementizer agreement does not involve a dependency between C and the subject, but rather between C and the finite verb (i.e. T). More precisely, it is argued that inflectional features present in the C-domain are added postsyntactically via a process of feature insertion (cf. e.g. Embick 1997; Embick & Noyer 2001; Harbour 2003) that creates a copy of T’s (valued) φ-set. It will then be shown that this account can also capture phenomena like first conjunct agreement (FCA) and external possessor agreement, which are often presented as crucial evidence of the syntactic nature of complementizer agreement (cf. van Koppen 2005; Haegeman & van Koppen 2012).
Content analysis provides a useful, multifaceted methodological framework for Twitter analysis. CAQDAS (computer-assisted qualitative data analysis software) tools support the structuring of textual data by enabling categorisation and coding. Depending on the research objective, it may be appropriate to choose a mixed-methods approach that combines quantitative and qualitative elements of analysis, exploiting their respective advantages as far as possible while minimising their shortcomings. In this chapter, we discuss CAQDAS-based speech act analysis of tweets as an example of software-assisted content analysis. We start with some basic considerations on the challenges of collecting and evaluating Twitter data before briefly describing the potential and limitations of the software QDA Miner (as a typical example of such analysis software). We focus on analytical features that can be particularly helpful in the speech act analysis of tweets.
This contribution presents the newest version of our ‘Wortverbindungsfelder’ (fields of multi-word expressions), an experimental lexicographic resource that focuses on aspects of MWEs that are rarely addressed in traditional descriptions: contexts, patterns and interrelations. The MWE fields use data from a very large corpus of written German (over 6 billion word forms) and are created in a strictly corpus-based way. In addition to traditional lexicographic descriptions, they include quantitative corpus data structured in new ways to show the specifics of usage. This way of looking at MWEs gives insight into the structure of language and is especially interesting for foreign language learners.
Das 50-jährige IDS
(2014)
Das es-Gesamtsystem im Neuhochdeutschen. Ein Beitrag zu Valenztheorie und Konstruktionsgrammatik.
(2014)
The book examines the various uses of the pronoun es. The analyses are based on a corpus of texts of communicative immediacy (Nähetexte) dating from 1650 to 2000. The first part of the study deals with phoric es, distinguishing implicit and explicit reference by es. Particular emphasis is placed on a detailed semantic and morphosyntactic description of the individual subtypes of es. The description of correlative es draws above all on the concept of integration, against which a graded model of correlative clause linkage with es is developed. The second part of the study addresses the grammatical-theoretical status of non-phoric es. It argues that the description and explanation of the various subtypes of non-phoric es should be based on insights from valency theory and construction grammar.
Das grimmsche Wörterbuch
(2014)
Data Mining with Shallow vs. Linguistic Features to Study Diversification of Scientific Registers
(2014)
We present a methodology to analyze the linguistic evolution of scientific registers with data mining techniques, comparing the insights gained from shallow vs. linguistic features. The focus is on selected scientific disciplines at the boundary with computer science (computational linguistics, bioinformatics, digital construction, microelectronics). The data basis is the English Scientific Text Corpus (SCITEX), which covers a time range of roughly thirty years (1970s/80s to early 2000s) (Degaetano-Ortlieb et al., 2013; Teich and Fankhauser, 2010). In particular, we investigate the diversification of scientific registers over time. Our theoretical basis is Systemic Functional Linguistics (SFL) and its specific incarnation of register theory (Halliday and Hasan, 1985). In terms of methods, we combine corpus-based feature extraction with data mining techniques.
In the online dictionary elexiko (www.elexiko.de), a number of high-frequency headwords belonging to the so-called „Lexikon zum öffentlichen Sprachgebrauch“ (lexicon of public language use) are described in detail, on a corpus basis, with respect to their meaning and use. This section of the vocabulary covers various topics from politics and society and contains lexemes that belong to central political and social discourses as they are represented in the corpus. In elexiko, these lexemes are presented in a semantically and pragmatically appropriate way, i.e. in a sufficiently differentiated and language-reflective manner. The presentation follows the linguistic conception of elexiko set out in the volume „Grundfragen der elektronischen Lexikografie. elexiko – das Online-Informationssystem zum deutschen Wortschatz“ (edited by Ulrike Haß, 2005).
This contribution presents the procedure used in the Handbuch deutscher Kommunikationsverben and in its online version Kommunikationsverben in the lexicographical internet portal OWID to divide sets of semantically similar communication verbs into ever smaller sets of ever closer synonyms. Kommunikationsverben describes the meaning of communication verbs on two levels: a lexical level, represented in the dictionary entries and by sets of lexical features, and a conceptual level, represented by different types of situations referred to by specific types of verbs. The procedure starts at the conceptual level of meaning where verbs used to refer to the same specific situation type are grouped together. At the lexical level of meaning, the sets of verbs obtained from the first step are successively divided into smaller sets on the basis of the criteria of (i) identity of lexical meaning, (ii) identity of lexical features, and (iii) identity of contexts of usage. The stepwise procedure applied is shown to result in the creation of a semantic network for communication verbs.
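The two-level procedure reads naturally as successive partitioning. In the sketch below, verbs are first grouped by the situation type they refer to, and each resulting set is then split by one lexical criterion after another (lexical meaning, lexical features, usage contexts). The attribute encoding, the placeholder verbs and the functions `partition` and `refine` are hypothetical, not the data model of Kommunikationsverben.

```python
from collections import defaultdict


def partition(verbs, key):
    """Split a set of verbs into groups that share the same value of key(verb)."""
    groups = defaultdict(set)
    for verb in verbs:
        groups[key(verb)].add(verb)
    return list(groups.values())


def refine(verbs, situation_type, criteria):
    """Group by situation type, then split every set by each criterion in turn."""
    sets = partition(verbs, situation_type)
    for criterion in criteria:   # e.g. lexical meaning, lexical features, contexts
        sets = [subset for s in sets for subset in partition(s, criterion)]
    return sets


# Placeholder verbs: (situation type, lexical meaning, features, usage context).
attrs = {
    "v1": ("S1", "m1", "f1", "c1"),
    "v2": ("S1", "m1", "f1", "c2"),
    "v3": ("S1", "m2", "f1", "c1"),
    "v4": ("S2", "m3", "f2", "c1"),
}
by_meaning = refine(set(attrs), lambda v: attrs[v][0], [lambda v: attrs[v][1]])
print(sorted(sorted(s) for s in by_meaning))  # [['v1', 'v2'], ['v3'], ['v4']]
```

Each additional criterion can only split sets further, never merge them, which is why the sets become smaller and their members closer synonyms at every step.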
Dependenzstruktur
(2014)
Der Blick zurück nach vorn
(2014)
Using several examples from nominal morphology and the morphosyntax of the German noun phrase, this contribution shows how, in the changes observable in this area over the course of the 20th century, questions of long-term system change overlap with regularities of language use. The focus is on case marking, especially in the genitive and dative, which are generally regarded as „critical“. The data show that in most cases an extensive adaptation to the regularities of monoflexion had already taken place by the beginning of the 20th century, and that this process continued throughout the century. It is remarkable that the forms regarded as „old“ occur (very) rarely overall in the corpora of written language examined, but that the resulting markedness is increasingly exploited functionally in one way or another. The genitive constitutes a special case in this context: with strong masculine and neuter nouns, it is well known to resist the trend towards „single marking“ of case on the inflected elements accompanying the noun. This leads to the familiar orientation of these forms towards non-object uses and also to a striking degree of variation in the use of the corresponding inflectional forms.
Under the title „Ihr Beitrag bitte! – Der Nutzerbeitrag im Wörterbuchprozess“ (Your contribution, please! The user contribution in the dictionary process), a symposium was held in Erlangen from 18 to 21 September 2012 as part of the GAL congress 2012 „Wissen – Wörter – Wörterbücher“ (Knowledge, words, dictionaries). In talks and a concluding discussion, it addressed the question of what contributions dictionary users make, can make or should make to internet dictionaries. This online publication contains four papers from the Erlangen symposium (by Karin Rautmann, Katrin Thier, Luca Melchior and Robert Lew), which give a good overview of the topic by presenting a broad spectrum of ways in which users can participate in building and expanding lexicographic resources on the internet. The introduction also summarizes the most important points raised in the papers, talks and discussions.
Designing a Bilingual Speech Corpus for French and German Language Learners: a Two-Step Process
(2014)
We present the design of a corpus of native and non-native speech for the language pair French-German, with special emphasis on phonetic and prosodic aspects. To our knowledge, no corpus suitable in terms of size and coverage is currently available for this language pair. To select the target L1-L2 interference phenomena, we prepared a small preliminary corpus (corpus1), which was analyzed for coverage and cross-checked jointly by French and German experts. Based on this analysis, we selected target phenomena at the phonetic and phonological level according to their expected degree of deviation from native performance and their frequency of occurrence. Fourteen speakers recorded both L2 material (French or German) and L1 material (German or French). This allowed us to test the recording duration, the recording material and the performance of our automatic alignment software. We then built corpus2, taking into account what we learned from corpus1. The aims are the same, but we adapted the speech material to avoid overly long recording sessions. One hundred speakers will be recorded. The corpus (corpus1 and corpus2) will be prepared as a searchable database and made available to the scientific community after completion of the project.
The study of the German language and culture encounters very different conditions in the large and diverse region of sub-Saharan Africa. Together with African Germanists, the DAAD has taken on the task of illuminating this range, with its language-policy implications and practical consequences. Inseparable from this is a second question: from which perspectives and with what disciplinary emphasis is the study of the German language and culture in Africa meaningful and feasible? The answers to these fundamental questions always lie in the space between intercultural cultural studies and application-oriented practice.