Refine
Year of publication
Document Type
- Article (196)
- Part of a Book (112)
- Conference Proceeding (110)
- Book (21)
- Part of Periodical (7)
- Other (6)
- Review (2)
- Image (1)
- Working Paper (1)
Is part of the Bibliography
- yes (456) (remove)
Keywords
- Deutsch (150)
- Korpus <Linguistik> (138)
- Interaktion (45)
- Konversationsanalyse (41)
- Gesprochene Sprache (39)
- Forschungsdaten (36)
- Computerlinguistik (31)
- Annotation (25)
- Englisch (24)
- Multimodalität (23)
Publicationstate
- Veröffentlichungsversion (456) (remove)
Reviewstate
- Peer-Review (456) (remove)
Publisher
- IDS-Verlag (34)
- de Gruyter (20)
- Zenodo (19)
- Verlag für Gesprächsforschung (16)
- Linköping University Electronic Press (14)
- European language resources association (ELRA) (13)
- Lexical Computing CZ s.r.o. (12)
- Peter Lang (11)
- Springer (11)
- Springer Nature (11)
This paper investigates the syntactic behaviour of adverbial clauses in contemporary German and Italian. It focuses on three main questions: (i) How many degrees of syntactic integration of adverbial clauses are there to be distinguished by an adequate grammatical description of the two languages? (ii) Which linear and hierarchical positions in the structure of the matrix sentence can be occupied by adverbial clauses? (iii) Which is the empirical distribution of adverbial clauses introduced by the conjunctions als, während, wenn, obwohl and weil in German, as well as quando, mentre, se, sebbene and perché in Italian?
Responding to question (i), a distinction is drawn between strongly integrated, weakly integrated and syntactically disintegrated adverbial clauses. There are further degrees on the gradient of syntactic integration, which are not examined in this paper. Responding to question (ii), eight classes of structural positions in the matrix sentence are identified that can be occupied by adverbial clauses. Five of them are positions of syntactic integration, three are positions of disintegration. Responding to question (iii), the distribution of the ten classes of adverbial clauses is described on the basis of a corpus of internet data. Strongly integrated, weakly integrated and disintegrated adverbial clauses show clearly different distributions within the structure of the matrix sentence. Also the semantic classes of adverbial clauses (temporal, adversative, conditional, concessive, causal) are distributed differently.
The methods utilized in the area of research into dictionary use are established research methods in the social sciences. After explicating the different steps of a typical empirical investigation, this article provides examples of how these different methods are used in various user studies conducted in the field of using online dictionaries. Thereby, different kinds of data collection (surveys as online questionnaires, log files and eye tracking) as well as different research design structures (for instance, ex-post-facto design or experimental design) are discussed.
Once a new word or a new meaning is added to a monolingual dictionary, the lexicographer is to provide a definition of this item. This paper focuses on the methodological challenges in writing such definitions. After a short discussion of the central terminology (method and definition), the article describes factors which inform this process: linguistic theories, linguistic and lexicographical methods, and types of definitions. Using the example of elexiko, a dictionary project of the Institute for the German language (IDS) in Mannheim, Germany, the paper finally showcases the compilation of definitions in a monolingual online dictionary of contemporary German.
Bezeichnungen für Personen, die sich nicht in ihrem Heimatland aufhalten (z.B. Migrant, Ausländer, Flüchtling) werden in der Sprachgemeinschaft häufig wertend und kontrovers verwendet. In dem Beitrag wird gezeigt, dass die allgemeinsprachige Lexikografie diesen Aspekt bislang nicht angemessen berücksichtigt – weder in der korpusgestützten, methodischen Erfassung und Analyse von Sprachdaten noch in der beschreibenden Darstellung. Am Beispiel von elexiko werden Ansätze vorgestellt, die das Potenzial besitzen, dieses Desiderat einzulösen.
Der Beitrag behandelt die Frage, inwiefern es sich bei den gegenwärtigen Russlanddeutschen (Erwachsenen und Jugendlichen der ersten Generation, Einwanderungswelle der 1990er Jahre aus Sprachinseln) um Re-Migranten handelt, welche Veränderungen in den Varietätenrepertoires stattfinden und welche Schwierigkeiten und Probleme, aber auch Vorteile sich durch diese spezifische Migrationskonfiguration für die zugewanderten Russlanddeutschen ergeben. Die besondere Situation der Re-Migration mit der spezifischen linguistisch-soziolinguistischen Problematik wird durch Beispiele aus dem aktuellen IDS-Projekt „Migrationslinguistik“ veranschaulicht. Einerseits liegen besondere varietätenlinguistische Konstellationen vor, die bei der russlanddeutschen Migrantenpopulation generationenspezifische Konturen aufweisen. Dadurch entstehen andererseits unikale linguistische Sprachkontaktbedingungen, die die sprachlich-kommunikative Integration und den Erhalt der Migrantensprache Russisch in besonderer Weise beeinflussen können.
Eine Umschau in jüngeren sprachwissenschaftlichen Arbeiten zeigt einen häufig betonten engen Zusammenhang von Sprache und Identität, vor allem den der eigenen Sprache und der ethnischen Identität. Dass aber Sprache in einem zwei- oder mehrsprachigen Kontext nur eine Ressource einer Identitätskonstruktion sein kann, wird selten herausgestellt. Der nachstehende Aufsatz untersucht als charakteristisches Beispiel einer gelösten Bindung von Sprache und ethnischer Identität die Minderheit der deutschen Aussiedler aus der ehemaligen Sowjetunion. Im Vordergrund steht dabei die zweite Generation, bei der ihr Zugehörigkeitsgefühl zur ethnischen Identität als Deutsche trotz der erfolgten Sprachumstellung sich nicht oder selten verändert hat.
In 2010, ISO published a standard for syntactic annotation, ISO 24615:2010 (SynAF). Back then, the document specified a comprehensive reference model for the representation of syntactic annotations, but no accompanying XML serialisation. ISO’s subcommittee on language resource management (ISO TC 37/SC 4) is working on making the SynAF serialisation ISOTiger an additional part of the standard. This contribution addresses the current state of development of ISOTiger, along with a number of open issues on which we are seeking community feedback in order to ensure that ISOTiger becomes a useful extension to the SynAF reference model.
Recipient design is a key constituent of intersubjectivity in interaction. Recipient design of turns is informed by prior knowledge about and shared experience with recipients. Designing turns in order to be maximally effective for the particular recipient(s) is crucial for accomplishing intersubjectively coordinated action. This paper reports on a specific pragmatic structure of recipient design, i.e. counter-factual recipient design, and how it impinges on intersubjectivity in interaction. Based on an analysis of video-recordings data from driving school lessons in German, two kinds of counterfactual recipient design of instructors' requests are distinguished: pedagogic and egocentric turn-design. Counterfactual, pedagogic turn-design is used strategically to diagnose student skills and to create opportunities for corrective instructions. Egocentric turn-design rests on private, non-shared knowledge of the instructor. Egocentrically designed turns imply expectations of how to comply with requests which cannot be recovered by the student and which lead to a breakdown of intersubjective cooperation. This paper identifies practices, sources and interactional consequences of these two kinds of counterfactual recipient design. In addition, the study enhances our understanding of recipient design in at least three ways. It shows that recipient design does not only concern referential and descriptive practices, but also the indexing intelligible projections of next actions; it highlights the productive, other-positioning effects of recipient design; it argues that recipient design should be analyzed in terms of temporally extended interactional trajectories, linking turn-constructional practices to interactional histories and consecutive trajectories of joint action.
This article reports about the on-going work on a new version of the metadata framework Component Metadata Infrastructure (CMDI), central to the CLARIN infrastructure. Version 1.2 introduces a number of important changes based on the experience gathered in the last five years of intensive use of CMDI by the digital humanities community, addressing problems encountered, but also introducing new functionality. Next to the consolidation of the structure of the model and schema sanity, new means for lifecycle management have been introduced aimed at combatting the observed proliferation of components, new mechanism for use of external vocabularies will contribute to more consistent use of controlled values and cues for tools will allow improved presentation of the metadata records to the human users. The feature set has been frozen and approved, and the infrastructure is now entering a transition phase, in which all the tools and data need to be migrated to the new version.
Der vorliegende Beitrag erkundet den Zusammenhang zwischen der Komplexität politischer Argumentationsprozesse und der Diversifikation der Semantik von Schlüsselwörtern, deren Bedeutung im Argumentationsprozess umkämpft und in zahlreichen Facetten entfaltet widAdegenstand der Untersuchung ist die Verwendung von „Ökologie" in den Schlichtungsgesprächen zum Bahnprojekt Stuttgart 21. Im Unterscheid zu bisher vorliegenden Analysen zu semantischen Kämpfen geht es weniger darum, wie ein Ausdruck von einer Partei im Gegensatz zu anderen semantisiert wird. Es wird vielmehr gezeigt, wie semantische Diversifizierung und Ambiguität von „Ökologie" im expertischen Argumentationsprozess entstehen und welche kommunikativen Effekte dies für die Möglichkeit der Bürgerbeteiligung mit sich bringt. Es werden drei Praktiken identifiziert, mit denen die Interaktionsteilnehmer selbst auf semantische Diversifizierung und Ambiguität reagieren und versuchen, den Ausdruck eindeutig interpretierbar und die Quaestio entscheidbar zu machen: Strategieunterstellungen, Popularisierungen und Populismus. Die Interaktionsanalysen zeigen dabei, dass diese Praktiken selbst die Problematik, die sie lösen sollen, reproduzieren.
In dem Beitrag wird der Frage nachgegangen, inwiefern die Frequenz eines Wortes mit seiner orthographischen Richtigschreibung zusammenhangt. Werden häufige Wörter öfter und früher richtig geschrieben? Und welche Rolle spielt dabei die orthographische Regelhaftigkeit der Wortstrukturen? Unter Zuhilfenahme maschineller Analyseverfahren aus der Großstudie "Automatisierte Rechtschreibdiagnostik" (Fay/Berkling/Stüker 2012) werden diesbezuglich über 1000 Schülertexte von Klasse 2 bis 8 untersucht. Im Ergebnis werden zum einen einige Annahmen, die bislang vor allem auf Erfahrungswerten aus der sprachdidaktischen Arbeit fußten, empirisch bestätigt, zum anderen werden sie hinsichtlich spezifischer Rechtschreibphänomene differenziert und erweitert.
Contents:
1. Michal Křen: Recent Developments in the Czech National Corpus, S. 1
2. Dan Tufiş, Verginica Barbu Mititelu, Elena Irimia, Stefan Dumitrescu, Tiberiu Boros, Horia Nicolai Teodorescu: CoRoLa Starts Blooming – An update on the Reference Corpus of Contemporary Romanian Language, S. 5
3. Sebastian Buschjäger, Lukas Pfahler, Katharina Morik: Discovering Subtle Word Relations in Large German Corpora, S. 11
4. Johannes Graën, Simon Clematide: Challenges in the Alignment, Management and Exploitation of Large and Richly Annotated Multi-Parallel Corpora, S. 15
5. Stefan Evert, Andrew Hardie: Ziggurat: A new data model and indexing format for large annotated text corpora, S. 21
6. Roland Schäfer: Processing and querying large web corpora with the COW14 architecture, S. 28
7. Jochen Tiepmar: Release of the MySQL-based implementation of the CTS protocol, S. 35
Die öffentliche Akzeptanz und Wirkung natur- und technikwissenschaftlicher Forschung hängt grundlegend davon ab, ob sich die Ziele und Forschungsergebnisse an die Öffentlichkeit vermitteln lassen. Doch die Inhalte aktueller Forschungsvorhaben sind für ein Laienpublikum oft nur schwer zugänglich und verständlich. Vor dem Hintergrund, die gesellschaftliche Diskussion natur- und technikwissenschaftlicher Forschung zu verbessern, untersuchen und bewerten wir im Projekt PopSci – Understanding Science einen wichtigen Sektor des populärwissenschaftlichen Diskurses in Deutschland empirisch. Hierfür identifizieren wir die linguistischen Merkmale deutscher populärwissenschaftlicher Texte durch korpusbasierte Methoden und untersuchen deren Effekt auf die kognitive Verarbeitung der Texte durch Laien. Dazu setzen wir Vor- und Nachwissenstests ein. Außerdem messen wir die Blickbewegungen der Leserinnen und Leser, während sie populärwissenschaftliche Texte lesen. Aus dieser Kombination von unterschiedlichen Methoden versuchen wir, erste Empfehlungen zur Verbesserung des linguistischen Stils und der Wissensrepräsentation populärwissenschaftlicher Texte abzuleiten.
We present studies using the 2013 log files from the German version of Wiktionary. We investigate several lexicographically relevant variables and their effect on look-up frequency: Corpus frequency of the headword seems to have a strong effect on the number of visits to a Wiktionary entry. We then consider the question of whether polysemic words are looked up more often than monosemic ones. Here, we also have to take into account that polysemic words are more frequent in most languages. Finally, we present a technique to investigate the time-course of look-up behaviour for specific entries. We exemplify the method by investigating influences of (temporary) social relevance of specific headwords.
Der vorliegende Aufsatz untersucht die Syntax und Semantik sogenannter Postponierer, d.h. konjunktionaler Konnektoren, die den von ihnen eingeleiteten Nebensatz dem Hauptsatz stets nachstellen. Anhand von sodass und zumal werden die Kerneigenschaften solcher Konnektoren im Deutschen vorgestellt. Am Beispiel der italienischen Konjunktionen cosicché, tanto più che und perché wird diskutiert, ob der Begriff des Postponierers für den Sprachvergleich genutzt werden kann. In einem nächsten Schritt werden die Postponierer des Deutschen unter Beiziehung sprachgeschichtlicher Argumente präziser beschrieben und im Übergangsfeld zwischen Adverbkonnektoren und Subjunktoren verortet. Es zeigt sich, dass die untersuchten Konnektoren sich letztlich sehr unterschiedlich verhalten, sodass es fraglich erscheint, ob ihre Zusammenfassung zu einer gemeinsamen Klasse gerechtfertigt ist.
Der Sammelband zur typologisch und kontrastiv vergleichenden grammatischen Erforschung und Beschreibung des Satzanfangs des Deutschen und vier seiner Kontrastsprachen ist ein Ergebnis eines Forschungsnetzwerks, bestehend aus dem Institut für Deutsche Sprache (Mannheim) und Forschergruppen verschiedener europäischer Universitäten. Unter Berücksichtigung insbesondere morphosyntaktischer und informationsstruktureller Aspekte werden die satztopologischen Unterschiede der typologisch recht heterogenen Sprachen bzw. Sprachfamilien unter verschiedenen Gesichtspunkten beleuchtet. Die Untersuchungen werden korpusbasiert durchgeführt, wobei sich die Hälfte der Beiträge auf aufbereitete POS-getaggte Wikipedia-Korpora stützt. Die quantitativ ausgerichteten Korpusanalysen ermöglichen einen genauen Einblick in die unterschiedlichen Strukturmerkmale der betreffenden Sprachen sowie in sprachübergreifende Textmerkmale, und die qualitativen Untersuchungen zeigen Ähnlichkeiten und Abweichungen bei bestimmten Verfahren, die sich morphosyntaktisch iederschlagen und besonders am Satzanfang relevant sind. Insgesamt erlauben die Beiträge Hypothesen zu topologisch und informationsstrukturell markierten Satzanfängen und zu Präferenzen in den jeweiligen Sprachen, aber auch zu möglichen Konstanten und Gemeinsamkeiten, was – auf differenziertere Korpora erweitert – für die Bereiche Sprache und Kognition sowie computergestützte Übersetzung ein großer Gewinn sein dürfte.
In order to demonstrate why it is important to correctly account for the (serial dependent) structure of temporal data, we document an apparently spectacular relationship between population size and lexical diversity: for five out of seven investigated languages, there is a strong relationship between population size and lexical diversity of the primary language in this country. We show that this relationship is the result of a misspecified model that does not consider the temporal aspect of the data by presenting a similar but nonsensical relationship between the global annual mean sea level and lexical diversity. Given the fact that in the recent past, several studies were published that present surprising links between different economic, cultural, political and (socio-)demographical variables on the one hand and cultural or linguistic characteristics on the other hand, but seem to suffer from exactly this problem, we explain the cause of the misspecification and show that it has profound consequences. We demonstrate how simple transformation of the time series can often solve problems of this type and argue that the evaluation of the plausibility of a relationship is important in this context. We hope that our paper will help both researchers and reviewers to understand why it is important to use special models for the analysis of data with a natural temporal ordering.
This paper explores speakers’ notions of the situational appropriacy of linguistic variants. We conducted a web-based survey in which we collected ratings of the appropriacy of variants of linguistic variables in spoken German. A range of quantitative methods (cluster analysis, factor analysis and various forms of visualization techniques) is applied in order to analyze metalinguistic awareness and the differences in the evaluation of written vs. spoken stimuli. First, our data show that speakers’ ratings of the appropriacy of linguistic variants vary reliably with two rough clusters representing formal and informal speech situations and genres. The findings confirm that speakers adhere to a notion of spoken standard German which takes genre and register-related variation into account. Secondly, our analysis reveals a written language bias: metalinguistic awareness is strongly influenced by the physical mode of the presentation of linguistic items (spoken vs. written).
Harold Garfinkel, Begründer der Ethnomethodologie, wäre dieses Jahr 100 Jahre alt geworden, seine Studies in Ethnomethodology werden 50 Jahre. Grund genug diesen doppelten Geburtstag mit einer Tagung zur "deutschsprachigen Vorge-schichte, Wirkung und Rezeption des Werkes und der Person zu würdigen" (so der Ankündigungstext zur Tagung), die nicht ganz zufällig in Konstanz stattfand, lange Zeit und nach wie vor eine Hochburg rekonstruktiver Sozialforschung (auch) ethnomethodologischer Prägung. Die Tagung Harold Garfinkel's 'Studies in Ethnomethodolgy' – Fifty Years After vom 26.-28.10.2017 an der Universität Konstanz, ausgerichtet vom Lehrstuhl für Allgemeine Soziologie und Kultursoziologie und organisiert von Jörg Bergmann, Christian Meyer und Erhard Schüttpelz, tat dies in einer gebührlichen und beson-deren Weise: Die acht Kapitel der Studies in Ethnomethodology (im Folgenden kurz Studies), ein Konvolut aus Essays und Artikeln, die 1967 erschienen sind, dienten als Grundlage zur Strukturierung der Tagung und als Ausgangspunkt der einzelnen Vorträge.
This article explores how close one can come to a cultural-scientific perspective on the basis of a constitution-analytical methodology. We do this on the basis of a comparison of the celebration of Totensonntag in Zotzenbach (Southern Hesse) and Sarepta (Wolgograd). In both places, there are protestant churches that perform this ritual to commemorate the dead on this “Sunday of the Dead” as a part of their church service. Our scientific interest lies in the reconstruction of the rituality produced during the in situ execution. In both services, the names of the deceased are read out and a candle is lit for each deceased person. In Zotzenbach the priest reads out the names and an assistant ignites the candles for the deceased, whereas in Sarepta the bereaved are responsible for this. Since the ritual is organised in very different ways in terms of architecture-for-interaction (statically in Zotzenbach, spatially dynamic in Sarepta), we can reconstruct two completely different models of rituality: a demonstrative one (Zotzenbach) and a participative one (Sarepta). The demonstrative model works on the basis of a finely tuned coordination between the two church representatives and is aimed at a dignified execution. The model in Sarepta is not suitable for the production of formality due to its participatory structure. Here, however, the focus is also on the aspect of socialization, which goes beyond the church service and offers the Russian-German worshipers the opportunity to situationally constitute as a culturally homogeneous group.
Der vorliegende Beitrag beschreibt auf der Basis authentischer Alltagsinteraktionen das Formen- und Funktionsspektrum der äußerungsmodalisierenden Kommen-tarphrase ohne Scheiß im gesprochenen Deutsch. Die Konstruktion wird von Inter-agierenden insbesondere als Ressource zur Steigerung des Geltungsanspruchs einer Bezugsäußerung genutzt, wodurch diese als wahr und/oder ernstgemeint modali-siert wird. Damit leistet ohne Scheiß einen wichtigen Beitrag zur Bearbeitung des Erwartungsmanagements durch den/die SprecherIn sowie zur Herstellung von In-tersubjektivität. Die Konstruktion ist syntaktisch variabel und kann somit Äußerun-gen sowohl prospektiv als auch retraktiv modalisieren. Zudem wird mit der Wahl des Lexem Scheiß ein nähesprachliches Register aktiviert, was in Verbindung mit weiteren (prosodischen und/oder lexikalischen) Elementen zu affektiver Aufladung führen kann. Eine abschließende Darstellung häufiger lexikalischer Kookkurrenz-partner und deren funktionaler Bedeutung sowie ein Abgleich zu intrakonstruktio-nalen Varianten wie ohne Witz/ohne Spaß zeigt die Produktivität der Konstruktion im alltäglichen Sprachgebrauch auf.
We present a major step towards the creation of the first high-coverage lexicon of polarity shifters. In this work, we bootstrap a lexicon of verbs by exploiting various linguistic features. Polarity shifters, such as ‘abandon’, are similar to negations (e.g. ‘not’) in that they move the polarity of a phrase towards its inverse, as in ‘abandon all hope’. While there exist lists of negation words, creating comprehensive lists of polarity shifters is far more challenging due to their sheer number. On a sample of manually annotated verbs we examine a variety of linguistic features for this task. Then we build a supervised classifier to increase coverage. We show that this approach drastically reduces the annotation effort while ensuring a high-precision lexicon. We also show that our acquired knowledge of verbal polarity shifters improves phrase-level sentiment analysis.
In the NLP literature, adapting a parser to new text with properties different from the training data is commonly referred to as domain adaptation. In practice, however, the differences between texts from different sources often reflect a mixture of domain and genre properties, and it is by no means clear what impact each of those has on statistical parsing. In this paper, we investigate how differences between articles in a newspaper corpus relate to the concepts of genre and domain and how they influence parsing performance of a transition-based dependency parser. We do this by applying various similarity measures for data point selection and testing their adequacy for creating genre-aware parsing models.
This paper deals with the creation of the first morphological treebank for German by merging two pre-existing linguistic databases. The first of these is the linguistic database CELEX which is a standard resource for German morphology. We build on its refurbished and modernized version. The second resource is GermaNet, a lexical-semantic network which also provides partial markup for compounds. We describe the state of the art and the essential characteristics of both databases and our latest revisions. As the merging involves two data sources with distinct annotation schemes, the derivation of the morphological trees for the unified resource is not trivial. We discuss how we overcome problems with the data and format, in particular how we deal with overlaps and complementary scopes. The resulting database comprises about 100,000 trees whose format can be chosen according to the requirements of the application at hand. In our discussion, we show some future directions for morphological treebanks. The Perl script for the generation of the data from the sources will be made publicly available on our website.
This paper discusses how cognitive aspects can be incorporated into lexicographic meaning descriptions based on corpus-driven analysis. The new German Online dictionary “Paronyme − Dynamisch im Kontrast” is concerned with easily confused words such as effektiv/effizient, sensibel/sensitiv. It is currently in the process of being developed and it aims at adopting a more conceptual and encyclopedic approach to meaning. Contrastive entries emphasize usage, comparing conceptual categories and indicating the mapping of knowledge. Adaptable access to lexicographic details offers different perspectives on information, and authentic examples reflect prototypical structures.
Some of the cognitive features are demonstrated with the help of examples. Firstly, I will outline how patterns of usage imply conceptual categories as central ideas instead of sufficiently logical criteria of semantic distinction. In this way, linguistic findings correlate better with how users conceptualize language. Secondly, it is pointed out how collocates are family members and fillers in contexts. Thirdly, I will demonstrate how contextual structure and function are included by summarizing referential information. Details are drawn from corpus data; they are usage-based patterns illustrating conversational interaction and semantic negotiation in contemporary public discourse. Finally, I will show flexible consultation routines where the focus on structural knowledge changes.
This paper gives an insight into the basic concepts for a corpus-based lexical resource of spoken German, which is being developed by the project "The Lexicon of Spoken German"(Lexik des gesprochenen Deutsch, LeGeDe) at the "Institute for the German Language" (Institut für Deutsche Sprache, IDS) in Mannheim. The focus of the paper is on initial ideas of semi-automatic and automatic resources that assist the quantitative analysis of the corpus data for the creation of dictionary content. The work is based on the "Research and Teaching Corpus of Spoken German" (Forschungs- und Lehrkorpus Gesprochenes Deutsch, FOLK).
This paper discusses changes of lexicographic traditions with respect to approaches to meaning descriptions towards more cognitive perspectives. I will uncover how cognitive aspects can be incorporated into meaning descriptions based on corpus-driven analysis. The new German Online dictionary “Paronyme − Dynamisch im Kontrast” (Storjohann 2014; 2016) is concerned with easily confused words such as effektiv/effizient, sensibel/sensitiv. It is currently in the process of being developed and it aims at adopting a more conceptual and encyclopaedic approach to meaning by incorporating cognitive features. As a corpus-guided reference work it strives to adequately reflect ideas such as conceptual structure, categorisation and knowledge. Contrastive entries emphasise aspects of usage, comparing conceptual categories and indicate the (metonymic) mapping of knowledge. Adaptable access to lexicographic details and variable search options offer different foci and perspectives on linguistic information, and authentic examples reflect prototypical structures. Some of the cognitive features are demonstrated with the help of examples. Firstly, I will outline how patterns of usage imply conceptual categories as central ideas instead of sufficiently logical criteria of semantic distinction. In this way, linguistic findings correlate better with how users conceptualise language. Secondly, it is pointed out how collocates are treated as family members and fillers in contexts. Thirdly, I will demonstrate how contextual structure and functions are included summarising referential information. Details are drawn from corpus data, they are usage-based linguistic patterns illustrating conversational interaction and semantic negotiations in contemporary public discourse. Finally, I will outline consultation routines which activate different facets of structural knowledge, e.g. through changes of the ordering of information or through the visualisation of semantic networks.
Languages employ different strategies to transmit structural and grammatical information. While, for example, grammatical dependency relationships in sentences are mainly conveyed by the ordering of the words for languages like Mandarin Chinese, or Vietnamese, the word ordering is much less restricted for languages such as Inupiatun or Quechua, as these languages (also) use the internal structure of words (e.g. inflectional morphology) to mark grammatical relationships in a sentence. Based on a quantitative analysis of more than 1,500 unique translations of different books of the Bible in almost 1,200 different languages that are spoken as a native language by approximately 6 billion people (more than 80% of the world population), we present large-scale evidence for a statistical trade-off between the amount of information conveyed by the ordering of words and the amount of information conveyed by internal word structure: languages that rely more strongly on word order information tend to rely less on word structure information and vice versa. Or put differently, if less information is carried within the word, more information has to be spread among words in order to communicate successfully. In addition, we find that–despite differences in the way information is expressed–there is also evidence for a trade-off between different books of the biblical canon that recurs with little variation across languages: the more informative the word order of the book, the less informative its word structure and vice versa. We argue that this might suggest that, on the one hand, languages encode information in very different (but efficient) ways. On the other hand, content-related and stylistic features are statistically encoded in very similar ways.
In this paper, we will present a first attempt to classify commonly confused words in German by consulting their communicative functions in corpora. Although the use of so-called paronyms causes frequent uncertainties due to similarities in spelling, sound and semantics, up until now the phenomenon has attracted little attention either from the perspective of corpus linguistics or from cognitive linguistics. Existing investigations rely on structuralist models, which do not account for empirical evidence. Still, they have developed an elaborate model based on formal criteria, primarily on word formation (cf. Lăzărescu 1999). Looking from a corpus perspective, such classifications are incompatible with language in use and cognitive elements of misuse.
This article sketches first lexicological insights into a classification model as derived from semantic analyses of written communication. Firstly, a brief description of the project will be provided. Secondly, corpus-assisted paronym detection will be focused. Thirdly, in the main section the paper concerns the description of the datasets for paronym classification and the classification procedures. As a work in progress, new insights will continually be extended once spoken and CMC data are added to the investigations.
Unknown words are a challenge for any NLP task, including sentiment analysis. Here, we evaluate the extent to which sentiment polarity of complex words can be predicted based on their morphological make-up. We do this on German as it has very productive processes of derivation and compounding and many German hapax words, which are likely to bear sentiment, are morphologically complex. We present results of supervised classification experiments on new datasets with morphological parses and polarity annotations.
In diesem Aufsatz wird einzelfallanalytisch der Frage nachgegangen, wie die Struktur einer Kirchenbesichtigung aussieht. Im theoretischen Rahmen, der die Kirchenbesichtigung als kulturelle Praktik konzeptualisiert, wird „Objektkonstitution“ als eine aktive Leistung des Kirchenbesichtigers in den Blick genommen. Bei den Aufnahmen zum Kirchenbesichtigungskorpus wurden die Besichtiger nicht nur bei ihrem Gang durch den Kirchenraum und der visuellen Wahrnehmung bestimmter Raumaspekte gefilmt. Sie wurden vielmehr darum gebeten, ihre visuelle Wahrnehmung durch begleitendes Sprechen auch zu kommentieren. Aufgezeichnet wurde das Besichtigungskorpus mit zwei Kameras: einer Actionkamera, die den Wahrnehmungsraum der Besichtiger dokumentiert, und einer Kontextkamera, die ihnen bei ihrem Weg durch den Raum folgt.
Dieses experimentelle Erhebungsdesign, bei dem exothetisches Sprechen bewusst als wissenschaftliche Erhebungsmethode eingesetzt wird, macht es möglich, das Besichtigungskonzept der Personen als dynamisches Zusammenspiel ihrer visuellen Wahrnehmung des Kirchenraums und ihrer wahrnehmungsbegleitenden Exothese zu rekonstruieren. Dass Objektkonstitution eine aktive Herstellung ist, durch die der Kirchenraum in den Relevanzen seines Betrachters teilweise neu entsteht, zeigt die Fallanalyse in exemplarischer Klarheit: Anton, der analysierte Besichtiger, der sich ausführlich mit zwei großen Gemälden beschäftigt, konstituiert diese de facto als „Bilderrahmen“, ohne überhaupt auf die dargestellten Szenen einzugehen.
Multinomial processing tree (MPT) models are a class of measurement models that account for categorical data by assuming a finite number of underlying cognitive processes. Traditionally, data are aggregated across participants and analyzed under the assumption of independently and identically distributed observations. Hierarchical Bayesian extensions of MPT models explicitly account for participant heterogeneity by assuming that the individual parameters follow a continuous hierarchical distribution.We provide an accessible introduction to hierarchical MPT modeling and present the user-friendly and comprehensive R package TreeBUGS, which implements the two most important hierarchical MPT approaches for participant heterogeneity—the beta-MPT approach (Smith & Batchelder, Journal of Mathematical Psychology 54:167-183, 2010) and the latent-trait MPT approach (Klauer, Psychometrika 75:70-98, 2010). TreeBUGS reads standard MPT model files and obtains Markov-chain Monte Carlo samples that approximate the posterior distribution. The functionality and output are tailored to the specific needs of MPT modelers and provide tests for the homogeneity of items and participants, individual and group parameter estimates, fit statistics, and within- and between-subjects comparisons, as well as goodness-of-fit and summary plots. We also propose and implement novel statistical extensions to include continuous and discrete predictors (as either fixed or random effects) in the latent-trait MPT model.
Modeling the properties of German phrasal compounds within a usage-based constructional approach
(2017)
This paper discusses phrasal compounds in German (e.g.“Man-muss-doch-überalles-reden-können”-Credo, ‘one-should-be-able-to-talk-about-everything motto’). It provides the first empirically based investigation and description of this wordformation type within the theoretical framework of construction grammar. While phrasal compounds pose a problem for “traditional” generative approaches, I argue that a usage-based constructional model (e.g. Langacker 1987; Goldberg 2006) which takes into consideration aspects of frequency provides a suitable approach to modeling and explaining their properties. For this purpose, a large inventory of phrasal compounds was extracted from the German Reference Corpus (DeReKo) and modeled as pairings of form and meaning at different levels of specificity and abstractness within a bottom-up process.
Overall, this paper not only presents a new and original approach to phrasal compounds, but also offers interesting perspectives for dealing with composition in general.
Das Archiv für Gesprochenes Deutsch (AGD, Stift/Schmidt 2014) am Institut für Deutsche Sprache ist die zentrale Sammelstelle für Korpora des Gesprochenen Deutsch. Gegründet als Deutsches Spracharchiv (DSAv) im Jahre 1932 hat es über Eigenprojekte, Kooperationen und Übernahmen von Daten aus abgeschlossenen Forschungsprojekten einen Bestand von etwa 50 Variations- und Gesprächskorpora aufgebaut. Heute ist dieser Bestand fast vollständig digitalisiert und wird zu einem großen Teil der wissenschaftlichen Gemeinschaft über die Datenbank für Gesprochenes Deutsch (DGD) im Internet zur Nutzung in Forschung und Lehre angeboten.
Körperliche wie seelische Gesundheit ist ein hohes individuelles und gesellschaftliches Gut und Grundrecht. Häufig wird die Gesundheit durch ihr Gegenteil, d. h. in der Verständigung über Krankheit, thematisiert. Der gesellschaftliche Austausch über Krankheiten, Gesundheitsrisiken und Behandlungsmethoden ist untrennbar mit Sprache verknüpft (Busch/Spranz-Fogasy 2015); die Sprache ist „[…] das zentrale Medium, um medizinisches Wissen herzustellen, zu systematisieren, zu tradieren und auszutauschen.“ (Busch/Spranz-Fogasy 2015: 336). Ausgehend von dieser Prämisse wurde das Netzwerk „Linguistik und Medizin“ gegründet, um die Forschungstätigkeiten der verschiedenen linguistischen Disziplinen, die an den Verbindungslinien von „Sprache – Wissen – Medizin“ arbeiten, zu bündeln: Forschungsdesiderate sollen kooperierend bearbeitet und die interdisziplinäre Anschlussfähigkeit zwischen linguistischen und medizinischen, psychiatrischen sowie salutogenetischen Forschungsbereichen auf- und ausgebaut werden.
In the lexicon of pidgin and creole languages we can see an important part of these languages’ history of origin and of language contact. The current paper deals with the lexical sources of Tok Pisin and, more specifically, with words of German origin found in this language. During the period of German colonial domination of New Guinea and a number of insular territories in the Pacific (ca. 1885–1915), German words entered the emerging Tok Pisin lexicon. Based on a broad range of lexical and lexicographic data from the early 20th century up until today, we investigate the actual or presumed German origin of a number of Tok Pisin words and trace different lexical processes of integration that are linked to various, often though not always colonially determined, contact settings and sociocultural interactions.
Rekontextualisierung von Hate Speech als Aneignungs- und Positionierungsverfahren in Sozialen Medien
(2017)
Hate Speech wird im vorliegenden Aufsatz nicht als Medium der Herabwürdigung betrachtet, sondern als Positionierungsverfahren. Es handelt sich bei Hate Speech Liebert (2015, 176) zufolge um eine „unorganisierte [...] Praktik" innerhalb der Online-Kommunikation. Das würde erstens bedeuten, dass keine strategische Dekonstruktion einer spezifischen Identität damit verbunden ist, wie das etwa beim Cybermobbing der Fall wäre. Es gibt also keine Verabredungen und gruppenkonstitutiven Prozesse außerhalb der medial vermittelten Kommunikation. Es scheint jedoch auch die diskursdynamischen Prozesse auszublenden, die sich ad hoc „organisieren", wo Hassrede praktiziert wird. Zweitens ruft der Terminus der „Unorganisiertheit" die Assoziation einer strukturellen Unterspezifikation auf und damit das Bedürfnis nach einer präzisierenden Definition für diese Praktik. Drittens ware davon auszugehen, dass Hass-Kommentare verstreut an Diskursorten und zu willkürlichen Diskurszeiten auftreten, die deshalb nicht vorhersagbar sind.
In the first volume of Corpus Linguistics and Linguistic Theory, Gries (2005. Null-hypothesis significance testing of word frequencies: A follow-up on Kilgarriff. Corpus Linguistics and Linguistic Theory 1(2). doi:10.1515/ cllt.2005.1.2.277. http://www.degruyter.com/view/j/cllt.2005.1.issue-2/cllt.2005. 1.2.277/cllt.2005.1.2.277.xml: 285) asked whether corpus linguists should abandon null-hypothesis significance testing. In this paper, I want to revive this discussion by defending the argument that the assumptions that allow inferences about a given population – in this case about the studied languages – based on results observed in a sample – in this case a collection of naturally occurring language data – are not fulfilled. As a consequence, corpus linguists should indeed abandon null-hypothesis significance testing.
Historical sociolinguistics in colonial New Guinea: The Rhenish mission society in the Astrolabe Bay
(2017)
The Rhenish Mission Society, a German Protestant mission, was active in a small part of northern New Guinea, the Astrolabe Bay, between 1887 and 1932. Up until 1914, this region was under German colonial rule. The German dominance was also reflected in rules on language use in official contexts such as schools and administration.
Missionaries were strongly affected by such rules as their most important tool in mission work was language. In addition, they were also responsible for school education as most schools in the German colonial areas in the Pacific were mission-run. Thus, mission societies had to make decisions about what languages to use, considering their own needs, their ideological convictions, and the colonial government’s requirements. These considerations were framed by the complex setting of New Guinea’s language wealth where several hundred languages were, and still are, spoken.
This paper investigates a small set of original documents from the Rhenish Mission Society to trace what steps were taken and what considerations played a major role in the process of agreeing on a suitable means of communication with the people the missionaries wanted to reach, thereby touching upon topics such as language attitudes, language policies and politics, practical considerations of language learning and language spread, and colonial actions impacting local language ecologies.
Making 1:n explorable: a search interface for the ZAS database of clause-embedding predicates
(2017)
We introduce a recently published corpus-based database of German clause-embedding predicates and present an innovative web application for exploring it. The application displays the predicates and the corpus examples for these predicates in two separate tables that can be browsed and searched in real time. While familiar web interface paradigms make it easy for users to get started, the data presentation and the interactive advanced search components for the two tables are designed to accommodate remarkably complex query needs without the need for resorting to a dedicated query language or a more specialized tool. The 1:n relationship between predicates and their examples is exploited in the two tables in that, e.g. the predicate table also shows, for each predicate and each example attribute, all values that occur in the examples for this predicate. An easy-to-use visual query builder for arbitrary Boolean combinations of search criteria can optionally be displayed to pre-filter the underlying data presented in both tables. Several options for altering quantifier scope can be activated with simple checkboxes and considerably widen the space of searchable constellations.
CoMParS is a resource under construction in the context of the long-term project German Grammar in European Comparison (GDE) at the IDS Mannheim. The principal goal of GDE is to create a novel contrastive grammar of German against the background of other European languages. Alongside German, which is the central focus, the core languages for comparison are English, French, Hungarian and Polish, representing different typological classes. Unlike traditional contrastive grammars available for German, which usually cover language pairs and are based on formal grammatical categories, the new GDE grammar is developed in the spirit of functionalist typology. This implies that, instead of formal criteria, cognitively motivated functional domains in terms of Givón (1984) are used as tertia comparationis. The purpose of CoMParS is to document the empirical basis of the theoretical assumptions of GDE-V and to illustrate the otherwise rather abstract content of grammar books by as many as possible naturally occurring and adequately presented multilingual examples, including information on their use in specific contexts and registers. These examples come from existing parallel corpora, and our presentation will focus on the legal aspects and consequences of this choice of language data.