Refine
Year of publication
Document Type
- Part of a Book (181)
- Article (166)
- Conference Proceeding (38)
- Review (5)
- Book (2)
- Other (2)
- Preprint (1)
Language
Keywords
- Deutsch (140)
- Korpus <Linguistik> (51)
- Konversationsanalyse (48)
- Interaktion (34)
- Semantik (23)
- Computerlinguistik (22)
- Kommunikation (21)
- Mehrsprachigkeit (21)
- Sprachpolitik (19)
- Englisch (18)
Publicationstate
- Postprint (395) (remove)
Reviewstate
- (Verlags)-Lektorat (176)
- Peer-Review (166)
- Peer-review (10)
- Verlags-Lektorat (5)
- Peer-Revied (2)
- (Verlags-)Lektorat (1)
- Review-Status-unbekannt (1)
- Zweitveröffentlichung (1)
Publisher
- Benjamins (52)
- Springer (36)
- Oxford University Press (19)
- Elsevier (14)
- Wilhelm Fink (14)
- Erich Schmidt (11)
- Buske (10)
- Winter (8)
- Equinox (6)
- Lang (6)
In spite of the obvious importance that is accorded to the notion grammatical construction in any approach that sees itself as a construction grammar (CxG), there is as yet no generally accepted definition of the term across different variants of the framework. In particular, there are different assumptions about which additional requirements a given structure has to meet in order to be recognized as a construction besides being a ‘form-meaning pair’. Since the choice of a particular definition will determine the range of both relevant phenomena and concrete observations to be considered in empirical research within the framework, the issue is not just a mere terminological quibble but has important methodological repercussions especially for quantitative research in areas such as corpus linguistics. The present study illustrates some problems in identifying and delimiting such patterns in naturally occurring text and presents arguments for a usage-based interpretation of the term grammatical construction.
In the context of the HyTex project, our goal is to convert a corpus into a hypertext, basing conversion strategies on annotations which explicitly mark up the text-grammatical structures and relations between text segments. Domain-specific knowledge is represented in the form of a knowledge net, using topic maps. We use XML as an interchange format. In this paper, we focus on a declarative rule language designed to express conversion strategies in terms of text-grammatical structures and hypertext results. The strategies can be formulated in a concise formal syntax which is independend of the markup, and which can be transformed automatically into executable program code.
Generierung von Linkangeboten zur Rekonstruktion terminologiebedingter Wissensvoraussetzungen
(2002)
Dieser Beitrag skizziert Strategien zur (semi-)automatischen Annotation von definitorischen Textsegmenten und Termverwendungsinstanzen auf der Grundlage grammatisch annotierter Korpora. Ziel unserer Überlegungen ist es, bei der selektiven Rezeption von Fachtexten in einer Hypertextumgebung die je spezifischen Wissensvoraussetzungen, die der Verwendung von Fachtermini unterliegen und die für das Textverständnis eine entscheidende Rolle spielen, über automatisch generierte Linkangebote rekonstruierbar zu machen.
In the management of cooperation, the fit of a requested action with what the addressee is presently doing is a pervasively relevant consideration. We present evidence that imperative turns are adapted to, and reflexively create, contexts in which the other person is committed to the course of action advanced by the imperative. This evidence comes from systematic variation in the design of imperative turns, relative to the fittedness of the imperatively mandated action to the addressee’s ongoing trajectory of actions, what we call the “dine of commitment”. We present four points on this dine: Responsive imperatives perform an operation on the deontic dimension of what the addressee has announced or already begun to do (in particular its permissibility); local-project imperatives formulate a new action advancing a course of action in which the addressee is already actively engaged; global-project-imperatives target a next task for which the addressee is available on the grounds of their participation in the overall event, and in the absence of any competing work; and competitive imperatives draw on a presently otherwise engaged addressee on the grounds of their social commitment to the relevant course of actions. These four turn shapes are increasingly complex, reflecting the interactional work required to bridge the increasing distance between what the addressee is currently doing, and what the imperative mandates. We present data from German and Polish informal and institutional settings.
The lexicography of German
(2020)
This chapter discusses the main dictionaries of the German language as it is spoken and written in Germany, and also German as it is spoken and written in Austria, Switzerland, the eastern fringes of Belgium, and South Tyrol. It also briefly describes Pennsylvania German. Corpora and other language resources used in German dictionary-making are also presented. Finally, there is a discussion of some current issues in German lexicography, as well as future prospects.
Within cognitive linguistics, there is an increasing awareness that the study of linguistic phenomena needs to be grounded in usage. Ideally, research in cognitive linguistics should be based on authentic language use, its results should be replicable, and its claims falsifiable. Consequently, more and more studies now turn to corpora as a source of data. While corpus-based methodologies have increased in sophistication, the use of corpus data is also associated with a number of unresolved problems. The study of cognition through off-line linguistic data is, arguably, indirect, even if such data fulfils desirable qualities such as being natural, representative and plentiful. Several topics in this context stand out as particularly pressing issues. This discussion note addresses (1) converging evidence from corpora and experimentation, (2) whether corpora mirror psychological reality, (3) the theoretical value of corpus linguistic studies of ‘alternations’, (4) the relation of corpus linguistics and grammaticality judgments, and, lastly, (5) the nature of explanations in cognitive corpus linguistics. We do not claim to resolve these issues nor to cover all possible angles; instead, we strongly encourage reactions and further discussion.
We present a method to identify and document a phenomenon on which there is very little empirical data: German phrasal compounds occurring in the form of as a single token (without punctuation between their components). Relying on linguistic criteria, our approach implies to have an operational notion of compounds which can be systematically applied as well as (web) corpora which are large and diverse enough to contain rarely seen phenomena. The method is based on word segmentation and morphological analysis, it takes advantage of a data-driven learning process. Our results show that coarse-grained identification of phrasal compounds is best performed with empirical data, whereas fine-grained detection could be improved with a combination of rule-based and frequency-based word lists. Along with the characteristics of web texts, the orthographic realizations seem to be linked to the degree of expressivity.
In this paper, I argue against the analyses of the there-construction by Moro (1997) and Hoekstra & Mulder (1990) and for an analysis in the frame of Williams (1994), Hazout (2004) from two angles. First of all, Moro and Hoekstra & Mulder do not correctly predict the behaviour of the there-construction under wh-movement; second, from a semantic point of view, the predicate in the small clause structure is the postverbal DP and not there. Alternatively, I follow the proposal by Williams (1994) in which there is the subject of predication and I will point out a direction to analyse the problematic wh-movement data within this framework.
Precise multimodal studies require precise synchronisation between audio and video signals. However, raw audio and audio from video recordings can be out of sync for several reasons. In order to re-synchronise them, a dynamic programming (DP) approach is presented here. Traditionally, DP is performed on the rectangular distance matrix comparing each value in signal A with each value in signal B. Previous work limited the search space using for example the Sakoe Chiba Band (Sakoe and Chiba, 1978). However, the overall space of the distance matrix remains identical. Here, a tunnel matrix and its according DP-algorithm are presented. The matrix contains merely the computed distance of two signals to a pre-specified bandwidth and the computational cost is equally reduced. An example implementation demonstrates the functionality on artificial data and on data from real audio and video recordings.
This article deals with three interrelated phenoma in the information structure of German sentences: the focusing of negative markers, of finite verb forms and of the particles ja, doch, wohl and schon. Focusing of the finite verb is the most important marker of verum focus, as described by Höhle (1988). Focusing of particles can be an alternative means for similar purposes, while focusing of negation seems to be the contradictory opposite of verum focus. It is shown that negation- independently of its information structural status - can be interpreted on three distinct levels of sentence meaning: as an indicator of the non-facticity of a state of affairs, the non-truth of a proposition, or the non-desirability of a speech act. Focusing of the negative marker puts contrastive emphasis on the negative value assigned to sentence meaning on one of these levels. Ve rum focus can be interpreted on the same three levels: as a marker of contrastive emphasis on a positive value of facticity, truth or desirability. The particles ja, doch, wohl and schon refer to sufficient epistemic or interactional conditions for the assignment of a positive or negative value. By focusing such a particle, the speaker indicates that (s)he believes the assigned value to be well justified and insists on establishing it as common ground for further interaction.
Thomas Müntzer zählt zu den Persönlichkeiten der deutschen Geschichte, die bislang völlig kontrovers beurteilt wurden. Das demonstriert schon die Auflistung der Bezeichnungen, mit denen Müntzer in der Literatur charakterisiert wurde und wird: Reformator, Theologe, Priester, Prediger, Prophet, Mystiker, Spiritualist, Apokalyptiker, (Sozial-)Revolutionär, Rebell, Bauernführer, Terrorist. In der DDR wurde er als Kämpfer für eine bessere, gerechte, von Ausbeutung freie Gesellschaft verehrt. Er galt als Vertreter eines christlichen Sozialismus. Sein Porträt war seit 1971 auf der 5-Mark- Banknote zu finden. Die religiösen Aspekte seines Wirkens wurden allerdings stark vernachlässigt. Die frühere Bundesrepublik sah in Müntzer eher einen religiösen Sonderling und nahm ihn nur am Rande zur Kenntnis.
Seine theologischen Auffassungen äußerte Müntzer in wenigen, meist recht kurzen, aber äußerst eindringlich formulierten Texten, deren Druck oft mit Problemen verbunden war. Bei den Zeitgenossen wirkte er vor allem über seine Predigten, die allerdings nicht publiziert wurden. Erhalten blieb dagegen ein nicht unbeträchtlicher Teil seines Briefwechsels, der für die Entwicklung seiner Ideen aufschlussreich ist. Im Prager Manifest, einem ungedruckt gebliebenen frühen Text, formulierte Müntzer erstmals in groben Zügen seine zentralen reformatorischen Vorstellungen.
HMMs are the dominating technique used in speech recognition today since they perform well in overall phone recognition. In this paper, we show the comparison of HMM methods and machine learning techniques, such as neural networks, decision trees and ensemble classifiers with boosting and bagging in the task of articulatory-acoustic feature classification. The experimental results show that HMM methods work well for the classification of such features as vocalic. However, decision tree and bagging outperform HMMs for the fricative classification task since the data skewness is much higher than for the feature vocalic classification task. This demonstrates that HMMs do not perform as well as decision trees and bagging in highly skewed data settings.
MRI data of German vowels and consonants was acquired for 9 speakers. In this paper tongue contours for the vowels were analyzed using the three-mode factor analysis technique PARAFAC. After some difficulties, probably related to what constitutes an adequate speaker sample for this three-mode technique to work, a stable two-factor solution was extracted that explained about 90% of the variance. Factor 1 roughly captured the dimension low back to high front; Factor 2 that from mid front to high back. These factors are compared with earlier models based on PARAFAC. These analyses were based on midsagittal contours; the paper concludes by illustrating from coronal and axial sections how non-midline information could be incorporated into this approach.
Analyses of jaw movement(obtained by Electromagnetic Articulography) and acoustics show that loud speech is an intricate phenomenon. Besides involving higher intensity and subglottal pressure it affects jaw movements as well as fundamental frequency and especially first formants. It is argued that all these effects serve the purpose of enhancing perceptual salience.
The vowel quality in some diphthongs of Swabian (an upper german dialect) was determined by measurement of first and second formant values. A minimal contrast could be shown between two different diphthong qualities […], where for Standard German only one is assumed, viz. /ai/. The two diphthong qualities differ only slightly in onset and offset vowel quality, so a better understanding of their relationship was expected from an examination of their dynamic aspects. Our preliminary results suggest that there is indeed a difference in the temporal structure of the two diphthongs.
The paper presents the results of a joint effort of a group of multimodality researchers and tool developers to improve the interoperability between several tools used for the annotation and analysis of multimodality. Each of the tools has specific strengths so that a variety of different tools, working on the same data, can be desirable for project work. However this usually requires tedious conversion between formats. We propose a common exchange format for multimodal annotation, based on the annotation graph (AG) formalism, which is supported by import and export routines in the respective tools. In the current version of this format the common denominator information can be reliably exchanged between the tools, and additional information can be stored in a standardized way.
The English language has taken advantage of the Digital Revolution to establish itself as the global language; however, only 28.6 %of Internet users speak English as their native language. Machine Trans-lation (MT) is a powerful technology that can bridge this gap. In devel-opment since the mid-20th century, MT has become available to every Internet user in the last decade, due to free online MT services. This paper aims to discuss the implications that these tools may have for the privacy of their users and how they are addressed by EU data protec-tion law. It examines the data-flows in respect of the initial processing (both from the perspective of the user and the MT service provider) and potential further processing that may be undertaken by the MT service provider.
This paper provides a unified semantic and discourse pragmatic analysis of the German particle nämlich, traditionally described as having a specificational and an explanative reading. Our claim is that nämlich is a discourse marker which signals that the expression it is attached to is a short (elliptic) answer to a salient implicit question about the previous utterance. We show how both the explanative and the specificational reading can be derived from this more general semantic contribution. In addition we discuss some cross linguistic consequences of our analysis.
A polarity-sensitive item (PSI), as traditionally defined, is an expression that is restricted to either an affirmative or negative context. PSIs like ‘lift a finger’ and ‘all the time in the world’ sub-serve discourse routines like understatement and emphasis. Lexical–semantic classes are increasingly invoked in descriptions of the properties of PSIs. Here, we use English corpus data and the tools of Frame Semantics (Fillmore, 1982, 1985) to explore Israel’s (2011) observation that the semantic role of a PSI determines how the expression fits into a contextually constructed scalar model. We focus on a class of exceptions implied by Israel’s model: cases in which a given PSI displays two countervailing patterns of polarity sensitivity, with attendant differences in scalar entailments. We offer a set of case studies of polaritysensitive expressions – including verbs of attraction and aversion like ‘can live without’, monetary units like ‘a red cent’, comparative adjectives and time-span adverbials – that demonstrate that the interpretation of a given PSI in a given polar context is based on multiple factors. These factors include the speaker’s perspective on and affective stance towards the described event, available inferences about causality and, perhaps most critically, particulars of the predication, including the verb or adjective’s frame membership, the presence or absence of an ability modal like can, the grammatical construction used and the range of contingencies evoked by the utterance.
When a noise verb is used to indicate verbal communication, factors from both the source domain of the verb (perception) and the target domain (communication) play a role in determining the argument structure of the sentence. While the target domain supplies a syntactic structure, the source domain’s semantics constrain the degree to which that syntactic structure can be exploited. This can be determined by comparing noise verbs in this use with manner-of-communication verbs, which are superficially similar, but native to communication. Data for these two classes of verbs were drawn from the British National Corpus. The data were annotated with frame-semantic markup, as described in the Berkeley FrameNet Project. We compared the presence, type of syntactic realization, and position of the semantically annotated arguments for both classes of verbs. We found that noise and manner verbs show statistically significant differences in these three areas. For instance, noise verbs are more focused on the form of the message than manner verbs: noise verbs appear more frequently with a quoted message. In addition, there are differences other than the complementation patterns: certain noise verbs are biased with respect to speakers’ genders, message types, and even orthography in quoted messages
We provide a unified account of semantic effects observable in attested examples of the German applicative (‘be-’) construction, e.g. Rollstuhlfahrer Poul Sehachsen aus Kopenhagen will den 1997 erschienenen Wegweiser Handiguide Europa fortführen und zusammen mit Movado Berlin berollen (‘Wheelchair user Poul Schacksen from Copenhagen wants to continue the guide ‘Handiguide Europe’, which came out in 1997, and roll Berlin together with Movado.’). We argue that these effects do not come from lexico-semantic operations on ‘input’ verbs, but are instead the products of a reconciliation procedure in which the meaning of the verb is integrated into the event-structure schema denoted by the applicative construction. We analyze the applicative pattern as an argument-structure construction, in terms of Goldberg (1995). We contrast this approach with that of Brinkmann (1997), in which properties associated with the applicative pattern (e.g. omissibility of the theme argument, holistic interpretation of the goal argument, and planar construal of the location argument) are attributed to general semantico-pragmatic principles. We undermine the generality of the principles as stated, and assert that these properties are instead construction-particular. We further argue that the constructional account provides an elegant model of the valence-creation and valence-augmentation functions of the prefix. We describe the constructional semantics as prototype-based: diverse implications of fee-predications, including iteration, transfer, affectedness, intensity and saturation, derive via regular patterns of semantic extension from the topological concept of coverage.
Wer über die aktuelle Entwicklung des Deutschen, über Sprachpflege und Sprach-politik in Deutschland spricht, muss unausweichlich auch über Englisch reden. Darin unterscheidet sich mein Bericht nicht von denen aus mehreren anderen europäischen Ländern. Meine Kapitel heißen Anglizismen, Domänenverslust, Sprachpolitik.
Contemporary studies on the characteristics of natural language benefit enormously from the increasing amount of linguistic corpora. Aside from text and speech corpora, corpora of computer-mediated communication (CMC) Position themselves between orality and literacy, and beyond that provide in- sight into the impact of "new", mainly intemet-based media on language beha- viour. In this paper, we present an empirical attempt to work with annotated CMC corpora for the explanation of linguistic phenomena. In concrete terms, we implement machine leaming algorithms to produce decision trees that reveal rules and tendencies about the use of genitive markers in German.
Discourse parsing of complex text types such as scientific research articles requires the analysis of an input document on linguistic and structural levels that go beyond traditionally employed lexical discourse markers. This chapter describes a text-technological approach to discourse parsing. Discourse parsing with the aim of providing a discourse structure is seen as the addition of a new annotation layer for input documents marked up on several linguistic annotation levels. The discourse parser generates discourse structures according to the Rhetorical Structure Theory. An overview of the knowledge sources and components for parsing scientific joumal articles is given. The parser’s core consists of cascaded applications of the GAP, a Generic Annotation Parser. Details of the chart parsing algorithm are provided, as well as a short evaluation in terms of comparisons with reference annotations from our corpus and with recently developed Systems with a similar task.
This chapter addresses the requirements and linguistic foundations of automatic relational discourse analysis of complex text types such as scientific journal articles. It is argued that besides lexical and grammatical discourse markers, which have traditionally been employed in discourse parsing, cues derived from the logical and generical document structure and the thematic structure of a text must be taken into account. An approach to modelling such types of linguistic information in terms of XML-based multi-layer annotations and to a text-technological representation of additional knowledge sources is presented. By means of quantitative and qualitative corpus analyses, cues and constraints for automatic discourse analysis can be derived. Furthermore, the proposed representations are used as the input sources for discourse parsing. A short overview of the projected parsing architecture is given.
Linguistic variation and linguistic virtuosity of young “Ghetto”-migrants in Mannheim, Germany
(2011)
In this paper, we provide an insight into the life world and social experiences of young Turkish migrants who are categorised by German society as “social problem cases”. Based on natural conversational data, we describe the communicative repertoire of one migrant adolescent and that of his friends. Our aims are (a) to isolate those linguistic features that convey the impression of “foreignness”, and stand out among other German speakers’ features, and (b) to analyse the variability in our informants’ discursive practices - i.e. code- or style-switching, as it is commonly referred to in the literature - in order to show how variation serves as a communicative resource. Our findings show that these adolescents’ remarkable linguistic proficiency and communicative competence contrast markedly to their low educational and professional status.
Les séquences argumentatives dans les discussions conflictuelles entre mères et filles adolescentes
(1994)
In Anlehnung an die Theorie der Individuation wird vermutet, dass das Gesprächsverhalten von Müttern und jugendlichen Töchtern in konfliktären Interaktionen durch Kontrolltendenzen auf Seiten der Mütter und Abgrenzungstendenzen auf Seiten der Töchter determiniert wird. Das Aufeinandertreffen konträrer partnerbezogener Intentionen sollte sich in bestimmten Regelmäßigkeiten bei der Abfolge von gesprächsorganisatorischen und argumentativen Äußerungen niederschlagen. Als Datenbasis dienten 60 Konfliktgespräche zwischen 30 Müttern und ihren jugendlichen Töchtern im Alter von 12 bis 24 Jahren. Jede Dyade diskutierte zwei aktuelle Konflikte. Die transkribierten Gespräche wurden nach einem Argumentations-Kategorien-System in Einheiten zerlegt und klassifiziert. Gesprächsorganisatorische Kategorien waren Initiativen sowie positive und negative Reaktionen auf Initiativen. Argumente wurden in ihrer Bezugnahme auf ein Ziel oder ein anderes Argument nach ihrer stärkenden, modifizierenden, schwächenden, zustimmenden oder ablehnenden Funktion unterschieden. Die Auswertungen erfolgten über lag-sequentielle Analysen. Als statistisch signifikante Muster resultierten das Repetionsphänomen (Initiativen werden vor allem von Müttern häufig wiederholt), die Argumentkonfrontation (Gegenargumente werden überzufällig häufig mit Gegenargumenten gekontert) und ein positiver sowie ein negativer Reaktionszyklus (sowohl positive als auch negative Reaktionen ziehen Partnerreaktionen der gleichen Qualität nach sich). Die Ergebnisse stehen in Einklang mit den entwicklungspsychologischen Annahmen über die partnerbezogenen Intentionen von Müttern und jugendlichen Töchtern.
Complex common names such as Indian elephant or green tea denote a certain type of entity, viz. kinds. Moreover, those kinds are always subkinds of the kind denoted by their head noun. Establishing such subkinds is essentially the task of classifying modifiers that are a defining trait of endocentrically structured complex common names. Examining complex common names of different lexico-syntactic types(NN compounds, N+N syntagmas, NP/PP syntagmas, A+N syntagmas) and from different languages (particularly English, German and French) it can be shown that complex common names are subject to language- independent formal and semantic constraints. In particular, complex common names qualify as name-like expressions in that they tend to be deficient in terms of formal complexity and semantic compositionality.
In this chapter, emotions are not regarded primarily as internal-psychological phenomena, but as socially proscribed and formed entities, which are constituted in accordance with social rules of emotionality and which are manifested, interpreted, and processed together communicatively in the interaction for definite purposes by the persons involved. In the elaboration of such an interactive conception of emotionality, the following aspects are treated: the value of emotionality in linguistic theories; emotions as a specific form of experiencing; the rules of emotionality; communication of emotions as transmission of evaluations; practices of manifestation, interpretation and processing of emotions in the communication process; fundamental interrelations between emotions and communication behavior; and methodology of the analysis of emotions and emotionality in specific conversation types. Finally, the developed theoretical apparatus in the analysis of two short conversation sections is elucidated.
Integrated Linguistic Annotation Models and Their Application in the Domain of Antecedent Detection
(2011)
Seamless integration of various, often heterogeneous linguistic resources in terms of their output formats and a combined analysis of the respective annotation layers are crucial tasks for linguistic research. After a decade of concentration on the development of formats to structure single annotations for specific linguistic issues, in the last years a variety of specifications to store multiple annotations over the same primary data has been developed. The paper focuses on the integration of the knowledge resource logical document structure information into a text document to enhance the task of automatic anaphora resolution both for the task of candidate detection and antecedent selection. The paper investigates data structures necessary for knowledge integration and retrieval.
Different Views on Markup
(2010)
In this chapter, two different ways of grouping information represented in document markup are examined: annotation levels, referring to conceptual levels of description, and annotation layers, referring to the technical realisation of markup using e.g. document grammars. In many current XML annotation projects, multiple levels are integrated into one layer, often leading to the problem of having to deal with overlapping hierarchies. As a solution, we propose a framework for XML-based multiple, independent XML annotation layers for one text, based on an abstract representation of XML documents with logical predicates. Two realisations of the abstract representation are presented, a Prolog fact base format together with an application architecture, and a specification for XML native databases. We conclude with a discussion of projects that have currently adopted this framework.
An approach to the unification of XML (Extensible Markup Language) documents with identical textual content and concurrent markup in the framework of XML-based multi-layer annotation is introduced. A Prolog program allows the possible relationships between element instances on two annotation layers that share PCDATA to be explored and also the computing of a target node hierarchy for a well-formed, merged XML document. Special attention is paid to identity conflicts between element instances, for which a default solution that takes into account metarelations that hold between element types on the different annotation layers is provided. In addition, rules can be specified by a user to prescribe how identity conflicts should be solved for certain element types.
This article introduces the topic of ‘‘Multilingual language resources and interoperability’’. We start with a taxonomy and parameters for classifying language resources. Later we provide examples and issues of interoperatability, and resource architectures to solve such issues. Finally we discuss aspects of linguistic formalisms and interoperability.
Researchers in many disciplines, sometimes working in close cooperation, have been concerned with modeling textual data in order to account for texts as the prime information unit of written communication. The list of disciplines includes computer science and linguistics as well as more specialized disciplines like computational linguistics and text technology. What many of these efforts have in common is the aim to model textual data by means of abstract data types or data structures that support at least the semi-automatic processing of texts in any area of written communication.
We report on finished work in a project that is concerned with providing methods, tools, best practice guidelines, and solutions for sustainable linguistic resources. The article discusses several general aspects of sustainability and introduces an approach to normalizing corpus data and metadata records. Moreover, the architecture of the sustainability platform implemented by the authors is described.
This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. The majority of corpora is annotated using markup languages that are based on the Annotation Graph framework, the upcoming Linguistic Annotation Format ISO standard, or according to tag sets defined by or based upon the TEI guidelines. A unified representation comprises the separation of conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for the representation of these annotations is a multi-rooted tree that again can be represented by the TEI and ISO tag set for feature structures. The mapping process and representational issues are discussed as well as the advantages and drawbacks associated with the use of the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.
We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question whether these disciplines develop a distinctive language use—both individually and collectively—over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.
Recently, a claim was made, on the basis of the German Google Books 1-gram corpus (Michel et al., Quantitative Analysis of Culture Using Millions of Digitized Books. Science 2010; 331: 176–82), that there was a linear relationship between six non-technical non-Nazi words and three ‘explicitly Nazi words’ in times of World War II (Caruana-Galizia. 2015. Politics and the German language: Testing Orwell’s hypothesis using the Google N-Gram corpus. Digital Scholarship in the Humanities [Online]. http://dsh.oxfordjournals.org/cgi/doi/10.1093/llc/fqv011 (accessed 15 April 2015)). Here, I try to show that apparent relationships like this are the result of misspecified models that do not take into account the temporal aspect of time-series data. The main point of this article is to demonstrate why such analyses run the risk of incorrect statistical inference, where potential effects are both meaningless and can potentially lead to wrong conclusions.
Using the Google Ngram Corpora for six different languages (including two varieties of English), a large-scale time series analysis is conducted. It is demonstrated that diachronic changes of the parameters of the Zipf–Mandelbrot law (and the parameter of the Zipf law, all estimated by maximum likelihood) can be used to quantify and visualize important aspects of linguistic change (as represented in the Google Ngram Corpora). The analysis also reveals that there are important cross-linguistic differences. It is argued that the Zipf–Mandelbrot parameters can be used as a first indicator of diachronic linguistic change, but more thorough analyses should make use of the full spectrum of different lexical, syntactical and stylometric measures to fully understand the factors that actually drive those changes.
Learning from Errors. Systematic Analysis of Complex Writing Errors for Improving Writing Technology
(2015)
In this paper, we describe ongoing research on writing errors with the ultimate goal to develop error-preventing editing functions in word-processors. Drawing from the state-of-the-art research in errors carried out in various fields, we propose the application of a general concept for action-slips as introduced by Norman. We demonstrate the feasibility of this approach by using a large corpus of writing errors in published texts. The concept of slips considers both the process and the product: some failure in a procedure results in an error in the product, i.e., is visible in the written text. In order to develop preventing functions, we need to determine causes of such visible errors.
The multiple gradations of German strong verbs are but manifestations of a rather uncomplicated system. There is a small number of ways to make up ablaut forms; these types of formation are identifiable in formal terms and, what is more, they have definite functions as morphological markers. Using classifications of stem forms according to quality, complexity and quantity of vowels, three types of operations involved in ablaut formation are identified. Ablaut always includes a change of quality type or a change of complexity type, and in addition it may include a change of quantity type. Ablaut forms are clearly distinguished as against bases (and against each other): their vocalism meets a defined standard of dissimilarity. On this basis, gradations are collected into inflectional classes that are defined in strictly synchronic terms. These classes continue the historical seven classes known from reference grammars. For the majority of strong verbs, membership in these classes (and thus ablaut) is predictable.
The paper gives an analysis of productively occurring dative constructions in German, attempting to unify what are known traditionally as Double Object and Experiencer Datives. The datives in question - cipients as we call them - are argued to be licensed under two conditions: One, predicates licensing cipients project a theme and a location argument internally; two, interpretation of the predication as a whole involves reference to two dissociated temporal intervals, or more generally, indexical truth intervals. It is argued that the location argument is needed because it provides the variable that is bound by the cipient argument - the variable in question ranges over superlocations of the location argument referent. Reference to two truth intervals is forced because interpreting the cipient structure involves evaluation of two propositional meanings that would contradict each other in a single context. The first propositional meaning is embedded in the predicate; it encodes that something is at a certain location (in quality space). The second propositional meaning is projected as a presupposition that corresponds just to the negation of the first one. The cipient, functioning as the logical subject of the construction, accommodates this second presuppositional meaning; this makes the construction as a whole interpretable. The analysis applies uniformly to what appear to be the two major contexts licensing cipients: ‘eventive’ and ‘foo-comparative’ predications, thereby accounting for some striking parallels between them.
As the nature of negative polarity items (NPIs) and their licensing contexts is still under much debate, a broad empirical basis is an important cornerstone to support further insights in this area of research. The work discussed in this paper is intended as a contribution to realizing this objective. The authors briefly introduce the phenomenon of NPIs and outline major theories about their licensing and also various licensing contexts before discussing our major topics: Firstly, a corpus-based retrieval method for NPI candidates is described that ranks the candidates according to their distributional dependence on the licensing contexts. Our method extracts single-word candidates and is extended to also capture multi-word candidates. The basic idea for automatically collecting NPI candidates from a large corpus is that an NPI behaves like a kind of collocate to its licensing contexts. Manual inspection and interpretation of the candidate lists identify the actual NPIs. Secondly, an online repository for NPIs and other items that show distributional idiosyncrasies is presented, which offers an empirical database for further (theoretical) research on these items in a sustainable way.
Lexicographic data are normally linked with each other in a complex manner. Especially, within the electronic lexicographic context, the following issues are addressed: How to encode these cross-reference structures so that both the lexicographers‘ editorial work with the linking-up is easy to handle and the options of the presentation are adequately flexible. The objective of this paper is to elucidate the presentation of an XML-modelling of cross-reference structures as part of a complete modelling concept. Thereby, the modelling potential of the XML-connected standard XLink and a new lexicographic concept will be brought together with cross-project guidelines for the modelling of link-structures.
Antonymy is a relation of lexical opposition which is generally considered to involve (i) the presence of a scale along which a particular property may be graded, and hence both (ii) gradability of the corresponding lexical items and (iii) typical entailment relations. Like other types of lexical opposites, antonyms typically differ only minimally: while denoting opposing poles on the relevant dimension of difference, they are similar with respect to other components of meaning. This paper presents examples of antonymy from the domain of speech act verbs which either lack some of these typical attributes or show problems in the application of these. It discusses several different proposals for the classification of these atypical examples.
Automatic recognition of speech, thought, and writing representation in German narrative texts
(2013)
This article presents the main results of a project, which explored ways to recognize and classify a narrative feature—speech, thought, and writing representation (ST&WR)—automatically, using surface information and methods of computational linguistics. The task was to detect and distinguish four types—direct, free indirect, indirect, and reported ST&WR—in a corpus of manually annotated German narrative texts. Rule-based as well as machine-learning methods were tested and compared. The results were best for recognizing direct ST&WR (best F1 score: 0.87), followed by indirect (0.71), reported (0.58), and finally free indirect ST&WR (0.40). The rule-based approach worked best for ST&WR types with clear patterns, like indirect and marked direct ST&WR, and often gave the most accurate results. Machine learning was most successful for types without clear indicators, like free indirect ST&WR, and proved more stable. When looking at the percentage of ST&WR in a text, the results of machine-learning methods always correlated best with the results of manual annotation. Creating a union or intersection of the results of the two approaches did not lead to striking improvements. A stricter definition of ST&WR, which excluded borderline cases, made the task harder and led to worse results for both approaches.
The project elexiko compiles an extensive, monolingual dictionary of Contemporary German. This contribution deals with the grammatical data in this dictionary; it is not only described how these are arranged content-wise depending on corpus data, but also how they were modelled.
Das Projekt elexiko erarbeitet ein umfangreiches, einsprachiges Wörterbuch des Gegenwartsdeutschen. In diesem Beitrag geht es um die grammatischen Angaben in diesem Wörterbuch; es wird nicht nur erläutert, wie diese inhaltlich in Abhängigkeit vom Prinzip der Korpusbasiertheit gestaltet sind, sondern auch, wie sie modelliert wurden.
Discourse segmentation is the division of a text into minimal discourse segments, which form the leaves in the trees that are used to represent discourse structures. A definition of elementary discourse segments in German is provided by adapting widely used segmentation principles for English minimal units, while considering punctuation, morphology, sytax, and aspects of the logical document structure of a complex text type, namely scientific articles. The algorithm and implementation of a discourse segmenter based on these principles is presented, as well an evaluation of test runs.
The present study examines the dynamics of the kanji combinations that form common (or general) and proper nouns in Japanese. The following three results were obtained. First, the degree of distribution results from two similar processes which are based on a steady-state of birth-and-death processes with different birth and death rates, resulting in a positive negative binomial distribution with the proper nouns and in a positive Waring distribution with common nouns. Second, all rank-frequency distributions follow the negative hypergeometric distribution used very frequently in ranking problems. Third, the building of kanji compounds follows a dissortative strategy. The higher the outdegree of a kanji, the more it prefers kanji with lower indegrees. A linear dependence can be observed with common nouns, whereas the relationship between compounded kanji is rather curvilinear with proper nouns. The actual analytical expression is not yet known.
This paper deals with the distribution of word length in short native mythological and historical Eskimo narrative texts. To my knowledge, no Eskimo‐Aleut data have been the object of quantitative linguistic investigation so far. Due to the strong linguistic and Stylistic homogeneity of the examined texts it was assumed that these texts can be subsumed under a single law of word length distribution, if word length distribution of a text is considered as a function of certain of its properties, such as author, language, and genre. So far, word length distribution in texts of a wide variety of languages and genres has been demonstrated to follow distributions of the compound Poisson family of discrete probability distributions. In view of the morphological idiosyncrasies of the Eskimo language in general, which are responsible for an unusually high mean word length of about 4.5 to 5.2 syllables per word in the texts, it is interesting to see whether Eskimo texts show a significantly different behaviour with respect to word length. The results demonstrate that the Eskimo data employed in this study can be fitted well by the Hyperpoisson distribution. Two further discrete probability distributions will be deduced from certain morphology‐based assumptions about Eskimo. It turns out that most of the Eskimo data can be fitted by these two distributions. The question to what extent these results point to a more grammar‐oriented theory of word length is also discussed.
In this paper we present an evaluation of rule-based morphological components for German for use in an interactive editing environment. The criteria for the evaluation are deduced from the intended use of these components, namely availability, performance, programming interfaces, and analysis quality. We evaluated systems developed and maintained since decades as well as new systems. However, we note serious general shortcomings when looking closer at recent implementations and come to the conclusion that the oldest system is the only one that satisfies our requirements.
In this paper the authors briefly outline editing functions which use methods from computational linguistics and take the structures of natural languages into consideration. Such functions could reduce errors and better support writers in realizing their communicative goals. However, linguistic methods have limits, and there are various aspects software developers have to take into account to avoid creating a solution looking for a problem: Language-aware functions could be powerful tools for writers, but writers must not be forced to adapt to their tools.
How to propose an action as an objective necessity. The case of Polish trzeba x (‘one needs to x’)
(2011)
The present study demonstrates that language-specific grammatical resources can afford speakers language-specific ways of organizing cooperative practical action. On the basis of video recordings of Polish families in their homes, we describe action affordances of the Polish impersonal modal declarative construction trzeba x (“one needs to x”) in the accomplishment of everyday domestic activities, such as cutting bread, bringing recalcitrant children back to the dinner table, or making phone calls. Trzeba-x turns in first position are regularly chosen by speakers to point to a possible action as an evident necessity for the furthering of some broader ongoing activity. Such turns in first position provide an environment in which recipients can enact shared responsibility by actively involving themselves in the relevant action. Treating the necessity as not restricted to any particular subject, aligning responsive actions are oriented to when the relevant action will be done, not whether it will be done. We show that such sequences are absent from English interactions by analyzing (a) grammatically similar turn formats in English interaction (“we need to x,” “the x needs to y”), and (b) similar interactive environments in English interactions. We discuss the potential of this research to point to a new avenue for researchers interested in the relationship between language diversity and diversity in human action and cognition.
The authors compare the use of two formats for requesting an object in informal everyday interaction: imperatives, common in our Polish data, and second-person polar questions, common in our English data. Imperatives and polar questions are selected in the same interactional “home environments” across the languages, in which they enact two social actions: drawing on shared responsibility and enlisting assistance, respectively. Speakers across the languages differ in their choice of request format in “mixed” interactional environments that support either. The finding shed light on the orderly ways in which cultural diversity is grounded in invariants of action formation.
‘Linguistic relativity’ has become a major keyword in debates on the psychological significance of language diversity. In this context, the term ‘relativity’ was originally taken on loan from Einstein’s then-recent theories by Edward Sapir (1924) and Benjamin L. Whorf (1940). The present paper assesses how far the idea of linguistic relativity does analogically build on relevant insights in modern physics, and fails to find any substantial analogies. The term was used rhetorically by Sapir and Whorf, and has since been incorporated into a cognitivist research programme that seeks to answer whether ‘language influences thought’. Contemporary research on ‘linguistic relativity’ has developed into a distinct way of studying language diversity, which shares a lot with the universalistic cognitivist framework it opposes, but little with relational approaches in science.
The authors establish a phenomenological perspective on the temporal constitution of experience and action. Retrospection and projection (i.e. backward as well as forward orientation of everyday action), sequentiality and the sequential organization of activities as well as simultaneity (i.e. participants’ simultaneous coordination) are introduced as key concepts of a temporalized approach to interaction. These concepts are used to capture that every action is produced as an inter-linked step in the succession of adjacent actions, being sensitive to the precise moment where it is produced. The adoption of a holistic, multimodal and praxeological perspective additionally shows that action in interaction is organized according to several temporal orders simultaneously in operation. Each multimodal resource used in interaction has its own temporal properties.
This paper shows how understanding in interaction is informed by temporality, and in particular, by the workings of retrospection. Understanding is a temporally extended, sequentially organized process. Temporality, namely, the sequential relationship of turn positions, equips participants with default mechanisms to display understandings and to expect such displays. These mechanisms require local management of turn-taking to be in order, i.e., the possibility and the expectation to respond locally and reciprocally to prior turns at talk. Sequential positions of turns in interaction provide an infrastructure for displaying understanding and accomplishing intersubjectivity. Linguistic practices specialized in displaying particular kinds of (not) understanding are adapted to the individual sequential positions with respect to an action-to-be-understood.
In Spoken Egyptian, the form of a linguistic sign is restricted by rules of root structure and consonant compatibility as well as word-formation patterns. Hieroglyphic Egyptian, however, displays additional principles of sign formation. Iconicity is one of the crucial features of a part of its sign inventory. In this article, hieroglyphic iconicity will be investigated by means of a preliminary comparative typology originally developed for German Sign Language (Kutscher 2010). The authors argue that patterns found in Egyptian hieroglyphic sign formation are systematically comparable to patterns of German Sign Language (DGS). These patterns determine what types of lexical meaning can be inferred from iconic linguistic signs.
This paper deals with the constructional variation of emotion predicates in Estonian. It gives an overview on the constructional types, including information of their quantitative distribution. It is shown that one characteristic of Estonian is the formation of pairs of converses, i.e. pairs of emotion verbs, which have the same emotion semantics but different argument realisation patterns. These converses are based on derivational morphology such as the causative morphem –ta ‘CAUS’. Causative derivation has been adduced in the theoretical literature as support for the assumption that the cross-linguistically wide-spread constructional variation in emotion predicates has its origin in a difference of the causal structure in the verbal semantics. This paper shows that the data of Estonian contradicts this assumption.
This paper analyses paramedic emergency interaction as multimodal multiactivity. Based on a corpus of video-recordings of emergency drills performed by professional paramedics during advanced training, the focus is on paramedics’ participation in multiple joint projects which become simultaneously relevant. Simultaneity and fast succession of multiactivity does not only characterise work on the team level, but also the work profile of the individual paramedic. Participants have to coordinate their own participation in more than one joint project intrapersonally. In the data studied, three patterns of allocating multimodal resources stood out as routine ways of coordinating participation in two simultaneous projects intrapersonally:
1. Talk and hearing vs. manual action monitored by gaze,
2. Talk and hearing vs. gazing (and pointing),
3. Manual action vs. gaze (and talk and hearing).
This article advocates an understanding of ‘positioning’ as a key to the analysis of identities in interaction within the methodological framework of conversation analysis. Building on research by Bamberg, Georgakopoulou and others, a performative, interaction-based approach to positioning is outlined and compared to membership categorization analysis. An interactional episode involving mock stories to reveal and reproach an inadequate identity-claim of a co-participant is analysed both in terms of practices of membership categorization and positioning. It is concluded that membership categorization is a core element of positioning. Still, positioning goes beyond membership categorization in a) revealing biographical dimensions accomplished by narration and b) by uncovering implicit performative claims of identity, which are not established by categorization or description.
"Standard language" is a contested concept, ideologically, empirically and theoretically. This is particularly true for a language such as German, where the standardization of the spoken language was based on the written standard and was established with respect to a communicative situation, i.e. public speech on stage (Bühnenaussprache), which most speakers never come across. As a consequence, the norms of the oral standard exhibit many features which are infrequent in the everyday speech even of educated speakers. This paper discusses ways to arrive at a more realistic conception of (spoken) standard German, which will be termed "standard usage". It must be founded on empirical observations of speakers linguistic choices in everyday situations. Arguments in favor of a corpus-based notion of standard have to consider sociolinguistic, political, and didactic concerns. We report on the design of a large study of linguistic variation conducted at the Institute for the German Language (project "Variation in Spoken German", Variation des gesprochenen Deutsch) with the aim of arriving at a representative picture of "standard usage" in contemporary German. It systematically takes into account both diatopic variation covering the multi-national space in which German an official language, and diastratic variation in terms of varying degrees of formality. Results of the study of phonetic and morphosyntactic variation are discussed. At least for German, a corpus-based notion of "standard usage" inevitably includes some degree of pluralism concerning areal variation, and it needs to do justice to register-based variation as well.
In dem Beitrag werden jüngste Entwicklungen auf dem Gebiet der Sprachpolitik, der Bildungspolitik und der Integrationspolitik in Deutschland dargestellt, die ein neues Verhältnis zur Mehrsprachigkeit erkennen lassen und die Schaffung zweisprachiger Bildungseinrichtungen ermöglichen. Dieser Beitrag wurde auch in einer englischen Version mit dem Titel "The political framework for creation and development of bilingual Kindergartens in Berlin" veröffentlicht. Sie ist über den Dokumentenserver des IDS zugänglich. Die deutsche Version des Beitrags trägt den Titel "Politische Rahmenbedingungen für zweisprachige Kindertagesstätten in Berlin". Sie ist nicht veröffentlicht, aber ebenfalls über den Dokumentenserver des IDS erhältlich.
In her overview, Margret Selting makes the case for the claim that dealing with authentic conversation necessarily lies at the heart of an interactionallinguistic approach to prosody (see Selting this volume, Section 3.3). However, collecting and transcribing corpora of authentic interaction is a time-consuming enterprise. This fact often severely restricts what the individual researcher is able to do in terms of analysis within the scope of his or her resources. Still, for dealing with many of the desiderata Margret Selting points out in Section 5 of her extensive overview, the use of larger corpora seems to be required. In this commenting paper, I want to argue that future progress in research on prosody in interaction will essentially rest on the availability and use of large public corpora. After reviewing arguments for and against the use of public corpora, I will discuss some upshots regarding corpus design and issues of transcription of public corpora.
One major issue in the accomplishment of contrasts in conversation is lexical choice of items which carry the semantic Ioad of the two states of affair which are represented as being opposed to one another. These items or expressions are co-selected to be understood as being contrastively related to each other. In this paper, it is argued that the activity of contrasting itself provides them with a specific local opposite meaning which they would not obtain in other contexts. Practices of contrastingare thus seen as an example of conversational activities which creatively and systematically affect situated meanings. Basedon data from various genres, such as meetings, mediation sessions and conversations, the paper discusses two practices of contrasting, their sequential construction and their interpretative effects. It is concluded that the interpretative effects of conversational contrasting rest on the sequential deployment oflinguistic resources and on the cognitive procedures of frame-based interpretation and constructing a maximally contrastive interpretation for the co-selected expressions.
Based on German speaking data from various activity types, the range of multimodal resources used to construct turn-beginnings is reviewed. It is claimed that participants in talk-in-interaction need to deal with four tasks in order to construct a turn which precisely fits the interactional moment of its production:
1. Achieve joint orientation: The accomplishment of the socio-spatial prerequisites necessary for producing a turn which is to become part of the participants’ common ground.
2. Display uptake: Next speaker needs to display his/her understanding of the interaction so far as the backdrop on which the production of the upcoming turn is based.
3. Deal with projections from prior talk: The speaker has to deal with projections which have been established by (the) previous turn(s) with respect to the upcoming turn.
4. Project properties of turn-in-progress: The speaker needs to orient the recipient to properties of the turn s/he is about to produce.
Turn-design thus can be seen to be informed by tasks related to the multimodal, embodied, and interactive contingencies of online-construction of turns. The four tasks are ordered in terms of prior tasks providing the prerequisite for accomplishing a later task.