Refine
Year of publication
- 2017 (98) (remove)
Document Type
- Conference Proceeding (41)
- Part of a Book (27)
- Article (25)
- Book (2)
- Working Paper (2)
- Other (1)
Language
- English (98) (remove)
Keywords
- Korpus <Linguistik> (37)
- Deutsch (20)
- Corpus linguistics (11)
- Computerlinguistik (9)
- Annotation (7)
- Corpus technology (6)
- Internet (6)
- Sprachstatistik (6)
- Texttechnologie (6)
- Englisch (5)
Publicationstate
- Veröffentlichungsversion (67)
- Postprint (14)
- Zweitveröffentlichung (8)
- Preprint (1)
Reviewstate
- Peer-Review (69)
- Peer-review (7)
- (Verlags)-Lektorat (6)
- Peer-Revied (2)
Publisher
In the management of cooperation, the fit of a requested action with what the addressee is presently doing is a pervasively relevant consideration. We present evidence that imperative turns are adapted to, and reflexively create, contexts in which the other person is committed to the course of action advanced by the imperative. This evidence comes from systematic variation in the design of imperative turns, relative to the fittedness of the imperatively mandated action to the addressee’s ongoing trajectory of actions, what we call the “dine of commitment”. We present four points on this dine: Responsive imperatives perform an operation on the deontic dimension of what the addressee has announced or already begun to do (in particular its permissibility); local-project imperatives formulate a new action advancing a course of action in which the addressee is already actively engaged; global-project-imperatives target a next task for which the addressee is available on the grounds of their participation in the overall event, and in the absence of any competing work; and competitive imperatives draw on a presently otherwise engaged addressee on the grounds of their social commitment to the relevant course of actions. These four turn shapes are increasingly complex, reflecting the interactional work required to bridge the increasing distance between what the addressee is currently doing, and what the imperative mandates. We present data from German and Polish informal and institutional settings.
Localism
(2017)
Localist hypothesis
(2017)
Corpus researchers, along with many other disciplines in science are being put under continual pressure to show accountability and reproducibility in their work. This is unsurprisingly difficult when the researcher is faced with a wide array of methods and tools through which to do their work; simply tracking the operations done can be problematic, especially when toolchains are often configured by the developers, but left largely as a black box to the user. Here we present a scheme for encoding this ‘meta data’ inside the corpus files themselves in a structured data format, along with a proof-of-concept tool to record the operations performed on a file.
This paper argues for using authentic data not only as an empirical basis for linguistic generalizations but also for exemplification purposes in monolingual and particularly in bi- and multilingual contrastive studies. It shows that parallel data extracted from the available parallel corpora can - after enrichment with semantic-functional information while maintaining the available contextual, register-related and linguistic information - serve as a perfect data source for multilingual exemplification. Moreover, the analysis of semantic-functionally equivalent parallel sequences allows the investigation and exemplification of similarities and differences in how different languages express similar meaning from both a semasiological and an onomasiological perspective.
Grammis is a web-based information system on German grammar, hosted by the Institute for the German Language (IDS). It is human-oriented and features different theoretical perspectives on grammar. Currently, the terminology component of grammis is being redesigned for this theoretical diversity to play a more prominent role in the data model. This also opens opportunities for implementing some machine-oriented features. In this paper, we present the re-design of both data model and knowledge base. We explore how the addition of machine-oriented features to the data model impacts the knowledge base; in particular, how this addition shifts some of the textual complexity into the data model. We show that our resource can easily be ported to a SKOS-XL representation, which makes it available for data science, knowledge-based NLP applications, and LOD in the context of digital humanities.
Unlike traditional text corpora collected from trustworthy sources, the content of web based corpora has to be filtered. This study briefly discusses the impact of web spam on corpus usability and emphasizes the importance of removing computer generated text from web corpora.
The paper also presents a keyword comparison of an unfiltered corpus with the same collection of texts cleaned by a supervised classifier trained using FastText. The classifier was able to recognize 71% of web spam documents similar to the training set but lacked both precision and recall when applied to short texts from another data set.
This paper discusses how cognitive aspects can be incorporated into lexicographic meaning descriptions based on corpus-driven analysis. The new German Online dictionary “Paronyme − Dynamisch im Kontrast” is concerned with easily confused words such as effektiv/effizient, sensibel/sensitiv. It is currently in the process of being developed and it aims at adopting a more conceptual and encyclopedic approach to meaning. Contrastive entries emphasize usage, comparing conceptual categories and indicating the mapping of knowledge. Adaptable access to lexicographic details offers different perspectives on information, and authentic examples reflect prototypical structures.
Some of the cognitive features are demonstrated with the help of examples. Firstly, I will outline how patterns of usage imply conceptual categories as central ideas instead of sufficiently logical criteria of semantic distinction. In this way, linguistic findings correlate better with how users conceptualize language. Secondly, it is pointed out how collocates are family members and fillers in contexts. Thirdly, I will demonstrate how contextual structure and function are included by summarizing referential information. Details are drawn from corpus data; they are usage-based patterns illustrating conversational interaction and semantic negotiation in contemporary public discourse. Finally, I will show flexible consultation routines where the focus on structural knowledge changes.
This paper discusses changes of lexicographic traditions with respect to approaches to meaning descriptions towards more cognitive perspectives. I will uncover how cognitive aspects can be incorporated into meaning descriptions based on corpus-driven analysis. The new German Online dictionary “Paronyme − Dynamisch im Kontrast” (Storjohann 2014; 2016) is concerned with easily confused words such as effektiv/effizient, sensibel/sensitiv. It is currently in the process of being developed and it aims at adopting a more conceptual and encyclopaedic approach to meaning by incorporating cognitive features. As a corpus-guided reference work it strives to adequately reflect ideas such as conceptual structure, categorisation and knowledge. Contrastive entries emphasise aspects of usage, comparing conceptual categories and indicate the (metonymic) mapping of knowledge. Adaptable access to lexicographic details and variable search options offer different foci and perspectives on linguistic information, and authentic examples reflect prototypical structures. Some of the cognitive features are demonstrated with the help of examples. Firstly, I will outline how patterns of usage imply conceptual categories as central ideas instead of sufficiently logical criteria of semantic distinction. In this way, linguistic findings correlate better with how users conceptualise language. Secondly, it is pointed out how collocates are treated as family members and fillers in contexts. Thirdly, I will demonstrate how contextual structure and functions are included summarising referential information. Details are drawn from corpus data, they are usage-based linguistic patterns illustrating conversational interaction and semantic negotiations in contemporary public discourse. Finally, I will outline consultation routines which activate different facets of structural knowledge, e.g. through changes of the ordering of information or through the visualisation of semantic networks.
Historical sociolinguistics in colonial New Guinea: The Rhenish mission society in the Astrolabe Bay
(2017)
The Rhenish Mission Society, a German Protestant mission, was active in a small part of northern New Guinea, the Astrolabe Bay, between 1887 and 1932. Up until 1914, this region was under German colonial rule. The German dominance was also reflected in rules on language use in official contexts such as schools and administration.
Missionaries were strongly affected by such rules as their most important tool in mission work was language. In addition, they were also responsible for school education as most schools in the German colonial areas in the Pacific were mission-run. Thus, mission societies had to make decisions about what languages to use, considering their own needs, their ideological convictions, and the colonial government’s requirements. These considerations were framed by the complex setting of New Guinea’s language wealth where several hundred languages were, and still are, spoken.
This paper investigates a small set of original documents from the Rhenish Mission Society to trace what steps were taken and what considerations played a major role in the process of agreeing on a suitable means of communication with the people the missionaries wanted to reach, thereby touching upon topics such as language attitudes, language policies and politics, practical considerations of language learning and language spread, and colonial actions impacting local language ecologies.
In my talk, I present an empirical approach to detecting and describing proverbs as frozen sentences with specific functions in current language use. We have developed this approach in the EU project ‘SprichWort’ (based on the German Reference Corpus). The first chapter illustrates selected aspects of our complex, iterative procedure to validate proverb candidates. Based on our corpus-driven lexpan methodology of slot analysis I then discuss semantic restrictions of proverb patterns. Furthermore, I show different degrees of proverb quality ranging from genuine proverbs to non-proverb realizations of the same abstract pattern. On the one hand, the corpus validation reveals that proverbs are definitely perceived and used as relatively fixed entities and often as sentences. On the other hand, proverbs are not only interpreted as an interesting unique phenomenon but also as part of the whole lexicon, embedded in networks of different lexical items.
This paper deals with the creation of the first morphological treebank for German by merging two pre-existing linguistic databases. The first of these is the linguistic database CELEX which is a standard resource for German morphology. We build on its refurbished and modernized version. The second resource is GermaNet, a lexical-semantic network which also provides partial markup for compounds. We describe the state of the art and the essential characteristics of both databases and our latest revisions. As the merging involves two data sources with distinct annotation schemes, the derivation of the morphological trees for the unified resource is not trivial. We discuss how we overcome problems with the data and format, in particular how we deal with overlaps and complementary scopes. The resulting database comprises about 100,000 trees whose format can be chosen according to the requirements of the application at hand. In our discussion, we show some future directions for morphological treebanks. The Perl script for the generation of the data from the sources will be made publicly available on our website.
Forms of committed relationships, including formal marriage arrangements between men and women, exist in almost every culture (Bell, 1997). Yet, similarly to many other psychological constructs (Henrich et al., 2010), marital satisfaction and its correlates have been investigated almost exclusively in Western countries (e.g., Bradbury et al., 2000). Meanwhile, marital relationships are heavily guided by culturally determined norms, customs, and expectations (for review see Berscheid, 1995; Fiske et al., 1998). While we acknowledge the differences existing both between- and within-cultures, we measured marital satisfaction and several factors that might potentially correlate with it based on self-report data from individuals across 33 countries. The purpose of this paper is to introduce the raw data available for anybody interested in further examining any relations between them and other country-level scores obtained elsewhere. Below, we review the central variables that are likely to be related to marital satisfaction.
The present paper examines the rise and fall of Modern High German loanwords in English from 1600 until 2000, principally making use of the record of borrowing documented by the Oxford English Dictionary (OED) in its Third Edition (online version, in revision 2000-). Groups of loanwords are analysed by century, with reference to the changing social and cultural landscape characterising relationships between the relevant nations over this period. This is not a simple picture: each language grows over the period in different ways, and the speakers of English look to German at different times for different types of borrowing, as the political and intellectual balance alters.
Introduction
(2017)
We present a major step towards the creation of the first high-coverage lexicon of polarity shifters. In this work, we bootstrap a lexicon of verbs by exploiting various linguistic features. Polarity shifters, such as ‘abandon’, are similar to negations (e.g. ‘not’) in that they move the polarity of a phrase towards its inverse, as in ‘abandon all hope’. While there exist lists of negation words, creating comprehensive lists of polarity shifters is far more challenging due to their sheer number. On a sample of manually annotated verbs we examine a variety of linguistic features for this task. Then we build a supervised classifier to increase coverage. We show that this approach drastically reduces the annotation effort while ensuring a high-precision lexicon. We also show that our acquired knowledge of verbal polarity shifters improves phrase-level sentiment analysis.
We present an approach to making existing CLARIN web services usable for spoken language transcriptions. Our approach is based on a new TEI-based ISO standard for such transcriptions. We show how existing tool formats can be transformed to this standard, how an encoder/decoder pair for the TCF format enables users to feed this type of data through a WebLicht tool chain, and why and how web services operating directly on the standard format would be useful.