The ISOcat registry reloaded
(2012)
The linguistics community is building a metadata-based infrastructure for the description of its research data and tools. At its core is the ISOcat registry, a collaborative platform to hold a (to be standardized) set of data categories (i.e., field descriptors). Descriptors have definitions in natural language and few explicit interrelations. With the registry growing to many hundreds of entries, authored by many, it is becoming increasingly apparent that the rather informal definitions and their glossary-like design make it hard for users to grasp, exploit and manage the registry’s content. In this paper, we take a large subset of the ISOcat term set and reconstruct from it a tree structure, following in the footsteps of schema.org. Our ontological re-engineering yields a representation that gives users a hierarchical view of linguistic, metadata-related terminology. The new representation adds to the precision of all definitions by making explicit information which is only implicitly given in the ISOcat registry. It also helps to uncover and address potential inconsistencies in term definitions as well as gaps and redundancies in the overall ISOcat term set. The new representation can serve as a complement to the existing ISOcat model, providing additional support for authors and users in browsing, (re-)using, maintaining, and further extending the community’s terminological metadata repertoire.
Creating and maintaining metadata for various kinds of resources requires appropriate tools to assist the user. The paper presents the metadata editor ProFormA for the creation and editing of CMDI (Component Metadata Infrastructure) metadata in web forms. This editor supports a number of CMDI profiles currently being provided for different types of resources. Since the editor is based on XForms and server-side processing, users can create and modify CMDI files in their standard browser without the need for further processing. Large parts of ProFormA are implemented as web services in order to reuse them in other contexts and programs.
This paper presents the system architecture as well as the underlying workflow of the Extensible Repository System of Digital Objects (ERDO), which has been developed for the sustainable archiving of language resources within the Tübingen CLARIN-D project. In contrast to other approaches, which target archiving experts, the described workflow can be used by researchers without prior knowledge of long-term storage to transfer data from their local file systems into a persistent repository.
The paper’s purpose is to give an overview of the work on the Component Metadata Infrastructure (CMDI) that was implemented in the CLARIN research infrastructure. It explains the underlying schema and the accompanying tools and services. It also describes the status and impact of the CMDI developments within the CLARIN project, as well as past and future collaborations with other projects.
The Component Metadata Infrastructure (CMDI) in a project on sustainable linguistic resources
(2012)
The sustainable archiving of research data for predefined time spans has become increasingly important to researchers and is stipulated by funding organizations, which oblige researchers to comply. An important aspect of such sustainable archiving of language resources is the creation of metadata, which can be used for describing, finding and citing resources. In the present paper, these aspects are dealt with from the perspectives of two projects: the German project for Sustainability of Linguistic Data at the University of Tübingen (NaLiDa, cf. http://www.sfs.uni-tuebingen.de/nalida) and the Dutch-Flemish HLT Agency hosted at the Institute for Dutch Lexicology (TST-Centrale, cf. http://www.inl.nl/tst-centrale). Both projects present their approaches to the creation of components and profiles using the Component Metadata Infrastructure (CMDI) as the underlying metadata schema for resource descriptions, highlighting their experiences as well as the advantages and disadvantages of using CMDI.
This paper describes the status of the standardization efforts of a Component Metadata approach for describing Language Resources with metadata. Different linguistic and Language & Technology communities, such as CLARIN, META-SHARE and NaLiDa, use this component approach and see its standardization as a matter for cooperation that has the potential to create a large interoperable domain of joint metadata. Starting with an overview of the component metadata approach together with the related semantic interoperability tools and services, such as the ISOcat data category registry and the relation registry, we explain the standardization plan and efforts for component metadata within ISO TC37/SC4. Finally, we present information about the uptake of component metadata and plans for its use within the three mentioned linguistic and L&T communities.
In two eye-tracking experiments, we investigated the relationship between the subject preference in the resolution of subject-object ambiguities in German embedded clauses and semantic word order constraints (i.e., prominence hierarchies relating to the specificity/referentiality of noun phrases, case assignment and thematic role assignment). Our central research question concerned the timecourse with which prominence information is used and particularly whether it modulates the subject preference. In both experiments, we replicated previous findings of reanalysis effects for object-initial structures. Our findings further suggest that noun phrase prominence does not alter initial parsing strategies (viz., the subject preference), but rather modulates the ease of later reanalysis processes. In Experiment 1, the object case assigned by the verb did not affect the ease of reanalysis. However, the syntactic reanalysis was rendered more difficult when the order of the two arguments violated the specificity/referentiality hierarchy. Experiment 2 revealed that the initial subject preference also holds for verbs favoring an object-initial base order (i.e., dative object-experiencer verbs). However, the advantage for subject-initial sentences is neutralized in relatively late processing stages when the thematic role hierarchy and the specificity hierarchy converge to promote scrambling.
„XYZ hat dich angestupst". Romantische Erstkontakte bei Facebook - ein Schnittstellenphänomen? ["XYZ poked you". Romantic first contacts on Facebook: an interface phenomenon?]
(2012)
The thesis of this essay is that contact-initiation behavior in social networks shows how communicative behaviors in romantic contexts from the online and offline worlds interact and complement each other. Unlike online dating sites, social networks serve primarily to maintain social contacts that already exist offline. Nevertheless, they are also used to establish new contacts, and they are understood as a virtual extension of an offline lifeworld in which unfamiliar profile identities that are categorized as attractive can be contacted. (Linguistic) strategies are used on the one hand to simulate the flirting behavior typical of offline situations, and on the other hand to draw on the approach characteristic of online dating sites. On the basis of such observations, social networks are interpreted as a new communication space in which the online and offline worlds diffuse into each other, a thesis that is instructive for a theory of church practice in the communication spaces of Web 2.0.
Numerus [Number]
(2012)
Corpora with high-quality linguistic annotations are an essential component in many NLP applications and a valuable resource for linguistic research. For obtaining these annotations, a large amount of manual effort is needed, making the creation of these resources time-consuming and costly. One attempt to speed up the annotation process is to use supervised machine-learning systems to automatically assign (possibly erroneous) labels to the data and ask human annotators to correct them where necessary. However, it is not clear to what extent these automatic pre-annotations are successful in reducing human annotation effort, and what impact they have on the quality of the resulting resource. In this article, we present the results of an experiment in which we assess the usefulness of partial semi-automatic annotation for frame labeling. We investigate the impact of automatic pre-annotation of differing quality on annotation time, consistency and accuracy. While we found no conclusive evidence that it can speed up human annotation, we found that automatic pre-annotation does increase its overall quality.
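Zur Erforschung der generationsbedingten Variation im pfälzischen Sprachinseldialekt am Niederrhein [On researching generation-related variation in the Palatine language-island dialect on the Lower Rhine]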
(2012)
This article discusses questions concerning the creation, annotation and sharing of spoken language corpora. We use the Hamburg Map Task Corpus (HAMATAC), a small corpus in which advanced learners of German were recorded solving a map task, as an example to illustrate our main points. We first give an overview of the corpus creation and annotation process including recording, metadata documentation, transcription and semi-automatic annotation of the data. We then discuss the manual annotation of disfluencies as an example case in which many of the typical and challenging problems for data reuse – in particular the reliability of interpretative annotations – are revealed.
Attitudes and opinions shape human action; language, which forms a central anchor of human identity, is affected as well. This volume presents the results of an interdisciplinary research project on current language attitudes in Germany from both a linguistic and a social-psychological perspective. Mental concepts of dialects are discussed, as are evaluations of German and other languages, stereotypes, and evaluations of oneself and of others.
Furthermore, a survey of the status of the language documents the position of German in Germany by bringing together several relevant data sets and statistics, for instance on the position of German at schools and universities or on German-language media.
The volume thus offers the first comprehensive account to date of attitudes toward German, toward varieties of German, toward other languages, and toward the speakers of these languages and varieties.
Knowledge Acquisition with Natural Language Processing in the Food Domain: Potential and Challenges
(2012)
In this paper, we present an outlook on the effectiveness of natural language processing (NLP) in extracting knowledge for the food domain. We identify potential scenarios that we think are particularly suitable for NLP techniques. As a source for extracting knowledge we will highlight the benefits of textual content from social media. Typical methods that we think would be suitable will be discussed. We will also address potential problems and limits that the application of NLP methods may yield.
In this paper, we examine methods to extract different domain-specific relations from the food domain. We employ different extraction methods ranging from surface patterns to co-occurrence measures applied to different parts of a document. We show that the effectiveness of a particular method depends very much on the relation type considered and that there is no single method that works equally well for every relation type. As we need to process a large amount of unlabeled data, our methods only require a low level of linguistic processing. This also has the advantage that these methods can provide responses in real time.
In this paper, we describe MLSA, a publicly available multi-layered reference corpus for German-language sentiment analysis. The construction of the corpus is based on the manual annotation of 270 German-language sentences considering three different layers of granularity. The sentence-layer annotation, as the most coarse-grained annotation, focuses on aspects of objectivity, subjectivity and the overall polarity of the respective sentences. Layer 2 is concerned with polarity on the word- and phrase-level, annotating both subjective and factual language. The annotations on Layer 3 focus on the expression-level, denoting frames of private states such as objective and direct speech events. These three layers and their respective annotations are intended to be fully independent of each other. At the same time, exploring and discovering interactions that may exist between different layers should also be possible. The reliability of the respective annotations was assessed using the average pairwise agreement and Fleiss’ multi-rater measures. We believe that MLSA is a beneficial resource for sentiment analysis research, algorithms and applications that focus on the German language.
We present a gold standard for semantic relation extraction in the food domain for German. The relation types that we address are motivated by scenarios in which IT applications have commercial potential, such as virtual customer advice, in which a virtual agent assists a customer in a supermarket in finding those products that best satisfy their needs. Moreover, we focus on those relation types that can be extracted from natural language text corpora, ideally content from the internet, such as web forums, that is easy to retrieve. A typical relation type that meets these requirements is pairs of food items that are usually consumed together. Such a relation type could be used by a virtual agent to suggest additional products available in a shop that would potentially complement the items a customer already has in their shopping cart. Our gold standard comprises structural data, i.e. relation tables, which encode relation instances. These tables are vital for evaluating natural language processing systems that extract those relations.