"Web Corpora in Computational Linguistics and Language Research" ("Webkorpora in Computerlinguistik und Sprachforschung") was the theme of a workshop organized by the two GSCL working groups "Hypermedia" and "Corpus Linguistics" at the Institut für Deutsche Sprache (IDS) in Mannheim. On 27 and 28 September 2012, experts from university and non-university research institutions met there for talks and discussions. The wide-ranging workshop addressed the acquisition, preparation, and analysis of web corpora for computational-linguistic applications and linguistic research. One focus was the special requirements that arise for German-language resources in particular. A further focus was the use of web corpora in empirically grounded language research, for example as a basis for statistical analyses of language, for studies of language use in internet-based communication, or for corpus-based lexicography. In addition, there was a poster/demo session in which academic and commercial projects could present their research tools and methods.
We investigate the task of detecting reliable statements about food-health relationships in natural language texts. For that purpose, we created a specially annotated web corpus from forum entries discussing the healthiness of certain food items. We examine a set of task-specific features (mostly) based on linguistic insights that are instrumental in finding utterances that are commonly perceived as reliable. These features are incorporated in a supervised classifier and compared against standard features that are widely used for various tasks in natural language processing, such as bag-of-words, part-of-speech, and syntactic parse information.
Extending the possibilities for collaborative work with TEI/XML through the usage of a wiki system
(2013)
This paper presents and discusses an integrated, project-specific working environment for editing TEI/XML files and linking entities of interest to a dedicated wiki system. This working environment has been specifically tailored to the workflow in our interdisciplinary digital humanities project GeoBib. It addresses some challenges that arose while working with person-related data and geographical references in a growing collection of TEI/XML files. While our current solution provides some essential benefits, we also discuss several critical issues and challenges that remain.
The goal of the present chapter is to explore the possibility of providing the research (but also the industrial) community that commonly uses spoken corpora with a stable portfolio of well-documented standardized formats that allow a high reuse rate of annotated spoken resources and, as a consequence, better interoperability across tools used to produce or exploit such resources.
Linguistic query systems are special-purpose IR applications. We present a novel state-of-the-art approach for the efficient exploitation of very large linguistic corpora, combining the advantages of relational database management systems (RDBMS) with the functional MapReduce programming model. Our implementation uses the German DEREKO reference corpus with multi-layer linguistic annotations and several types of text-specific metadata, but the proposed strategy is language-independent and adaptable to large-scale multilingual corpora.