Korpuslinguistik
Refine
Document Type
- Part of a Book (8)
- Article (3)
- Book (2)
- Conference Proceeding (2)
Has Fulltext
- yes (15)
Keywords
- Korpus <Linguistik> (11)
- Deutsch (5)
- Grammatik (4)
- Automatische Sprachanalyse (2)
- Computerlinguistik (2)
- Information Retrieval (2)
- Korpuslinguistik (2)
- Lyrics <Lyrik> (2)
- Popmusik (2)
- corpus linguistics (2)
Publicationstate
- Veröffentlichungsversion (8)
- Zweitveröffentlichung (6)
- Postprint (2)
Reviewstate
- (Verlags)-Lektorat (10)
- Peer-Review (5)
Publisher
- Narr Francke Attempto (3)
- Narr (2)
- de Gruyter (2)
- European Language Resources Association (1)
- European language resources association (ELRA) (1)
- Friedrich H. (1)
- GSCL (1)
- Köllen (1)
- Springer (1)
- Universität Hamburg (1)
Song lyrics can be considered as a text genre that has features of both written and spoken discourse, and potentially provides extensive linguistic and cultural information to scientists from various disciplines. However, pop songs play a rather subordinate role in empirical language research so far - most likely due to the absence of scientifically valid and sustainable resources. The present paper introduces a multiply annotated corpus of German lyrics as a publicly available basis for multidisciplinary research. The resource contains three types of data for the investigation and evaluation of quite distinct phenomena: TEI-compliant song lyrics as primary data, linguistically and literary motivated annotations, and extralinguistic metadata. It promotes empirically/statistically grounded analyses of genre-specific features, systemic-structural correlations and tendencies in the texts of contemporary pop music. The corpus has been stratified into thematic and author-specific archives; the paper presents some basic descriptive statistics, as well as the public online frontend with its built-in evaluation forms and live visualisations.
Linguistic query systems are special purpose IR applications. We present a novel state-of-the-art approach for the efficient exploitation of very large linguistic corpora, combining the advantages of relational database management systems (RDBMS) with the functional MapReduce programming model. Our implementation uses the German DEREKO reference corpus with multi-layer linguistic annotations and several types of text-specific metadata, but the proposed strategy is language-independent and adaptable to large-scale multilingual corpora.
Vorgestellt wird das Korpus deutschsprachiger Songtexte als innovative Sprachdatenquelle für interdisziplinäre Untersuchungsszenarien und speziell für den Einsatz im Fremd- und Zweitsprachenunterricht. Die Ressource dokumentiert Eigenschaften konzeptioneller Schriftlichkeit und konzeptioneller Mündlichkeit und erlaubt empirisch begründete Analysen sprachlicher Phänomene bzw. Tendenzen in den Texten moderner Popmusik. Vorgestellt werden Design, Annotationen und Anwendungsbeispiele des in thematische und autorenspezifische Archive stratifizierten Korpus.
Das Vokabular von Songtexten im gesellschaftlichen Kontext – ein diachron-empirischer Beitrag
(2022)
Der Beitrag untersucht den Stellenwert gesellschaftlich relevanter Thematiken in deutschsprachigen Songtexten der zurückliegenden fünf Jahrzehnte. Dabei zeigt sich, dass neben individuellen Befindlichkeiten auch politische, sozialkritische oder umweltbezogene Themen signifikant angesprochen werden. Wir kontrastieren Songtexte mit anderen Testsorten und wenden dabei quantitative Methoden auf umfangreiche, breit stratifizierte Datensamples an, um die Phänomenbeschreibungen präzisierbar, generalisierbar und reproduzierbar zu machen. Das longitudinale Korpusdesign bietet Potenzial für diachrone Vergleiche. Im Sinne eines erweiterten „Mixed Methods“-Ansatzes exploriert die Studie zudem ausgewählte Aspekte qualitativ und bettet sie in den zeitlichen Kontext ein.
Contemporary studies on the characteristics of natural language benefit enormously from the increasing amount of linguistic corpora. Aside from text and speech corpora, corpora of computer-mediated communication (CMC) Position themselves between orality and literacy, and beyond that provide in- sight into the impact of "new", mainly intemet-based media on language beha- viour. In this paper, we present an empirical attempt to work with annotated CMC corpora for the explanation of linguistic phenomena. In concrete terms, we implement machine leaming algorithms to produce decision trees that reveal rules and tendencies about the use of genitive markers in German.
Editorial
(2013)
Einleitung
(2023)
The paper describes preliminary studies regarding the usage of Example-Based Querying for specialist corpora. We outline an infrastructure for its application within the linguistic domain. Example-Based Querying deals with retrieval situations where users would like to explore large collections of specialist texts semantically, but are unable to explicitly name the linguistic phenomenon they look for. As a way out, the proposed framework allows them to input prototypical everyday language examples or cases of doubt, which are automatically processed by CRF and linked to appropriate linguistic texts in the corpus.
Linguistic query systems are special purpose IR applications. We present a novel state-of-the-art approach for the efficient exploitation of very large linguistic corpora, combining the advantages of relational database management systems (RDBMS) with the functional MapReduce programming model. Our implementation uses the German DEREKO reference corpus with multi-layer
linguistic annotations and several types of text-specific metadata, but the proposed strategy is language-independent and adaptable to large-scale multilingual corpora.