OPUS 4 | Search

Neologie und Korpus [Neology and Corpus] (1998)

Neology and lexical change, a topic long neglected in German studies, is examined in its theoretical, methodological and practical aspects. The contribution shows what corpus linguistics can contribute to objectifying changes in the meaning of existing lexical expressions, and which relevance criteria must be met for such items to be treated lexicographically.

AND-Type versus WITH-Type Conjunctions: Towards a Corpus-Based Study (2012)

Trawiński, Beata

Grammatical shifts in English-German noun phrases (2012)

Hansen-Morath, Sandra ; Hansen-Schirra, Silvia

Introduction (2012)

Hansen-Morath, Sandra ; Schwarz, Christian ; Stoeckle, Philipp ; Streck, Tobias

Verwaltungssprache in Erpresserbriefen [Administrative Language in Extortion Letters] (2009)

Hansen, Sandra

Extortion letters are commonly associated with elliptical phrasing, spelled out in letters cut from print and glued onto a sheet of paper. Authentic extortion letters, however, show that many of them look like business letters and contain elements of administrative language. Which forms of administrative language are these, and why are they used in writings of an illegal nature? This contribution addresses these questions. Using a sample from the collection of crime-related documents (Tatschreibensammlung) of the BKA (Federal Criminal Police Office), it empirically examines forms of administrative language in extortion letters, develops explanatory approaches, and discusses their relevance for author identification.

Kontrastives Verbvalenzwörterbuch Spanisch - Deutsch [Contrastive Verb Valency Dictionary Spanish - German] (2010)

Domínguez Vázquez, María Jose ; Engel, Ulrich ; Lübke, Barbara ; Meliss, Meike ; Mirazo Balsa, Mónica ; Paredes Suárez, Gemma ; Pastor, Alejandro ; López, Martina Silva ; Sloth Poulsen, Pia ; Rozas, Victoria Vázquez

The contrastive verb valency dictionary Spanish - German (Diccionario contrastivo de valencias verbales español - alemán, DCVVEA) describes the combinatorial possibilities of more than one hundred high-frequency Spanish verbs and their German equivalents, giving precise information on their semantic and syntagmatic properties. The delimitation of sense variants for the polysemous Spanish lemmas starts, on the one hand, from existing lexicographic descriptions, adapted to the aims of the DCVVEA, and relies, on the other hand, on the empirical data provided by the syntactic database Base de datos sintácticos del español actual (BDS). The BDS was compiled by researchers at the USC under the direction of Guillermo Rojo and contains the results of the syntactic analysis of about 160,000 sentences from a corpus of contemporary Spanish, ARTHUS (Archivo de textos hispánicos de la Universidad de Santiago de Compostela). The DCVVEA is a syntagmatic dictionary with an alphabetical structure and Spanish as its metalanguage. Each entry covers one sense variant of a Spanish verb and is illustrated with authentic examples. Each Spanish verb variant is paired with German verbs that stand in a full or partial equivalence relation to it; these equivalents are identified from translations of the corpus examples. The valency description of the Spanish and German verb variants gives functional, categorial and semantic information on each verb actant, together with explicit notes on contrastively relevant differences between the units of the two languages.
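The entry structure described above (one entry per sense variant, actants with functional, categorial and semantic information, German equivalents in a full or partial equivalence relation, contrastive notes) can be sketched as a small data model. This is a hypothetical illustration, not the DCVVEA's actual format: all class and field names are assumptions, and the sample values for *dar* are invented for illustration and not taken from the dictionary.

```python
from dataclasses import dataclass, field
from enum import Enum


class Equivalence(Enum):
    """Type of equivalence between a Spanish sense variant and a German verb."""
    FULL = "full"
    PARTIAL = "partial"


@dataclass
class Actant:
    function: str  # functional information (hypothetical labels)
    category: str  # categorial realization, e.g. a noun phrase
    semantic: str  # semantic characterization of the actant


@dataclass
class GermanEquivalent:
    verb: str
    relation: Equivalence
    contrastive_notes: list[str] = field(default_factory=list)


@dataclass
class SenseEntry:
    """One dictionary entry = one sense variant of a Spanish verb."""
    lemma: str
    sense: int
    actants: list[Actant]
    examples: list[str]
    equivalents: list[GermanEquivalent] = field(default_factory=list)

    def partial_equivalents(self) -> list[str]:
        """German verbs that only partially match this sense variant."""
        return [e.verb for e in self.equivalents if e.relation is Equivalence.PARTIAL]


# Invented sample entry, for illustration only.
entry = SenseEntry(
    lemma="dar",
    sense=1,
    actants=[
        Actant("subject", "noun phrase", "agent"),
        Actant("direct object", "noun phrase", "theme"),
        Actant("indirect object", "prepositional phrase with a", "recipient"),
    ],
    examples=["María le dio un libro a su hermano."],
    equivalents=[
        GermanEquivalent("geben", Equivalence.FULL),
        GermanEquivalent("schenken", Equivalence.PARTIAL,
                         ["German verb restricts the sense to giving as a gift"]),
    ],
)
print(entry.partial_equivalents())
```

Keeping equivalents per sense variant, rather than per lemma, mirrors the abstract's point that equivalence is established between individual verb variants.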

Creating an extensible, levelled study corpus of Russian (2016)

Batinić, Dolores ; Birzer, Sandra ; Zinsmeister, Heike

In this paper, we present first results of training a classifier to sort Russian texts by level of difficulty. To classify texts into two levels of difficulty, we considered both surface-oriented features adopted from readability assessment and more linguistically informed positional features. This text classification is the main focus of our Levelled Study Corpus of Russian (LeStCoR), which aims to provide a corpus adapted for language learning purposes: simpler texts for beginner second language learners and more complex texts for advanced learners. The most discriminative feature in our pilot study was a lexical feature that approximates how accessible the vocabulary is to second language learners, measured as the proportion of familiar words in a text. The best feature setting achieved an accuracy of 0.91 on a pilot corpus of 209 texts.
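The lexical feature described above, the proportion of familiar words in a text, can be sketched in a few lines. This is a minimal sketch, assuming simple regular-expression tokenization, lowercasing, and a plain set of word forms as the learner's "familiar" vocabulary; the function name and word list are hypothetical, and the paper's actual tokenization, lemmatization and vocabulary source are not specified here.

```python
import re


def familiar_word_ratio(text: str, familiar: set[str]) -> float:
    """Share of tokens in `text` that appear in the `familiar` word set.

    Assumption: tokens are runs of Unicode word characters, compared
    in lowercase against word forms (no lemmatization).
    """
    tokens = re.findall(r"\w+", text.lower())  # \w matches Cyrillic letters too
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in familiar)
    return known / len(tokens)


# Hypothetical beginner vocabulary.
beginner_vocabulary = {"я", "читаю", "книгу"}
ratio = familiar_word_ratio("Я читаю интересную книгу.", beginner_vocabulary)
print(ratio)  # 3 of 4 tokens are familiar
```

A real setup would likely compare lemmas rather than word forms, since Russian inflection would otherwise make many known words look unfamiliar.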

Einleitung [Introduction] (2012)

Blühdorn, Hardarik ; Lohnstein, Horst

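Neuere Erkenntnisse zu den Strukturprinzipien von Wortbedeutungen und ihre Widerspiegelung in Wörterbüchern [Recent Findings on the Structural Principles of Word Meanings and Their Reflection in Dictionaries] (1982)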

Herberg, Dieter

Starting from the insight that word meanings (sememes) can be understood as structured complexes of semantic features (SM, or semes), various methods for determining and describing word meaning have been proposed in recent years. The following discusses, both in principle and by example, which possibilities and limits are currently emerging for the lexicographic use of semantic feature or componential analyses (SMA) in explaining meanings in general-purpose dictionaries of contemporary German.

Cleaning the Europarl Corpus for Linguistic Applications (2014)

Graën, Johannes ; Batinić, Dolores ; Volk, Martin

We discovered several recurring errors in the current version of the Europarl Corpus originating both from the web site of the European Parliament and the corpus compilation based thereon. The most frequent error was incompletely extracted metadata leaving non-textual fragments within the textual parts of the corpus files. This is, on average, the case for every second speaker change. We not only cleaned the Europarl Corpus by correcting several kinds of errors, but also aligned the speakers’ contributions of all available languages and compiled everything into a new XML-structured corpus. This facilitates a more sophisticated selection of data, e.g. querying the corpus for speeches by speakers of a particular political group or in particular language combinations.
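The two kinds of queries mentioned above (speeches by political group, speeches available in particular language combinations) become straightforward once speaker metadata and aligned translations live in XML. The sketch below uses Python's standard `xml.etree.ElementTree`; the element and attribute names (`speech`, `text`, `group`, `lang`) and the sample data are hypothetical and do not reproduce the actual schema of the cleaned corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure: one <speech> per speaker contribution,
# with one aligned <text> child per available language.
SAMPLE = """<corpus>
  <speech id="s1" speaker="Speaker A" group="Group 1">
    <text lang="en">Thank you.</text>
    <text lang="de">Danke.</text>
  </speech>
  <speech id="s2" speaker="Speaker B" group="Group 2">
    <text lang="en">I agree.</text>
    <text lang="fr">Je suis d'accord.</text>
  </speech>
</corpus>"""


def speeches_by_group(root: ET.Element, group: str) -> list[str]:
    """IDs of speeches whose speaker belongs to `group`."""
    return [s.get("id") for s in root.iter("speech") if s.get("group") == group]


def speeches_with_languages(root: ET.Element, langs: set[str]) -> list[str]:
    """IDs of speeches available in every language in `langs`."""
    result = []
    for speech in root.iter("speech"):
        available = {t.get("lang") for t in speech.findall("text")}
        if langs <= available:
            result.append(speech.get("id"))
    return result


root = ET.fromstring(SAMPLE)
print(speeches_by_group(root, "Group 1"))
print(speeches_with_languages(root, {"en", "fr"}))
```

Storing the group as an attribute on each speech keeps both queries to a single pass over the document.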

Linguistic Measures of Pitch Range in Slavic and Germanic Languages (2015)

Andreeva, Bistra ; Möbius, Bernd ; Demenko, Grazyna ; Zimmerer, Frank ; Jügler, Jeanin

Based on specific linguistic landmarks in the speech signal, this study investigates pitch level and pitch span differences in English, German, Bulgarian and Polish. The analysis is based on 22 speakers per language (11 males and 11 females). Linear mixed models were computed that include various linguistic measures of pitch level and span, revealing characteristic differences across languages and between language groups. Female speakers in the Slavic group showed significantly higher pitch level than female speakers in the Germanic group. The male speakers showed slightly different results: only the Polish males had significantly higher mean pitch level than the German males. Overall, the results show that the Slavic speakers tend to have a wider pitch span than the German speakers. For the linguistic measure of span between the initial peaks and the non-prominent valleys, however, we find a difference only between Polish and German speakers. For both male and female speakers, we found a flatter intonation contour in German than in Polish, Bulgarian and English, as well as differences in the frequency of the landmarks between languages. Concerning “speaker liveliness”, we found that the speakers from the Slavic group are significantly livelier than the speakers from the Germanic group.
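A landmark-based span measure such as the span between initial peaks and non-prominent valleys is usually expressed on a logarithmic scale. The sketch below uses the standard semitone conversion, 12 · log2(f_high / f_low), so that 12 semitones correspond to one octave. How the study aggregates landmarks is not stated here; averaging peak and valley F0 values before taking the distance is an assumption, and the function names are hypothetical.

```python
import math
from statistics import mean


def semitones(f_high_hz: float, f_low_hz: float) -> float:
    """Distance between two F0 values in semitones (12 semitones = one octave)."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


def landmark_span(peaks_hz: list[float], valleys_hz: list[float]) -> float:
    """Span between mean peak F0 and mean valley F0, in semitones.

    Assumption: landmarks are averaged per speaker before computing the span.
    """
    return semitones(mean(peaks_hz), mean(valleys_hz))


# Hypothetical landmark values for one speaker (Hz).
print(landmark_span([210.0, 190.0], [100.0]))  # mean peak 200 Hz vs. 100 Hz: one octave
```

The semitone scale makes spans comparable across speakers with different baseline pitch, for example male and female speakers.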

Differences of Pitch Profiles in Germanic and Slavic Languages (2014)

Andreeva, Bistra ; Demenko, Grazyna ; Möbius, Bernd ; Zimmerer, Frank ; Jügler, Jeanin ; Oleskowicz-Popiel, Magdalena

This study investigates cross-language differences in pitch range and variation in four languages from two language groups: English and German (Germanic) and Bulgarian and Polish (Slavic). The analysis is based on large multi-speaker corpora (48 speakers for Polish, 60 for each of the other three languages). Linear mixed models were computed that include various distributional measures of pitch level, span and variation, revealing characteristic differences across languages and between language groups. A classification experiment based on the relevant parameter measures (span, kurtosis and skewness values for pitch distributions for each speaker) succeeded in separating the language groups.
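The per-speaker parameters used in the classification experiment (span, skewness and kurtosis of the pitch distribution) can be computed from a speaker's F0 samples roughly as follows. This is a minimal sketch under stated assumptions: values are converted to semitones relative to a hypothetical 100 Hz reference, span is taken as the 5th-to-95th percentile range, and skewness and excess kurtosis use the plain moment-based definitions. The study's exact definitions may differ, and the function names are hypothetical.

```python
import math
from statistics import mean, quantiles


def hz_to_semitones(f0_hz: list[float], ref_hz: float = 100.0) -> list[float]:
    """Convert F0 values to semitones relative to `ref_hz` (assumed reference)."""
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]


def central_moment(xs: list[float], k: int) -> float:
    """k-th central moment (population form, divides by n)."""
    m = mean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def distribution_measures(values: list[float]) -> dict[str, float]:
    """Span, skewness and excess kurtosis of one speaker's pitch values.

    Requires at least two distinct values (otherwise the variance is zero).
    """
    m2 = central_moment(values, 2)
    cuts = quantiles(values, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    return {
        "span": cuts[-1] - cuts[0],  # 95th minus 5th percentile (assumption)
        "skewness": central_moment(values, 3) / m2 ** 1.5,
        "kurtosis": central_moment(values, 4) / m2 ** 2 - 3.0,
    }


# Toy values; in practice these would be one speaker's F0 track in semitones.
print(distribution_measures([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Feeding one such feature vector per speaker into a classifier corresponds to the kind of experiment the abstract describes, although the classifier itself is not sketched here.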

Zur Kontextualisierung von sozialen Kategorien und Stereotypen in der sprachlichen Interaktion [On the Contextualization of Social Categories and Stereotypes in Verbal Interaction] (1995)

Kallmeyer, Werner

Zustimmen und Widersprechen. Zur Gesprächsanalyse von Problem- und Konfliktgesprächen [Agreeing and Contradicting: On the Conversation Analysis of Problem and Conflict Talk] (1994)

Kallmeyer, Werner

Wortbegriff und Orthographie [The Concept of the Word and Orthography] (1980)

Herberg, Dieter

TEI Feature Structures as a Representation Format for Multiple Annotation and Generic XML Documents (2009)

Stegmann, Jens ; Witt, Andreas

Feature structures are mathematical entities (rooted, labeled, directed acyclic graphs) that can be represented as graph displays, as attribute-value matrices, or as XML adhering to the constraints of a specialized TEI tag set. We demonstrate that this last, ISO-standardized format can serve as an integrative storage and exchange format for sets of multiple annotation XML documents. This application domain is rooted in the approach of multiple annotations, which offers a possible solution for XML-compliant markup in scenarios with conflicting annotation hierarchies. A more far-reaching proposal is to use the format as a meta-representation for generic XML documents. For both scenarios, our strategy for the relevant feature structure representations is grounded in the XDM (XQuery 1.0 and XPath 2.0 Data Model). The ubiquitous hierarchical and sequential relationships within XML documents are represented by specific features that take ordered list values. The mapping to the TEI feature structure format has been implemented as an XSLT 2.0 stylesheet that combines aspects of the push and pull processing paradigms as appropriate. For the multiple annotation scenario, an indexing mechanism is provided, so that implicit links between annotations of identical primary data are made explicit in the result format. Compared with alternative representations, the TEI-based format does well in many respects, since it is both integrative and well-formed XML. However, the result documents tend to grow very large, depending on the size of the input documents and their markup structure. This may also count as a downside for the proposed use with generic XML documents. On the positive side, the format may provide a hookup to methods and applications developed for feature structure representations in (computational) linguistics and knowledge representation.
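The core idea of representing an XML document's hierarchy and sequence through features with ordered list values can be illustrated with a much simpler mapping. The sketch below is not the authors' XSLT 2.0 stylesheet and does not follow the XDM in detail: it uses Python's `xml.etree.ElementTree`, omits the TEI namespace, ignores namespaced names and whitespace-only text, and the feature names (`name`, `attributes`, `children`, `value`) are hypothetical. Only the TEI elements `fs`, `f`, `string` and `vColl` with `org="list"` are taken from the TEI feature structure tag set.

```python
import xml.etree.ElementTree as ET


def text_fs(text: str) -> ET.Element:
    """Feature structure for a text node."""
    fs = ET.Element("fs", type="text")
    f = ET.SubElement(fs, "f", name="value")
    ET.SubElement(f, "string").text = text
    return fs


def element_to_fs(elem: ET.Element) -> ET.Element:
    """Map an element to an <fs>; children keep document order in a list-valued feature."""
    fs = ET.Element("fs", type="element")
    name_f = ET.SubElement(fs, "f", name="name")
    ET.SubElement(name_f, "string").text = elem.tag
    if elem.attrib:
        attrs_f = ET.SubElement(fs, "f", name="attributes")
        attrs_fs = ET.SubElement(attrs_f, "fs", type="attributes")
        for key, value in sorted(elem.attrib.items()):
            attr_f = ET.SubElement(attrs_fs, "f", name=key)
            ET.SubElement(attr_f, "string").text = value
    # Collect element and text children in document order (whitespace-only text dropped).
    children = []
    if elem.text and elem.text.strip():
        children.append(text_fs(elem.text))
    for child in elem:
        children.append(element_to_fs(child))
        if child.tail and child.tail.strip():
            children.append(text_fs(child.tail))
    if children:
        kids_f = ET.SubElement(fs, "f", name="children")
        coll = ET.SubElement(kids_f, "vColl", org="list")  # ordered list value
        coll.extend(children)
    return fs


source = ET.fromstring('<s><w pos="DET">the</w> <w pos="NN">dog</w></s>')
result = element_to_fs(source)
print(ET.tostring(result, encoding="unicode"))
```

Even this toy mapping shows the growth in document size that the abstract notes as a drawback: a two-word sentence already yields seven nested feature structures.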

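Zur Semantik kausaler Satzverbindungen: Integration, Fokussierung, Definitheit und modale Umgebung [On the Semantics of Causal Clause Linkage: Integration, Focusing, Definiteness and Modal Environment] (2005)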

Blühdorn, Hardarik

Wortbegriff und Orthographie (Resümee) [The Concept of the Word and Orthography (Summary)] (1979)

Herberg, Dieter

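Die geltende Regelung der Getrennt- und Zusammenschreibung und Ansatzpunkte zu ihrer Vereinfachung [The Current Rules for Writing Words Separately or as One Word, and Starting Points for Simplifying Them] (1975)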

Herberg, Dieter

Abfragekomponente von COSMAS-II [Query Component of COSMAS-II] (1996)

Bodmer Mory, Franck
