Pogled u e-leksikografiju
(2015)
The paper gives an overview of the basic concepts and classifications in the field of e-lexicography. It presents a classification of e-dictionaries, outlines the lexicographic process of compiling an e-dictionary, and surveys the most widely used dictionary writing systems (DWS) and corpus query systems (CQS). As an example of good practice, the online dictionary elexiko (Institute for the German Language in Mannheim) is described in more detail: its goals and purpose, its theoretical and methodological foundations, its modules, and its possibilities of use. As a possible basis for compiling a corpus-based e-dictionary of Croatian that would be in line with the most recent lexicographic (and, more generally, linguistic) theories and practices, the paper presents work on an online lexical-semantic repository of Croatian (a database of semantic frames, image schemas, cognitive primitives and lexical units) within the project Repozitorij metafora hrvatskoga jezika.
Sprichwörter im Gebrauch
(2015)
Modern theories of grammar are static, i.e. scripticist and synchronicist. This means that their descriptive apparatus is tailored to the structures of present-day written and standard languages. The contribution argues for a dynamic, i.e. non-scripticist and non-synchronicist, change of perspective in grammar research, based on the following empirically grounded considerations:
1. Literalization is a cultural universal that is cognitively anchored.
2. Different phases of literalization must be distinguished.
3. Literalization in general, and the phases of literalization in particular, have consequences for grammatical structure.
4. Grammatical structures can only be interpreted against the backdrop of the respective phase of literalization.
5. A dynamic model of grammar must also capture the historical relationship conceptually. This is illustrated with central grammatical concepts such as aggregation vs. integration and word group vs. phrase, and with word order (verbal bracket, topological field model, sentence-peripheral constituents).
6. Historically, a dynamic relationship between online and offline syntax, between syntactic temporality and syntactic spatiality, is to be assumed. What is to be interpreted as an online structure at a given time and in a given variety depends on the respective historical relationship between online and offline structures.
This study examines the pitch profiles of French learners of German and German learners of French, both in their native language (L1), and in their respective foreign language (L2). Results of the analysis of 84 speakers suggest that for short read sentences, French and German speakers do not show pitch range differences in their native production. Furthermore, analyses of mean f0 and pitch range indicate that range is not necessarily reduced in L2 productions. These results are different from results reported in prior research. Possible reasons for these differences are discussed.
Satz - oberflächlich
(2015)
The surface-oriented concept of the sentence presented here follows the definition of the IDS-Grammatik: sentences are construction forms consisting of at least a finite verb and its complements. The semantic correlate of the sentence is the proposition, consisting of predicate and arguments. The distinction drawn in the English-language tradition between sentence and clause, and the corresponding distinction between proposition and phrase in French, is captured in this approach by the opposition between ‘Vollsatz’ (full sentence) and ‘Teilsatz’ (partial sentence). In contrast to the internal-syntactic definition advocated here, surface-oriented definitions of the sentence can also rest on features that are external in syntactic terms, namely orthographic-prosodic features or the criterion of syntactic independence according to Bloomfield's well-known definition of the sentence. From a typological perspective, sentences are characterized by a “sentence-constituting act” (Sasse 1991, 77), that is, a specific morphosyntactic constellation that must be added to the expression of the state of affairs. From a pragmatic perspective, the sentence is the prototypical unit of communication. It can be decontextualized, whereas other forms of communication are interpretable only in their respective context. In terms of their semiotic status, sentences are complex linguistic signs. The rules or constructions underlying them, by contrast, do not have the character of signs.
Voll Energie stecken und voller Geigen hängen - seltsame Phrasentypen und ungewöhnliche Valenzmuster
(2015)
The contribution aims to suggest how two different strands of research that are intensively pursued in German and French Germanic linguistics could be brought together: research on so-called “ellipses” and research on information structure, or theme-rheme structure. Starting from an excerpt from a literary text, a small typology is presented for sequence ellipses and ‘selbstständige Text-KM’ (independent text KMs), as I would like to call them following the IDS-Grammatik. The information-structural analysis draws on information status as well as thematic structure, so that a comparatively complex picture of the dynamics in the text can be traced. Using two-part internal predications as an example, it is shown that several strategies of information structuring lie behind the surface of the two possible types of linearization. For heuristic reasons, it makes sense to describe them following the model of the linear structure of the verbal sentence.
This paper investigates evidence for linguistic coherence in new urban dialects that evolved in multiethnic and multilingual urban neighbourhoods. We propose a view of coherence as an interpretation of empirical observations rather than something that would be “out there in the data”, and argue that this interpretation should be based on evidence of systematic links between linguistic phenomena, as established by patterns of covariation between phenomena that can be shown to be related at linguistic levels. In a case study, we present results from qualitative and quantitative analyses for a set of phenomena that have been described for Kiezdeutsch, a new dialect from multilingual urban Germany. Qualitative analyses point to linguistic relationships between different phenomena and between pragmatic and linguistic levels. Quantitative analyses, based on corpus data from KiDKo (www.kiezdeutschkorpus.de), point to systematic advantages for the Kiezdeutsch data from a multiethnic and multilingual context provided by the main corpus (KiDKo/Mu), compared to complementary corpus data from a mostly monoethnic and monolingual (German) context (KiDKo/Mo). Taken together, this indicates patterns of covariation that support an interpretation of coherence for this new dialect: our findings point to an interconnected linguistic system, rather than to a mere accumulation of individual features. In addition to this internal coherence, the data also points to external coherence: Kiezdeutsch is not disconnected on the outside either, but fully integrated within the general domain of German, an integration that defies a distinction of “autochthonous” and “allochthonous” German, not only at the level of speakers, but also at the level of linguistic systems.
The present study introduces articulography, the measurement of the position of tongue and lips during speech, as a promising method for the study of dialect variation. By using generalized additive modeling to analyze articulatory trajectories, we are able to reliably detect aggregate group differences, while simultaneously taking into account the individual variation across dozens of speakers. Our results on the basis of Dutch dialect data show clear differences between the southern and the northern dialect with respect to tongue position, with a more frontal tongue position in the dialect from Ubbergen (in the southern half of the Netherlands) than in the dialect of Ter Apel (in the northern half of the Netherlands). Thus articulography appears to be a suitable tool to investigate structural differences in pronunciation at the dialect level.
Opinion Holder and Target Extraction for Verb-based Opinion Predicates – The Problem is Not Solved
(2015)
We offer a critical review of the current state of opinion role extraction involving opinion verbs. We argue that neither the currently available lexical resources nor the manually annotated text corpora are sufficient to appropriately study this task. We introduce a new corpus focusing on opinion roles of opinion verbs from the Subjectivity Lexicon and show potential benefits of this corpus. We also demonstrate that state-of-the-art classifiers perform rather poorly on this new dataset compared to the standard dataset for the task, showing that significant research remains to be done.
We present an approach for opinion role induction for verbal predicates. Our model rests on the assumption that opinion verbs can be divided into three different types where each type is associated with a characteristic mapping between semantic roles and opinion holders and targets. In several experiments, we demonstrate the relevance of those three categories for the task. We show that verbs can easily be categorized with semi-supervised graph-based clustering and some appropriate similarity metric. The seeds are obtained through linguistic diagnostics. We evaluate our approach against a new manually-compiled opinion role lexicon and perform in-context classification.
We examine the combination of pattern-based and distributional similarity for the induction of semantic categories. Pattern-based methods are precise and sparse while distributional methods have a higher recall. Given these particular properties we use the prediction of distributional methods as a back-off to pattern-based similarity. Since our pattern-based approach is embedded into a semi-supervised graph clustering algorithm, we also examine how distributional information is best added to that classifier. Our experiments are carried out on 5 different food categorization tasks.
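The back-off idea named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the zero threshold for "no pattern evidence", and the toy scores are all assumptions made for illustration.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def pair_key(a: str, b: str) -> Pair:
    """Order-independent key for an unordered pair of items."""
    return (a, b) if a <= b else (b, a)


def backoff_similarity(
    a: str,
    b: str,
    pattern_sim: Dict[Pair, float],
    distributional_sim: Dict[Pair, float],
) -> float:
    """Use the precise but sparse pattern-based score when it exists,
    otherwise back off to the higher-recall distributional score."""
    key = pair_key(a, b)
    score = pattern_sim.get(key, 0.0)
    if score > 0.0:  # assumption: zero or missing means no pattern evidence
        return score
    return distributional_sim.get(key, 0.0)


# Toy, made-up scores for illustration only.
pattern_sim = {pair_key("apple", "pear"): 0.9}
distributional_sim = {
    pair_key("apple", "pear"): 0.4,
    pair_key("apple", "quince"): 0.6,
}

# Pattern evidence exists, so the pattern-based score is used.
print(backoff_similarity("pear", "apple", pattern_sim, distributional_sim))
# No pattern evidence, so the distributional score is used instead.
print(backoff_similarity("quince", "apple", pattern_sim, distributional_sim))
```

In a graph-based setting, such a combined score could serve as the edge weight between two items. How the paper actually integrates the two signals is not specified in the abstract.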
In this article, we explore the feasibility of extracting suitable and unsuitable food items for particular health conditions from natural language text. We refer to this task as conditional healthiness classification. For that purpose, we annotate a corpus extracted from forum entries of a food-related website. We identify different relation types that hold between food items and health conditions going beyond a binary distinction of suitability and unsuitability and devise various supervised classifiers using different types of features. We examine the impact of different task-specific resources, such as a healthiness lexicon that lists the healthiness status of a food item and a sentiment lexicon. Moreover, we also consider task-specific linguistic features that disambiguate a context in which mentions of a food item and a health condition co-occur and compare them with standard features using bag of words, part-of-speech information and syntactic parses. We also investigate to what extent individual food items and health conditions correlate with specific relation types and try to harness this information for classification.
This article reports on the ongoing CoRoLa project, which aims to create a reference corpus of contemporary Romanian (from 1945 onwards), open for free online use by researchers in linguistics and language processing, teachers of Romanian, and students. We are investing serious effort in persuading large publishing houses and other owners of IPR on relevant language data to join us and contribute selections of their text and speech repositories to the project. The CoRoLa project is coordinated by two Computer Science institutes of the Romanian Academy, but benefits from the cooperation and advice of professional linguists from other institutes of the Romanian Academy. We foresee a written component of the corpus of more than 500 million word forms, and a speech component of about 300 hours of recordings. The entire collection of texts (covering all functional styles of the language) will be pre-processed and annotated at several levels, and also documented with standardized metadata. The pre-processing includes cleaning the data and harmonising the diacritics, sentence splitting and tokenization. Annotation will include morpho-lexical tagging and lemmatization in the first stage, followed by syntactic, semantic and discourse annotation in a later stage.
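The pre-processing steps named in the abstract above (diacritic harmonisation, sentence splitting, tokenization) can be sketched roughly as follows. This is not the CoRoLa pipeline. It assumes that harmonisation means mapping the legacy cedilla forms of s and t to the comma-below forms used in standard Romanian orthography, and it uses deliberately naive regular expressions; all function names are hypothetical.

```python
import re
from typing import List

# Assumption: harmonisation maps cedilla variants to comma-below variants.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015F": "\u0219",  # s with cedilla -> s with comma below
    "\u015E": "\u0218",  # S with cedilla -> S with comma below
    "\u0163": "\u021B",  # t with cedilla -> t with comma below
    "\u0162": "\u021A",  # T with cedilla -> T with comma below
})


def harmonise_diacritics(text: str) -> str:
    """Replace cedilla forms with their comma-below counterparts."""
    return text.translate(CEDILLA_TO_COMMA)


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    """Naive tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


raw = "Aceasta este o propozi\u0163ie. \u015Ei \u00EEnc\u0103 una!"
clean = harmonise_diacritics(raw)
sentences = split_sentences(clean)
tokens = [tokenize(s) for s in sentences]
print(sentences)
print(tokens)
```

A production pipeline would need to handle abbreviations, clitics and hyphenated forms, which this sketch ignores.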