Refine
Year of publication
- 2011 (104)
Document Type
- Part of a Book (64)
- Article (18)
- Conference Proceeding (14)
- Other (5)
- Book (1)
- Report (1)
- Working Paper (1)
Language
- German (79)
- English (22)
- French (1)
- Multiple languages (1)
- Russian (1)
Is part of the Bibliography
- no (104)
Keywords
- German (56)
- Corpus (linguistics) (15)
- Grammar (12)
- Construction grammar (9)
- Computational linguistics (8)
- Computer-assisted lexicography (8)
- Language variety (8)
- Conversation analysis (6)
- Online dictionary (6)
- English (5)
Publication state
- Published version (104)
Review state
- (Publisher) editorial review (70)
- Peer review (22)
- Publisher editorial review (6)
- (Publisher) editorial review (2)
- Peer review (2)
Publisher
- de Gruyter (25)
- Narr (18)
- Lang (8)
- Institut für Deutsche Sprache (3)
- Francke (2)
- Hempen (2)
- Universidade de Santiago de Compostela (2)
- Aletheia (1)
- Association for Computational Linguistics (1)
- Benjamins (1)
"Topic-bound use(s)" as a new item type under the heading "Particularities of use"
(2011)
Linguistic query systems are special-purpose information retrieval (IR) applications. We present a novel, state-of-the-art approach to the efficient exploitation of very large linguistic corpora that combines the advantages of relational database management systems (RDBMS) with the functional MapReduce programming model. Our implementation uses the German reference corpus DeReKo, with multi-layer linguistic annotations and several types of text-specific metadata, but the proposed strategy is language-independent and adaptable to large-scale multilingual corpora.
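The MapReduce side of such a query can be illustrated with a minimal, in-memory sketch. It assumes a hypothetical token format of (surface form, lemma, part-of-speech tag) tuples and uses the STTS tag `ADJA` purely as an example query; it is not the authors' implementation and omits the RDBMS component entirely. The function names `map_phase`, `shuffle`, `reduce_phase` and `lemma_frequencies` are illustrative only.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator

# Hypothetical token record: (surface form, lemma, part-of-speech tag).
Token = tuple[str, str, str]


def map_phase(doc: list[Token], pos: str) -> Iterator[tuple[str, int]]:
    """Emit (lemma, 1) for every token whose POS tag matches the query."""
    for _form, lemma, tag in doc:
        if tag == pos:
            yield lemma, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as a MapReduce framework would."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Sum the partial counts collected for each lemma."""
    return {key: sum(values) for key, values in groups.items()}


def lemma_frequencies(corpus: list[list[Token]], pos: str) -> dict[str, int]:
    # Each document is mapped independently, so this step could run in parallel.
    mapped = chain.from_iterable(map_phase(doc, pos) for doc in corpus)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    corpus = [
        [("Das", "der", "ART"), ("neue", "neu", "ADJA"), ("Korpus", "Korpus", "NN")],
        [("ein", "ein", "ART"), ("neues", "neu", "ADJA"), ("großes", "groß", "ADJA")],
    ]
    print(lemma_frequencies(corpus, "ADJA"))  # {'neu': 2, 'groß': 1}
```

Because the map step touches only one document at a time, documents can be distributed across workers; in a setup like the one described, the relational database would presumably supply the annotated tokens and metadata filters, which this sketch does not model.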
XML was designed for creating structured documents, but the information encoded in these structures is, by definition, out of scope for XML. Additional sources, such as documentation, which are usually not easily interpretable by computers, are needed to determine the intended meaning of specific tags in a tag set. The Component Metadata Infrastructure (CMDI) takes a rather pragmatic approach to fostering interoperability between XML instances in the domain of metadata descriptions for language resources. This paper gives an overview of this approach.
This paper presents ongoing research embedded in an empirical-linguistic research program that aims to devise viable research strategies for developing an explanatory theory of grammar as a psychological and social phenomenon. As this phenomenon cannot be studied directly, the program approaches it indirectly through its correlates in language corpora, an approach justified by the core tenets of Emergent Grammar. The guiding principle for identifying such corpus correlates of grammatical regularities is to imitate the psychological processes underlying the emergent nature of these regularities. While previous work in this program focused on syntagmatic structures, the current paper goes one step further by investigating schematic structures that involve paradigmatic variation. It introduces and explores a general strategy for uncovering corpus correlates of such structures, and it outlines how these correlates may be used to study the nature of the psychologically real schematic structures.
Between classical symbolic word sense disambiguation (WSD), which uses explicit deep semantic representations of sentences and texts, and statistical WSD, which uses word co-occurrence information, there is a recent tendency towards intermediate methods. Similar to so-called lightweight semantics (Marek, 2009), we propose making only sparse use of semantic information. We describe an approximation model based on flat underspecified discourse representation structures (FUDRSs, cf. Eberle, 2004) that weighs knowledge about context structure, lexical semantic restrictions and interpretation preferences. We provide a catalogue of guidelines for the manual annotation of texts with the corresponding indicators. Using this catalogue, we can test the reliability of an analysis tool that implements the model with respect to annotation precision and disambiguation prediction, and examine how both can be improved by bootstrapping the system's knowledge from corpus information. For the balanced test corpus considered, the recognition rate of the preferred reading is 80-90% (depending on the smoothing of parse errors).
In many cases, independent argument structure patterns with idiosyncratic formal or semantic properties can be abstracted from the argument structures of verbs. The article shows that similarities between such patterns cannot be captured by the concept of polysemous argument structure constructions, as proposed by Goldberg (1995), but are more adequately modeled as a network of family resemblances. The individual argument structure patterns exhibit a multitude of idiosyncratic lexical co-occurrences that are specific to each pattern and stand in an implicational relationship to it. Considerations on an appropriate linguistic-theoretical modeling of the data reveal weaknesses of valency-based theories as well as shortcomings of construction-based approaches.
The contribution focuses on aspects of pluricentricity in spoken Standard German. After a brief overview of the historical and dialectal background of linguistic diversity in the German-speaking area, it presents the regionally balanced speech corpus "German Today", which was collected to analyze the (regional) variation of spoken Standard German. Aspects of pluricentric German are discussed on the basis of both the distribution of certain phonetic variables and a short analysis of regional differences in the use of certain conversational constructions. It is argued that pluricentric structures are constituted by a set of linguistic features on different levels of description. Above all, the analysis tries to reveal traces of the impact of both traditional dialects and national or even subnational political units on the constitution of the standard varieties.