Refine
Year of publication
Document Type
- Conference Proceeding (688) (remove)
Keywords
- Korpus <Linguistik> (237)
- Deutsch (167)
- Computerlinguistik (117)
- Annotation (65)
- Automatische Sprachanalyse (53)
- Gesprochene Sprache (53)
- Natürliche Sprache (41)
- Forschungsdaten (38)
- Information Extraction (30)
- Metadaten (30)
Publicationstate
- Veröffentlichungsversion (442)
- Zweitveröffentlichung (81)
- Postprint (38)
- Preprint (1)
Reviewstate
- Peer-Review (328)
- (Verlags)-Lektorat (137)
- Peer-review (9)
- Review-Status-unbekannt (7)
- Peer review (1)
- Verlags-Lektorat (1)
Publisher
- European Language Resources Association (ELRA) (50)
- Association for Computational Linguistics (43)
- European Language Resources Association (35)
- Institut für Deutsche Sprache (17)
- Zenodo (15)
- Lexical Computing CZ s.r.o. (12)
- Linköping University Electronic Press (12)
- CLARIN (11)
- International Speech Communication Association (9)
- Leibniz-Institut für Deutsche Sprache (9)
This paper gives an insight into the basic concepts for a corpus-based lexical resource of spoken German, which is being developed by the project "The Lexicon of Spoken German"(Lexik des gesprochenen Deutsch, LeGeDe) at the "Institute for the German Language" (Institut für Deutsche Sprache, IDS) in Mannheim. The focus of the paper is on initial ideas of semi-automatic and automatic resources that assist the quantitative analysis of the corpus data for the creation of dictionary content. The work is based on the "Research and Teaching Corpus of Spoken German" (Forschungs- und Lehrkorpus Gesprochenes Deutsch, FOLK).
This paper discusses how cognitive aspects can be incorporated into lexicographic meaning descriptions based on corpus-driven analysis. The new German Online dictionary “Paronyme − Dynamisch im Kontrast” is concerned with easily confused words such as effektiv/effizient, sensibel/sensitiv. It is currently in the process of being developed and it aims at adopting a more conceptual and encyclopedic approach to meaning. Contrastive entries emphasize usage, comparing conceptual categories and indicating the mapping of knowledge. Adaptable access to lexicographic details offers different perspectives on information, and authentic examples reflect prototypical structures.
Some of the cognitive features are demonstrated with the help of examples. Firstly, I will outline how patterns of usage imply conceptual categories as central ideas instead of sufficiently logical criteria of semantic distinction. In this way, linguistic findings correlate better with how users conceptualize language. Secondly, it is pointed out how collocates are family members and fillers in contexts. Thirdly, I will demonstrate how contextual structure and function are included by summarizing referential information. Details are drawn from corpus data; they are usage-based patterns illustrating conversational interaction and semantic negotiation in contemporary public discourse. Finally, I will show flexible consultation routines where the focus on structural knowledge changes.
The paper reviews the results of work done in the context of TEI-Lex0, a joint ENeL / DARIAH / PARTHENOS initiative aimed at formulating guidelines for the encoding of retrodigitized dictionaries by streamlining and simplifying the recommendations of the “Print Dictionaries” chapter of the TEI Guidelines. TEI-Lex0 work is performed by teams concentrating on each of the main components of dictionary entries. The work presented here concerns proposals for constraining TEI-based encoding of orthographic, phonetic, and grammatical information on written and spoken forms of the lemma (headword), including auxiliary inflected forms. We also adduce examples of handling various types of orthographic and phonetic variants, as well as examples of handling the representation of inflectional paradigms, which have received less attention in the TEI Guidelines but which are nonetheless essential for properly exposing data content to the various uses that digitized lexica may have.
In this paper, we will present a first attempt to classify commonly confused words in German by consulting their communicative functions in corpora. Although the use of so-called paronyms causes frequent uncertainties due to similarities in spelling, sound and semantics, up until now the phenomenon has attracted little attention either from the perspective of corpus linguistics or from cognitive linguistics. Existing investigations rely on structuralist models, which do not account for empirical evidence. Still, they have developed an elaborate model based on formal criteria, primarily on word formation (cf. Lăzărescu 1999). Looking from a corpus perspective, such classifications are incompatible with language in use and cognitive elements of misuse.
This article sketches first lexicological insights into a classification model as derived from semantic analyses of written communication. Firstly, a brief description of the project will be provided. Secondly, corpus-assisted paronym detection will be focused. Thirdly, in the main section the paper concerns the description of the datasets for paronym classification and the classification procedures. As a work in progress, new insights will continually be extended once spoken and CMC data are added to the investigations.
Making 1:n explorable: a search interface for the ZAS database of clause-embedding predicates
(2017)
We introduce a recently published corpus-based database of German clause-embedding predicates and present an innovative web application for exploring it. The application displays the predicates and the corpus examples for these predicates in two separate tables that can be browsed and searched in real time. While familiar web interface paradigms make it easy for users to get started, the data presentation and the interactive advanced search components for the two tables are designed to accommodate remarkably complex query needs without the need for resorting to a dedicated query language or a more specialized tool. The 1:n relationship between predicates and their examples is exploited in the two tables in that, e.g. the predicate table also shows, for each predicate and each example attribute, all values that occur in the examples for this predicate. An easy-to-use visual query builder for arbitrary Boolean combinations of search criteria can optionally be displayed to pre-filter the underlying data presented in both tables. Several options for altering quantifier scope can be activated with simple checkboxes and considerably widen the space of searchable constellations.
The German Historical Institute Washington (GHI) is in the development phase of German History Digital (GH-D), a transatlantic digital initiative to meet the scholarly needs of historians and their students facing new historiographical and technological challenges. In the proposed paper we will discuss the research goals, methodology, prototyping, and development strategy of GH-D as infrastructure to facilitate transnational historical knowledge co-creation for the large community of researchers and students already relying on digital resources of the GHI and for the growing constituency of citizen scholars.
CoMParS is a resource under construction in the context of the long-term project German Grammar in European Comparison (GDE) at the IDS Mannheim. The principal goal of GDE is to create a novel contrastive grammar of German against the background of other European languages. Alongside German, which is the central focus, the core languages for comparison are English, French, Hungarian and Polish, representing different typological classes. Unlike traditional contrastive grammars available for German, which usually cover language pairs and are based on formal grammatical categories, the new GDE grammar is developed in the spirit of functionalist typology. This implies that, instead of formal criteria, cognitively motivated functional domains in terms of Givón (1984) are used as tertia comparationis. The purpose of CoMParS is to document the empirical basis of the theoretical assumptions of GDE-V and to illustrate the otherwise rather abstract content of grammar books by as many as possible naturally occurring and adequately presented multilingual examples, including information on their use in specific contexts and registers. These examples come from existing parallel corpora, and our presentation will focus on the legal aspects and consequences of this choice of language data.