Refine
Year of publication
- 2018 (6) (remove)
Document Type
- Part of a Book (5)
- Article (1)
Has Fulltext
- yes (6)
Is part of the Bibliography
- yes (6)
Keywords
- Deutsch (4)
- Grammatik (3)
- Grammis (3)
- Informationssystem (2)
- Korpus <Linguistik> (2)
- Terminologie (2)
- grammar and syntax (2)
- grammatical information system (2)
- grammatical terminology (2)
- machine learning methods (2)
Publicationstate
- Veröffentlichungsversion (5)
- Postprint (1)
- Zweitveröffentlichung (1)
Reviewstate
- Peer-Review (3)
- (Verlags)-Lektorat (2)
Publisher
Terminological resources play a central role in the organization and retrieval of scientific texts. Both simple keyword lists and advanced modelings of relationships between terminological concepts can make a most valuable contribution to the analysis, classification, and finding of appropriate digital documents, either on the web or within local repositories. This seems especially true for long-established scientific fields with elusive theoretical and historical branches, where the use of terminology within documents from different origins is often far from being consistent. In this paper, we report on the progress of a linguistically motivated project on the onomasiological re-modeling of the terminological resources for the grammatical information system grammis. We present the design principles and the results of their application. In particular, we focus on new features for the authoring backend and discuss how these innovations help to evaluate existing, loosely structured terminological content, as well as to efficiently deal with automatic term extraction. Furthermore, we introduce a transformation to a future SKOS representation. We conclude with a positioning of our resources with regard to the Knowledge Organization discourse and discuss how a highly complex information environment like grammis benefits from the re-designed terminological KOS.
Seit Mitte der 1990er Jahre wird am Institut für deutsche Sprache (IDS) in Mannheim erforscht, wie der hochkomplexe Gegenstandsbereich „Grammatik“ unter Ausnutzung digitaler Sprachressourcen und hypertextueller Navigationsstrukturen gleichermaßen wissenschaftlich fundiert und anschaulich vermittelt werden kann. Die grammatischen Online-Informationssysteme des IDS wenden sich nicht allein an Forscher und die interessierte Öffentlichkeit in Deutschland, sondern in gleichem Maße an Germanisten und Deutsch-Lernende in der ganzen Welt. Der vorliegende Beitrag beschreibt die damit verbundenen Hoffnungen und Anspruche. Daran anschließend thematisiert er praktische Einsatzmöglichkeiten und skizziert die funktionale und inhaltliche Weiterentwicklung der digitalen Grammatik-Angebote.
Der vorliegende Band befasst sich mit dem Stand und der Entwicklung von Forschungsinfrastrukturen für die germanistische Linguistik und einigen angrenzenden Bereichen. Einen zentralen Aspekt dabei bildet die Notwendigkeit, Kooperativität in der Wissenschaft im institutionellen Sinne, aber auch in Hinsicht auf die wissenschaftliche Praxis zu organisieren. Dies geschieht in Verbunden als Kooperationsstrukturen, wobei Sprachwissenschaft und Sprachtechnologie miteinander verbunden werden. Als zentraler Forschungsressource kommen dabei Korpora und ihrer Erschließung durch spezielle, linguistisch motivierte Informationssysteme besondere Bedeutung zu. Auf der Ebene der Daten werden durch Annotations- und Modellierungsstandards die Voraussetzung für eine nachhaltige Nutzbarkeit derartiger Ressourcen geschaffen.
The paper describes preliminary studies regarding the usage of Example-Based Querying for specialist corpora. We outline an infrastructure for its application within the linguistic domain. Example-Based Querying deals with retrieval situations where users would like to explore large collections of specialist texts semantically, but are unable to explicitly name the linguistic phenomenon they look for. As a way out, the proposed framework allows them to input prototypical everyday language examples or cases of doubt, which are automatically processed by CRF and linked to appropriate linguistic texts in the corpus.
Complement phrases are essential for constructing well-formed sentences in German. Identifying verb complements and categorizing complement classes is challenging even for linguists who are specialized in the field of verb valency. Against this background, we introduce an ML-based algorithm which is able to identify and classify complement phrases of any German verb in any written sentence context. We use a large training set consisting of example sentences from a valency dictionary, enriched with POS tagging, and the ML-based technique of Conditional Random Fields (CRF) to generate the classification models.
In this paper, we present our approach to automatically extracting German terminology in the domain of grammar using texts from the online information system grammis as our corpus. We analyze existing repositories of German grammatical terminology and develop Part-of-speech patterns for our extraction thereby showing the importance of unigrams in this domain. We contrast the results of the automatic extraction with a manually extracted standard. By comparing the performance of well-known statistical measures, we show how measures based on corpus comparison outperform alternative methods.