Refine
Year of publication
Document Type
- Part of a Book (26)
- Article (17)
- Conference Proceeding (13)
- Other (5)
- Book (4)
Keywords
- German (36)
- Grammar (25)
- Corpus <linguistics> (20)
- Grammis (12)
- Computational linguistics (11)
- Information system (7)
- Terminology (7)
- Database (6)
- Automatic language analysis (5)
- Lyrics <poetry> (5)
Publication state
- Published version (35)
- Secondary publication (14)
- Postprint (3)
- Preprint (1)
Review state
- (Publisher's) editing (30)
- Peer review (18)
Publisher
- Institut für Deutsche Sprache (10)
- de Gruyter (7)
- Narr (6)
- Narr Francke Attempto (4)
- Springer (4)
- Universitätsverlag Rhein-Ruhr (3)
- European Language Resources Association (ELRA) (4)
- Gesellschaft für Sprachtechnologie und Computerlinguistik (2)
- Peter Lang (2)
Grammis is a web-based information system on German grammar, hosted by the Institute for the German Language (IDS). It is human-oriented and features different theoretical perspectives on grammar. Currently, the terminology component of grammis is being redesigned so that this theoretical diversity plays a more prominent role in the data model. This also opens opportunities for implementing some machine-oriented features. In this paper, we present the redesign of both the data model and the knowledge base. We explore how the addition of machine-oriented features to the data model affects the knowledge base; in particular, how this addition shifts some of the textual complexity into the data model. We show that our resource can easily be ported to a SKOS-XL representation, which makes it available for data science, knowledge-based NLP applications, and LOD in the context of the digital humanities.
Terminological resources play a central role in the organization and retrieval of scientific texts. Both simple keyword lists and advanced models of the relationships between terminological concepts can make a most valuable contribution to the analysis, classification, and retrieval of appropriate digital documents, either on the web or within local repositories. This seems especially true for long-established scientific fields with diverse theoretical and historical branches, where the use of terminology in documents from different origins is often far from consistent. In this paper, we report on the progress of a linguistically motivated project on the onomasiological re-modeling of the terminological resources of the grammatical information system grammis. We present the design principles and the results of their application. In particular, we focus on new features of the authoring backend and discuss how these innovations help to evaluate existing, loosely structured terminological content, as well as to deal efficiently with automatic term extraction. Furthermore, we introduce a transformation to a future SKOS representation. We conclude by positioning our resources within the Knowledge Organization discourse and discuss how a highly complex information environment like grammis benefits from the redesigned terminological KOS.
The compilation of terminological vocabularies plays a central role in the organization and retrieval of scientific texts. Both simple keyword lists and sophisticated models of the relationships between terminological concepts can make a most valuable contribution to the analysis, classification, and retrieval of appropriate digital documents, either on the Web or within local repositories. This seems especially true for long-established scientific fields with various theoretical and historical branches, such as linguistics, where the use of terminology in documents from different origins is sometimes far from consistent. In this short paper, we report on the early stages of a project that aims at the re-design of grammis, an existing domain-specific KOS for grammatical content. In particular, we deal with the terminological part of grammis and present the current state of this online resource as well as the key re-design principles. Furthermore, we raise questions regarding the ramifications of the Linked Open Data and Semantic Web approaches for our re-design decisions.
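The SKOS-XL representation mentioned in the abstracts above can be illustrated with a minimal sketch. The namespace, URIs, and label data below are invented placeholders, not the actual grammis vocabulary; the point is only to show why SKOS-XL (labels as first-class resources) suits a terminology in which label-level metadata such as theoretical provenance must be attached.

```python
# Hedged sketch: rendering one terminological concept as SKOS-XL Turtle.
# Namespace, concept id, and labels are invented placeholders.

def term_to_skosxl(concept_id: str, labels: dict[str, str]) -> str:
    """Render a concept as SKOS-XL Turtle.

    `labels` maps a language tag to the preferred literal form.
    Each label becomes a first-class skosxl:Label resource, so
    label-level metadata (source, theory) can be attached later.
    """
    base = "http://example.org/grammis/term/"   # assumed namespace
    lines = [
        "@prefix skos:   <http://www.w3.org/2004/02/skos/core#> .",
        "@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .",
        "",
        f"<{base}{concept_id}> a skos:Concept ;",
    ]
    label_triples = []
    for i, (lang, form) in enumerate(sorted(labels.items())):
        label_uri = f"<{base}{concept_id}/label/{i}>"
        lines.append(f"    skosxl:prefLabel {label_uri} ;")
        label_triples.append(
            f"{label_uri} a skosxl:Label ;\n"
            f'    skosxl:literalForm "{form}"@{lang} .'
        )
    lines[-1] = lines[-1].rstrip(" ;") + " ."    # close the last triple
    return "\n".join(lines + [""] + label_triples)

print(term_to_skosxl("genitiv", {"de": "Genitiv", "en": "genitive"}))
```

The design choice sketched here is exactly the difference from plain SKOS: instead of a literal `skos:prefLabel`, each label is a resource of its own, which is what makes theory-specific annotations on individual terms expressible.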
Editorial
(2013)
The vocabulary of song lyrics in societal context – a diachronic empirical contribution
(2022)
This paper examines the prominence of socially relevant topics in German-language song lyrics of the past five decades. It turns out that, alongside individual sensibilities, political, socio-critical, and environmental topics are also addressed to a significant degree. We contrast song lyrics with other text types, applying quantitative methods to large, broadly stratified data samples in order to make the descriptions of the phenomena precise, generalizable, and reproducible. The longitudinal corpus design offers potential for diachronic comparisons. In the spirit of an extended mixed-methods approach, the study also explores selected aspects qualitatively and embeds them in their temporal context.
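The basic quantitative step of such a diachronic comparison (the relative frequency of topical vocabulary per decade) might be sketched as follows. The mini-corpus and the "political" word list are invented placeholders, not the study's actual data or topic model.

```python
# Hedged sketch of a diachronic relative-frequency comparison.
# The toy lyrics and the topic word list are invented placeholders.
POLITICAL = {"krieg", "frieden", "umwelt", "macht"}

corpus = {  # decade -> list of (toy) song lyrics
    1970: ["krieg und frieden", "liebe liebe liebe"],
    2010: ["umwelt macht krieg", "liebe und umwelt"],
}

def topic_share(texts: list[str], topic: set[str]) -> float:
    """Share of tokens that belong to the topic word list."""
    tokens = [t for text in texts for t in text.split()]
    return sum(1 for t in tokens if t in topic) / len(tokens)

shares = {decade: round(topic_share(texts, POLITICAL), 2)
          for decade, texts in corpus.items()}
print(shares)   # {1970: 0.33, 2010: 0.67}
```

A real study would of course lemmatize, stratify the samples, and test the differences for significance; the sketch only shows the shape of the measurement.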
This paper describes a new approach to improving the analysis and categorization of web documents using statistical methods for template-based clustering as well as semantic analysis based on terminological ontologies. A domain-specific environment serves as a proof of concept. In order to demonstrate the widespread practical benefit of our approach, we outline a combined mathematical and semantic framework for information retrieval on Internet resources.
In this feasibility study, we aim to contribute to the practical use of domain ontologies for hypertext classification by introducing an algorithm that generates candidate keywords. The algorithm uses structural markup information and lemmatized word lists as well as a domain ontology on linguistics. We present the calculation and ranking of keyword candidates based on ontology relationships, word position, frequency information, and statistical significance as evidenced by log-likelihood tests. Finally, the results of our machine-driven classification are validated empirically against manually assigned keywords.
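The log-likelihood significance test mentioned above can be sketched as a Dunning-style G² score comparing a word's frequency in a document against a reference corpus. The counts below are invented, and the paper's actual formula and weighting of the other ranking factors may differ.

```python
import math

# Hedged sketch of a Dunning-style log-likelihood (G2) keyword score.
# All counts are invented placeholders.

def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for a word occurring `a` times in a document of `c` tokens
    versus `b` times in a reference corpus of `d` tokens."""
    e1 = c * (a + b) / (c + d)   # expected count in the document
    e2 = d * (a + b) / (c + d)   # expected count in the reference
    g2 = 0.0
    if a:
        g2 += 2 * a * math.log(a / e1)
    if b:
        g2 += 2 * b * math.log(b / e2)
    return g2

# word occurs 12 times in a 1,000-token page, 30 times in a 1M-token reference
score = log_likelihood(12, 30, 1_000, 1_000_000)
```

A high score marks the word as overrepresented in the document relative to the reference, which is what makes it a keyword candidate before the ontology-based factors are applied.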
Linguistic query systems are special-purpose IR applications. We present a novel approach for the efficient exploitation of very large linguistic corpora, combining the advantages of relational database management systems (RDBMS) with the functional MapReduce programming model. Our implementation uses the German reference corpus DeReKo with multi-layer linguistic annotations and several types of text-specific metadata, but the proposed strategy is language-independent and adaptable to large-scale multilingual corpora.
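A single-machine sketch of the MapReduce pattern applied to annotated corpus data follows; the token record format is invented, and a real system would distribute the map and reduce phases over corpus partitions rather than run them in one process.

```python
from collections import defaultdict
from functools import reduce

# Hedged sketch: MapReduce-style frequency query over an annotated
# corpus. The (surface, lemma, POS) records are invented placeholders.
tokens = [
    ("Häuser", "Haus", "NN"), ("des", "der", "ART"),
    ("Hauses", "Haus", "NN"), ("und", "und", "KON"),
]

def map_phase(token):
    """Emit (lemma, 1) for every token matching the query (nouns only)."""
    surface, lemma, pos = token
    if pos == "NN":
        yield (lemma, 1)

def reduce_phase(acc, pair):
    """Sum the counts per lemma."""
    key, count = pair
    acc[key] += count
    return acc

pairs = (p for tok in tokens for p in map_phase(tok))
freq = reduce(reduce_phase, pairs, defaultdict(int))
print(dict(freq))   # {'Haus': 2}
```

The appeal of the pattern for corpus querying is that the map step is embarrassingly parallel over corpus partitions, while the RDBMS side can keep metadata joins efficient.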
The evolution of computer technologies and the introduction of the World Wide Web (WWW) have substantially changed the way scientific articles and books are published today. Besides writing for "traditional" print media, more and more authors decide to reach a larger audience and to decrease distribution time by offering their works on the Internet. The electronic medium not only facilitates the spread of information, it also adds new value by extending the possibilities of knowledge retrieval. Of course, the same is true for structured data collections like scientific glossaries, dictionaries, or bibliographies. They particularly profit from the web when made accessible via user-friendly and effective frontends. The following chapters deal with the transformation of the Bibliography of German Grammar ("Bibliografie zur deutschen Grammatik") from a data pool primarily used for print publishing into a relational database application offering a basis for media-independent distribution. Starting with a short description of the beginnings of the bibliography, the focus of this article lies on the explanation of our current database design as well as on the presentation of the web-based user interface.
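A minimal sketch of such a relational design for a grammar bibliography follows, with publications, keywords, and an n:m link table. Table and column names are invented for illustration and are not the bibliography's actual schema.

```python
import sqlite3

# Hedged sketch of a relational bibliography design.
# Schema, names, and sample data are invented placeholders.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE publication (
    id      INTEGER PRIMARY KEY,
    author  TEXT NOT NULL,
    title   TEXT NOT NULL,
    year    INTEGER
);
CREATE TABLE keyword (
    id   INTEGER PRIMARY KEY,
    term TEXT UNIQUE NOT NULL
);
CREATE TABLE publication_keyword (     -- n:m relation
    publication_id INTEGER REFERENCES publication(id),
    keyword_id     INTEGER REFERENCES keyword(id),
    PRIMARY KEY (publication_id, keyword_id)
);
""")
con.execute("INSERT INTO publication VALUES (1, 'Mustermann', 'Der Genitiv', 2005)")
con.execute("INSERT INTO keyword VALUES (1, 'Genitiv')")
con.execute("INSERT INTO publication_keyword VALUES (1, 1)")

# a media-independent frontend can now query by keyword
rows = con.execute("""
    SELECT p.title FROM publication p
    JOIN publication_keyword pk ON pk.publication_id = p.id
    JOIN keyword k ON k.id = pk.keyword_id
    WHERE k.term = 'Genitiv'
""").fetchall()
print(rows)   # [('Der Genitiv',)]
```

The n:m link table is the key step away from a print-oriented flat data pool: the same normalized records can then feed a printed bibliography, a web frontend, or an export format.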
This publication examines usage and design principles for user-adaptive online information systems, based on the grammatical information system "grammis" and the propaedeutic grammar "ProGr@mm". Both are current Internet projects hosted at the Institut für Deutsche Sprache in Mannheim that have been used successfully for years to convey linguistic knowledge. Building on a reflection of both the advantages and the current and fundamental problems of electronic publishing on the WWW, a solution is presented that shows, from the system designer's perspective, the possibilities of information storage and of user-specific, hypertextual information presentation. This approach is independent of the final screen layout and concentrates instead on the question of how the producer can realize access to digitally available information by exploiting the communicative potential of the WWW. The goal is to generate dynamic web documents from hypertexts whose content has been structured by means of XML and metadata. A central issue here is the modeling of the dialogue with the user: How can the step from mere usage interactivity to action interactivity be realized? How can explicit representations of individual user characteristics be obtained and put to meaningful use for adaptive system behavior?
We present a novel NLP resource for the explanation of linguistic phenomena, built and evaluated by exploring very large annotated language corpora. For the compilation, we use the German Reference Corpus (DeReKo) with more than 5 billion word forms, the largest linguistic resource worldwide for the study of contemporary written German. The result is a comprehensive database of German genitive formations, enriched with a broad range of intra- and extralinguistic metadata. It can be used for the notoriously controversial classification and prediction of genitive endings (short endings, long endings, zero marker). We also evaluate the main factors influencing the use of specific endings. To get a general idea of a factor's influence and its side effects, we calculate chi-square tests and visualize the residuals with an association plot. The results are evaluated against a gold standard by implementing tree-based machine learning algorithms. For the statistical analysis, we applied the supervised Logistic Model Trees (LMT) algorithm, using the WEKA software. We intend to use this gold standard to evaluate GenitivDB, as well as to explore methodologies for a predictive genitive model.
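The chi-square step described above can be illustrated with a small self-contained computation. The contingency table of genitive endings by register is invented; the paper's actual factors and counts differ.

```python
# Hedged sketch of the chi-square test on a contingency table of
# genitive endings by register. All counts are invented placeholders.

def chi_square(table: list[list[int]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

#           -s   -es  zero   (ending; invented counts)
table = [[120,  40,  10],   # newspaper
         [ 60,  70,  30]]   # fiction
stat = chi_square(table)
```

The per-cell terms `(observed - expected)**2 / expected` are the squared Pearson residuals, which is exactly what an association plot visualizes to show where a factor's influence concentrates.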