Refine
Year of publication
Document Type
- Conference Proceeding (688) (remove)
Keywords
- Korpus <Linguistik> (237)
- Deutsch (167)
- Computerlinguistik (117)
- Annotation (65)
- Automatische Sprachanalyse (53)
- Gesprochene Sprache (53)
- Natürliche Sprache (41)
- Forschungsdaten (38)
- Information Extraction (30)
- Metadaten (30)
Publicationstate
- Veröffentlichungsversion (442)
- Zweitveröffentlichung (81)
- Postprint (38)
- Preprint (1)
Reviewstate
- Peer-Review (328)
- (Verlags)-Lektorat (137)
- Peer-review (9)
- Review-Status-unbekannt (7)
- Peer review (1)
- Verlags-Lektorat (1)
Publisher
- European Language Resources Association (ELRA) (50)
- Association for Computational Linguistics (43)
- European Language Resources Association (35)
- Institut für Deutsche Sprache (17)
- Zenodo (15)
- Lexical Computing CZ s.r.o. (12)
- Linköping University Electronic Press (12)
- CLARIN (11)
- International Speech Communication Association (9)
- Leibniz-Institut für Deutsche Sprache (9)
This paper presents an annotation scheme for English modal verbs together with sense-annotated data from the news domain. We describe our annotation scheme and discuss problematic cases for modality annotation based on the inter-annotator agreement during the annotation. Furthermore, we present experiments on automatic sense tagging, showing that our annotations do provide a valuable training resource for NLP systems.
Die vorliegende empirische Untersuchung befasst sich mit einer Umfrage zur Wörterbuchbenutzung bei 41 Studentinnen und Studenten des Dipartimento di Filologia, Letteratura e Linguistica der Universität Pisa, dasselbe Department, an dem auch das deutsch-italienische sprachwissenschaftliche Online-Wörterbuch DIL erarbeitet worden ist (vgl. Flinz: 2011). Die schriftliche Umfrage wurde in Anlehnung an Hartmanns 5. Hypothese „An analysis of users´ needs should precede dictionary design“ (1989) durchgeführt. Die wichtigsten Ergebnisse waren von großer Bedeutung für die Gestaltung der makro- und mikrostrukturellen Eigenschaften des Fachwörterbuches. Die Ergebnisse der Untersuchung und die daraus folgenden Reflektionen werden in thematischen Kernblöcken vorgestellt.
We describe a simple procedure for the automatic creation of word-level alignments between printed documents and their respective full-text versions. The procedure is unsupervised, uses standard, off-the-shelf components only, and reaches an F-score of 85.01 in the basic setup and up to 86.63 when using pre- and post-processing. Potential areas of application are manual database curation (incl. document triage) and biomedical expression OCR.
A frequently replicated finding is that higher frequency words tend to be shorter and contain more strongly reduced vowels. However, little is known about potential differences in the articulatory gestures for high vs. low frequency words. The present study made use of electromagnetic articulography to investigate the production of two German vowels, [i] and [a], embedded in high and low frequency words. We found that word frequency differently affected the production of [i] and [a] at the temporal as well as the gestural level. Higher frequency of use predicted greater acoustic durations for long vowels; reduced durations for short vowels; articulatory trajectories with greater tongue height for [i] and more pronounced downward articulatory trajectories for [a]. These results show that the phonological contrast between short and long vowels is learned better with experience, and challenge both the Smooth Signal Redundancy Hypothesis and current theories of German phonology.
Scientific interest in von Kempelen's 'speaking machine' stems mainly from a general interest in the history of science. This study, however, is devoted to the question of what relevance the 'speaking machine' has today. Apart for discussing why it fascinates researchers and non-researchers alike we describe the potential of replicas as an instrument for demonstration and for researching speech generation.
This paper is a contribution to the ongoing discussion on treebank annotation schemes and their impact on PCFG parsing results. We provide a thorough comparison of two German treebanks: the TIGER treebank and the TüBa-D/Z. We use simple statistics on sentence length and vocabulary size, and more refined methods such as perplexity and its correlation with PCFG parsing results, as well as a Principal Components Analysis. Finally we present a qualitative evaluation of a set of 100 sentences from the TüBa- D/Z, manually annotated in the TIGER as well as in the TüBa-D/Z annotation scheme, and show that even the existence of a parallel subcorpus does not support a straightforward and easy comparison of both annotation schemes.