Whether verbs have to be marked as punctual vs. durative has been a controversial issue from the very beginnings of research on aktionsarten in the last century right on up to modern theories of aspectual classes and aspect composition. Debates about the linguistic necessity of this distinction have often been accompanied by the question of what it means for a verb to be temporally punctual. In this paper I will, firstly, sketch the history of research on the punctual-durative distinction and present several linguistic arguments in its favor. Secondly, I will show how this distinction is captured in an event-structure-based approach to lexical semantics. Thirdly, I will discuss the extent to which a precise definition of the notions used in lexical representations helps avoid circular argumentation in lexical semantics. Finally, I will demonstrate how this can be done for the notion of ‘punctuality’ by clarifying the logical type of this predicate and relating it to central cognitive time concepts.
“My Curiosity was Satisfied, but not in a Good Way”: Predicting User Ratings for Online Recipes
(2014)
In this paper, we develop an approach to automatically predict user ratings for recipes at Epicurious.com, based on the recipes’ reviews. We investigate two distributional methods for feature selection, Information Gain and Bi-Normal Separation. We also compare distributionally selected features with linguistically motivated features, and we compare two architectures: a one-layer system that aggregates all reviews of a recipe and predicts its rating, and a two-layer system that predicts ratings for individual reviews and then aggregates them. We obtain our best results with the two-layer architecture in combination with 5,000 features selected by Information Gain. This setup reaches an overall accuracy of 65.60%, given an upper bound of 82.57%.
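As an illustration of Information Gain as a feature-selection criterion, the following minimal sketch scores each term by how much knowing its presence reduces the entropy of the rating labels, then keeps the top-scoring terms. It is not the authors' implementation: the function names (`entropy`, `information_gain`, `top_k_terms`), the representation of a review as a set of terms, and the toy data are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Entropy reduction from splitting the documents on presence of `term`.

    `docs` is a list of term sets (hypothetical representation of reviews),
    `labels` the matching list of rating classes.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Return the k terms with the highest information gain (ties: alphabetical)."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))[:k]


# Toy data, invented for illustration only.
docs = [{"delicious", "easy"}, {"delicious"}, {"bland", "easy"}, {"bland"}]
labels = ["high", "high", "low", "low"]
selected = top_k_terms(docs, labels, 2)
```

In this toy data, "delicious" and "bland" separate the two rating classes perfectly, whereas "easy" occurs equally often in both and carries no information about the rating.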
The article is devoted to the study of communicative failures in the speech genre of the video interview through the prism of Ukrainian national identity. It determines the topics, types, and genre-specific and linguistic features of the Ukrainian video interview as an example of dialogic speech. It establishes the specifics of communicative failures in this genre (in interviews with athletes, politicians, and cultural figures) with regard to the positions of the communicants, the structural levels of the genre under study, and the maxims of communication.
Scholarly work on Kempelen's speaking machine is mostly motivated by interest in the history of science. This article addresses the question of what significance the speaking machine has today. In addition to possible explanations of why the speaking machine fascinates scientists and non-scientists alike, we describe the use of replicas as an instrument for demonstrating, and also for investigating, the production of speech sound.
The IMS Open Corpus Workbench (CWB) software currently uses a simple tabular data model with proven limitations. We outline and justify the need for a new data model to underlie the next major version of CWB. This data model, dubbed Ziggurat, defines a series of types of data layer to represent different structures and relations within an annotated corpus; each such layer may contain variables of different types. Ziggurat will allow us to gradually extend and enhance CWB’s existing CQP-syntax for corpus queries, and also make possible more radical departures relative not only to the current version of CWB but also to other contemporary corpus-analysis software.
This paper presents an annotation scheme for English modal verbs together with sense-annotated data from the news domain. We describe our annotation scheme and discuss problematic cases for modality annotation based on the inter-annotator agreement during the annotation. Furthermore, we present experiments on automatic sense tagging, showing that our annotations do provide a valuable training resource for NLP systems.
The present empirical study deals with a survey on dictionary use among 41 students of the Dipartimento di Filologia, Letteratura e Linguistica of the University of Pisa, the same department at which the German-Italian online dictionary of linguistics DIL was developed (cf. Flinz: 2011). The written survey was carried out in line with Hartmann's 5th hypothesis, "An analysis of users' needs should precede dictionary design" (1989). The most important results were of great significance for the design of the macro- and microstructural features of the specialized dictionary. The results of the study and the reflections arising from them are presented in thematic core blocks.
We describe a simple procedure for the automatic creation of word-level alignments between printed documents and their respective full-text versions. The procedure is unsupervised, uses standard, off-the-shelf components only, and reaches an F-score of 85.01 in the basic setup and up to 86.63 when using pre- and post-processing. Potential areas of application are manual database curation (incl. document triage) and biomedical expression OCR.