Schriftenreihe der Österreichischen Gesellschaft für Artificial Intelligence (ÖGAI)
Vienna: self-published by ÖGAI
Volume 5
In this paper, we examine methods for automatically extracting domain-specific knowledge in the food domain from unlabeled natural language text. We employ different extraction methods, ranging from surface patterns to co-occurrence measures applied to different parts of a document. We show that the effectiveness of a particular method depends heavily on the relation type considered and that no single method works equally well for every relation type. We also examine a combination of extraction methods and consider relationships between different relation types. The extraction methods are applied both to a domain-specific corpus and to the domain-independent factual knowledge base Wikipedia. Moreover, we examine an open-domain lexical ontology for suitability.
This paper presents an extension to the Stuttgart-Tübingen TagSet, the standard part-of-speech tag set for German, for the annotation of spoken language. The additional tags deal with hesitations, backchannel signals, interruptions, onomatopoeia and uninterpretable material. They allow one to capture phenomena specific to spoken language while, at the same time, preserving inter-operability with already existing corpora of written language.
This paper presents Release 2.0 of the SALSA corpus, a German resource for lexical semantics. The new corpus release provides new annotations for German nouns, complementing the existing annotations of German verbs in Release 1.0. The corpus now includes around 24,000 sentences with more than 36,000 annotated instances. It was designed with an eye towards NLP applications such as semantic role labeling but will also be a useful resource for linguistic studies in lexical semantics.
This paper takes a new look at computer-assisted transcription as it is commonly practised in discourse analysis and language acquisition studies. The first part proposes a bridge between discourse-analytical methodology and text-technological methods, with the concept of modelling as its central idea. The second part demonstrates the EXMARaLDA system, a set of formats and tools for computer-assisted transcription that builds on the ideas developed in the first part and implements them in a way that can lead to significant improvements in current research practice.