Computerlinguistik
Refine
Year of publication
Document Type
- Conference Proceeding (219)
- Part of a Book (68)
- Article (67)
- Book (16)
- Working Paper (9)
- Contribution to a Periodical (7)
- Doctoral Thesis (7)
- Other (7)
- Report (5)
- Master's Thesis (3)
Language
- English (282)
- German (127)
- French (1)
- Multiple languages (1)
Is part of the Bibliography
- no (411) (remove)
Keywords
- Computerlinguistik (158)
- Korpus <Linguistik> (93)
- Annotation (48)
- Automatische Sprachanalyse (48)
- Deutsch (39)
- Natürliche Sprache (39)
- XML (33)
- Information Extraction (28)
- Maschinelles Lernen (24)
- Metadaten (22)
Publicationstate
- Veröffentlichungsversion (231)
- Zweitveröffentlichung (67)
- Postprint (46)
- (Verlags)-Lektorat (1)
- Erstveröffentlichung (1)
- Preprint (1)
Reviewstate
Publisher
- Association for Computational Linguistics (32)
- Springer (20)
- European Language Resources Association (ELRA) (17)
- European Language Resources Association (16)
- de Gruyter (14)
- Institut für Deutsche Sprache (13)
- Universitätsverlag Hildesheim (8)
- Narr (7)
- Oxford University Press (6)
- LiU Electronic Press (5)
This paper presents the application of the <tiger2/> format to various linguistic scenarios with the aim of making it the standard serialisation for the ISO 24615 [1] (SynAF) standard. After outlining the main characteristics of both the SynAF metamodel and the <tiger2/> format, as extended from the initial Tiger XML format [2], we show through a range of different language families how <tiger2/> covers a variety of constituency and dependency based analyses.
A comparison between morphological complexity measures: typological data vs. language corpora
(2016)
Language complexity is an intriguing phenomenon argued to play an important role in both language learning and processing. The need to compare languages with regard to their complexity resulted in a multitude of approaches and methods, ranging from accounts targeting specific structural features to global quantification of variation more generally. In this paper, we investigate the degree to which morphological complexity measures are mutually correlated in a sample of more than 500 languages of 101 language families. We use human expert judgements from the World Atlas of Language Structures (WALS), and compare them to four quantitative measures automatically calculated from language corpora. These consist of three previously defined corpus-derived measures, which are all monolingual, and one new measure based on automatic word-alignment across pairs of languages. We find strong correlations between all the measures, illustrating that both expert judgements and automated approaches converge to similar complexity ratings, and can be used interchangeably.
Authors like Fillmore 1986 and Goldberg 2006 have made a strong case for regarding argument omission in English as a lexical and construction-based affordance rather than one based on general semantico-pragmatic constraints. They do not, however, address the question of how grammatical restrictions on null complementation might interact with broader narrative conventions, in particular those of genre. In this paper, we attempt to remedy this oversight by presenting a comprehensive overview of genre-based argument omissions and offering a construction-based analysis of genre-based omission conventions. We consider five genre-based omission types: instructional imperatives (Culy 1996, Bender 1999), labelese, diary style (Haegeman 1990), match reports (Ruppenhofer 2004) and quotative clauses. We show that these omission types share important traits; all, for example, have anaphoric rather than indefinite construals. We also show, however, that the omission types differ from each other in idiosyncratic ways. We then address several interrelated representational problems posed by the grammatical treatment of genre-based omissions. For example, the constructions that represent genre-based omission conventions must interact with the lexical entries of verbs, many of which do not generally permit omitted arguments. Accordingly, we offer constructional analyses of genre-based omissions that allow constructions to override lexical valence constraints.
We present an implemented XML data model and a new, simplified query language for multi-level annotated corpora. The new query language involves automatic conversion of queries into the underlying, more complicated MMAXQL query language. It supports queries for sequential and hierarchical, but also associative (e.g. coreferential) relations. The simplified query language has been designed with non-expert users in mind.
So far, there have been few descriptions on creating structures capable of storing lexicographic data, ISO 24613:2008 being one of the latest. Another one is by Spohr (2012), who designs a multifunctional lexical resource which is able to store data of different types of dictionaries in a user-oriented way. Technically, his design is based on the principle of a hierarchical XML/OWL (eXtensible Markup Language/Web Ontology Language) representation model. This article follows another route in describing a model based on entities and relations between them; MySQL (usually referred to as: Structured Query Language) describes a database system of tables containing data and definitions of relations between them. The model was developed in the context of the project "Scientific eLexicography for Africa" and the lexicographic database to be built thereof will be implemented with MySQL. The principles of the ISO model and of Spohr's model are adhered to with one major difference in the implementation strategy: we do not place the lemma in the centre of attention, but the sense description — all other elements, including the lemma, depend on the sense description. This article also describes the contained lexicographic data sets and how they have been collected from different sources. As our aim is to compile several prototypical internet dictionaries (a monolingual Northern Sotho dictionary, a bilingual learners' Xhosa–English dictionary and a bilingual Zulu–English dictionary), we describe the necessary microstructural elements for each of them and which principles we adhere to when designing different ways of accessing them. We plan to make the model and the (empty) database with all graphical user interfaces that have been developed, freely available by mid-2015.
We present a gold standard for semantic relation extraction in the food domain for German. The relation types that we address are motivated by scenarios for which IT applications present a commercial potential, such as virtual customer advice in which a virtual agent assists a customer in a supermarket in finding those products that satisfy their needs best. Moreover, we focus on those relation types that can be extracted from natural language text corpora, ideally content from the internet, such as web forums, that are easy to retrieve. A typical relation type that meets these requirements are pairs of food items that are usually consumed together. Such a relation type could be used by a virtual agent to suggest additional products available in a shop that would potentially complement the items a customer has already in their shopping cart. Our gold standard comprises structural data, i.e. relation tables, which encode relation instances. These tables are vital in order to evaluate natural language processing systems that extract those relations.
This paper describes a new approach to improve the analysis and categorization of web documents using statistical methods for template based clustering as well as semantical analysis based on terminological ontologies. A domain-specific environment serves for prove of concept. In order to demonstrate the widespread practical benefit of our approach, we outline a combined mathematical and semantical framework for information retrieval on internet resources.
We apply a decision tree based approach to pronoun resolution in spoken dialogue. Our system deals with pronouns with NP- and non-NP-antecedents. We present a set of features designed for pronoun resolution in spoken dialogue and determine the most promising features. We evaluate the system on twenty Switchboard dialogues and show that it compares well to Byron’s (2002) manually tuned system.
Creating and maintaining metadata for various kinds of resources requires appropriate tools to assist the user. The paper presents the metadata editor ProFormA for the creation and editing of CMDI (Component Metadata Infrastructure) metadata in web forms. This editor supports a number of CMDI profiles currently being provided for different types of resources. Since the editor is based on XForms and server-side processing, users can create and modify CMDI files in their standard browser without the need for further processing. Large parts of ProFormA are implemented as web services in order to reuse them in other contexts and programs.
The understanding of story variation, whether motivated by cultural currents or other factors, is important for applications of formal models of narrative such as story generation or story retrieval. We present the first stage of an experiment to elicit natural narrative variation data suitable for evaluation with respect to story similarity, to qualitative and quantitative analysis of story variation, and also for data processing. We also present few preliminary results from the first stage of the experiment, using Red Riding Hood and Romeo and Juliet as base texts.