Refine
Year of publication
- 2011 (27) (remove)
Document Type
- Conference Proceeding (27) (remove)
Has Fulltext
- yes (27)
Keywords
- Computerlinguistik (6)
- Datenmanagement (4)
- Metadaten (4)
- Online-Wörterbuch (4)
- Automatische Sprachanalyse (3)
- Computerunterstützte Lexikographie (3)
- Korpus <Linguistik> (3)
- Maschinelles Lernen (3)
- Sentimentanalyse (3)
- Benutzer (2)
Publicationstate
Reviewstate
- Peer-Review (16)
- (Verlags)-Lektorat (1)
Publisher
- Trojina, Institute for Applied Slovene Studies (5)
- Incoma Ltd. (2)
- Universität Hamburg (2)
- Universität Hamburg - Sonderforschungsbereich 538 (2)
- Association for Computational (1)
- Association for Computational Linguistics (1)
- Bozen University Press (1)
- City University of Hong Kong (1)
- Editorial Universitat Politècnica de València (1)
- ICOMANIA Ltd. (1)
Scientific interest in von Kempelen's 'speaking machine' stems mainly from a general interest in the history of science. This study, however, is devoted to the question of what relevance the 'speaking machine' has today. Apart for discussing why it fascinates researchers and non-researchers alike we describe the potential of replicas as an instrument for demonstration and for researching speech generation.
What makes a good online dictionary? Empirical insights from an interdisciplinary research project
(2011)
This paper presents empirical fmdings from two online surveys on the use of online dictionaries, in which more than 1,000 participants took part. The aim of these studies was to clarify general questions of online dictionary use (e.g. which electronic devices are used for online dictionaries or different types of usage situations) and to identify different demands regarding the use of online dictionaries. We will present some important results ofthis ongoing research project by focusing on the latter. Our analyses show that neither knowledge of the participants’ (scientific or academic) background, nor the language Version of the online survey (German vs. English) allow any significant conclusions to be drawn about the participant’s individual user demands. Subgroup analyses only reveal noteworthy differences when the groups are clustered statistically. Taken together, our fmdings shed light on the general lexicographical request both for the development of a user-adaptive interface and the incorporation of multimedia elements to make online dictionaries more user-friendly and innovative.
vernetziko is an assistive software tool primarily designed for managing cross-references in XML-based electronic dictionaries. In its current form it has been developed as an integral part of the lexicographic editing environment for the German monolingual dictionary elexiko developed and compiled at the Institut für Deutsche Sprache, Mannheim. This paper first briefly outlines how vernetziko fits into the XML-based dictionary editing technology of elexiko. Then vernetziko’s core functionality and some of the auxiliary tools integrated into the program are presented from both a practical and a technological point of view. The concluding sections discuss some software engineering aspects of extending the tool to handle cross-references between multiple resources and point out some of the advantages of vernetziko vis-à-vis corresponding features of proprietary dictionary writing systems. The software can be adapted to interconnect off-the-shelf components (database management systems and editors), thus providing a tailor-made lexicographical workbench for a wide range of XML-based dictionaries without vendor lock-in.
Compared with printed dictionaries, online dictionaries provide a number of unique possibilities for the presentation and processing of lexicographical information. However, in Müller-Spitzer/Koplenig/Töpel (2011) we show that – on average - users tend to rate the special characteristics of online dictionaries (e.g. multimedia, adaptability) as (partly) unimportant. This result conflicts somewhat with the lexicographical request both for the development of a user-adaptive interface and the incorporation of multimedia elements. This contribution seeks to explain this discrepancy, by arguing that when potential users are fully informed about the benefits of possible innovative features of online dictionaries, they will come to judge these characteristics to be more useful than users that do not have this kind of information. This argument is supported by empirical evidence presented in this paper.
Linguistics is facing the challenge of many other sciences as it continues to grow into increasingly complex subfields, each with its own separate or overarching branches. While linguists are certainly aware of the overall structure of the research field, they cannot follow all developments other than those of their subfields. It is thus important to help specialists but also newcomers alike to bushwhack through evolved or unknown territory of linguistic data. A considerable amount of research data in linguistics is described with metadata. While studies described and published in archived journals and conference proceedings receive a quite homogeneous set of metadata tags — e.g., author, title, publisher —, this does not hold for the empirical data and analyses that underlie such studies. Moreover, lexicons, grammars, experimental data, and other types of resources come in different forms; and to make things worse, their description in terms of metadata is also not uniform, if existing at all. These problems are well-known and there are now a number of international initiatives — e.g., CLARIN, FlareNet, MetaNet, DARIAH — to build infrastructures for managing linguistic resources. The NaLiDa project, funded by the German Research Foundation, aims at facilitating the management and access to linguistic resources originating from German research institutions. In cooperation with the German SFB 833 research center, we are developing a combination of faceted and full-text search to give integrated access through heterogeneous metadata sets. Our approach is supported by a central registry for metadata field descriptors, and a component repository for structured groups of data categories as larger building blocks.
Digital or electronic lexicography has gained in importance in the last few years. This can be seen in the increasing number of online dictionaries and publications focusing on this field. OBELEX (http://www.owid.de) - one of the bibliographic projects of the Institute for German Language in Mannheim - takes this development into account and makes both online dictionaries and research contribulions available in a bibliographical database searchable by different criteria. The idea for OBELEX originated in the context of the dictionary portal OWID, which incorporates several dictionaries from the Institute for German Language (http://www.owid.de). OBELEX has been available online free of Charge since December 2008. As of 2011, OBELEX includes two search options: a search for research literature and (as a completely new feature) a search for online dictionaries, a Service which is unique in the world.
In this paper, we investigate the role of predicates in opinion holder extraction. We will examine the shape of these predicates, investigate what relationship they bear towards opinion holders, determine what resources are potentially useful for acquiring them, and point out limitations of an opinion holder extraction system based on these predicates. For this study, we will carry out an evaluation on a corpus annotated with opinion holders. Our insights are, in particular, important for situations in which no labelled training data are available and only rule-based methods can be applied.
The planning of a dictionary should consider both theoretical and empiric aspects, either for its macro- and microstructure: this is true also for Online Specialized Dictionaries of Linguistics. In particular the microstructure should be standardized and structured so as to fit with the primary and secondary functions of a dictionary. Unfortunately, empirical studies that investigate Online Specialized Dictionaries of Linguistics are rare, making it unclear which microstructural elements are obligatory and which are facultative. This article will present and comment upon the results of an investigation into a corpus of Online Specialized Dictionaries of Linguistics, focusing attention on these aspects and also the most important theoretical issues. An example taken from DIL, a German-Italian Online Dictionary of Linguistics, will end the article.
Der Beitrag betrachtet lexikalisch-semantische Relationen aus einer emergentistischen Perspektive vor dem Hintergrund eines korpusgeleiteten empirisch-linguistischen Ansatzes. Er skizziert, wie eine systematische Erfassung und Auswertung des Kookkurrenzverhaltens von Lexemen – die Analyse der Ahnlichkeit von Kookkurrenzprofilen mit Hilfe von selbstorganisierenden lexikalischen Merkmalskarten und ihre im Diskurs verankerte Interpretation – wichtige Einblicke in die Struktur verschiedenartiger Verwendungsaspekte dieser Lexeme einschlieslich ihrer semantischen Nahe ermoglichen. Die vorgestellte Methodik wird dabei –uber die explorativ-analytischen Zielsetzungen hinaus – als eine abduktive, auf Theoriebildung zielende Generalisierungsstrategie im postulierten Lexikon-Syntax-Kontinuum verstanden. Zum Schluss werden die Anwendungsmoglichkeiten einiger Komponenten dieser Methodik in der Lexikografie, Lexikologie und Didaktik diskutiert.
In order to automatically extract opinion holders, we propose to harness the contexts of prototypical opinion holders, i.e. common nouns, such as experts or analysts, that describe particular groups of people whose profession or occupation is to form and express opinions towards specific items. We assess their effectiveness in supervised learning where these contexts are regarded as labelled training data and in rule-based classification which uses predicates that frequently co-occur with mentions of the prototypical opinion holders. Finally, we also examine in how far knowledge gained from these contexts can compensate the lack of large amounts of labeled training data in supervised learning by considering various amounts of actually labeled training sets.