Year of publication
- 2016 (44)
Document Type
- Conference Proceeding (44)
Is part of the Bibliography
- no (44)
Publication state
- Published version (37)
- Postprint (1)
- Secondary publication (1)
Review state
- (Publisher's) editorial review (13)
- Peer review (12)
Publisher
- European Language Resources Association (ELRA) (7)
- Nisaba (5)
- Association for Computational Linguistics (4)
- Ivane Javakhishvili Tbilisi State University (3)
- CLARIN (2)
- European Language Resources Association (2)
- International Speech Communication Association (2)
- Universität Potsdam (2)
- Academic Publishing Division of the Faculty of Arts of the University of Ljubljana (1)
- Association pour l'Avancement des Etudes Iraniennes (1)
The wdlpOst dictionary writing system presented in this paper was developed for a lexicographical project on German loanwords in the East Slavic languages Russian, Belarusian, and Ukrainian. The project's main objectives are (i) to document those loanwords for which a cognate lexical borrowing from German is known in Polish and (ii) to establish possible borrowing pathways for these lexical items. In the first phase of the project, the collaborative client/server architecture of the wdlpOst system was used to excerpt detailed lexicographical information from a wide range of historical and contemporary East Slavic dictionaries, with the entries of a large dictionary of German loanwords in Polish serving as a common frame of reference. For the project's second phase, the wdlpOst system provides new tools for compiling entries for the East Slavic loanwords. Most importantly, the numerous word sense definitions for a set of cognate loanwords, as excerpted from different lexicographical sources, are mapped onto a system of newly defined cross-language word senses. Similarly, the phonemic and graphemic variation in the loanwords and their derivatives is captured by a tool that abstracts away from dictionary-specific idiosyncrasies.
Researchers in Natural Language Processing rely on the availability of data and software, ideally under open licenses, but little is done to actively encourage open licensing. In fact, the current copyright framework grants authors exclusive rights to copy their works, to make them available to the public, and to create derivative works (such as annotated language corpora). Moreover, in the EU, databases are protected against unauthorized extraction and re-utilization of their contents. Proper public licensing therefore plays a crucial role in providing access to research data. A public license is a license that grants certain rights not to one particular user but to the general public (everybody). This article presents a tool we developed to assist users in the licensing process. Because software and data should be licensed under different licenses, the tool consists of two separate parts: Data and Software. The underlying logic and elements of the graphical interface are presented below.
The Perceptual Effect of L1 Prosody Transplantation on L2 Speech: The Case of French Accented German
(2016)
Research has shown that language learners are challenged not only by segmental differences between their native language (L1) and the second language (L2) but also by the correct production of suprasegmental structures, such as phone/syllable duration and the realization of pitch. These difficulties often lead to a perceptible foreign accent. This study investigates the influence of prosody transplantation on foreign accent ratings. Syllable duration and pitch contour were transferred from utterances of a male and a female German native speaker to utterances of ten French native speakers speaking German. Acoustic measurements show that the French learners spoke at a significantly lower speaking rate. As expected, the results of a perception experiment judging the accentedness of 1) German native utterances, 2) unmanipulated utterances, and 3) manipulated utterances of French learners of German suggest that transplanting the prosodic features syllable duration and pitch leads to lower accentedness ratings. These findings confirm results of similar studies investigating prosody transplantation with other L1/L2 pairs and point to a beneficial technique for (computer-assisted) pronunciation training.
This paper presents a corpus containing 35 dialogues of spontaneously spoken southern German, including half an hour of articulography for 13 of the speakers. Speakers were seated in separate recording chambers, mimicking a telephone call, and recorded on individual audio channels. The corpus provides manually corrected word boundaries and automatically aligned segment boundaries. Annotations are provided in Praat format. In addition to the audio recordings, speakers filled out a detailed questionnaire covering, among other things, their audio-visual media consumption habits.
The IFCASL corpus is a French-German bilingual phonetic learner corpus designed, recorded, and annotated in a project on individualized feedback in computer-assisted spoken language learning. The motivation for setting up this corpus was that no phonetically annotated and segmented corpus of comparable size and coverage existed for this language pair. In contrast to most learner corpora, the IFCASL corpus incorporates data for a language pair in both directions, i.e., in our case, French learners of German and German learners of French. In addition, the corpus is complemented by two sub-corpora of native speech by the same speakers. The corpus provides comparable spoken productions by about 100 speakers, annotated and segmented at the word and phone level, with more than 50% of the data manually corrected. The paper reports on inter-annotator agreement and on the optimization of the acoustic models used for forced speech-text alignment in exercises for computer-assisted pronunciation training. Example studies based on the corpus data with a phonetic focus include topics such as the realization of /h/ and the glottal stop, final devoicing of obstruents, vowel quantity and quality, pitch range, and tempo.
We examine different features and classifiers for categorizing opinion words into actor view and speaker view. To our knowledge, this is the first comprehensive work to address sentiment views at the word level that takes opinion verbs, nouns, and adjectives into account. We consider many high-level features that require only little labeled training data. A detailed feature analysis yields linguistic insights into the nature of sentiment views. We also examine the extent to which global constraints between different opinion words help increase classification performance. Finally, we show that our (prior) word-level annotation correlates with contextual sentiment views.
The compilation of terminological vocabularies plays a central role in the organization and retrieval of scientific texts. Both simple keyword lists and sophisticated models of the relationships between terminological concepts can contribute substantially to the analysis and classification of digital documents and to the retrieval of relevant ones, whether on the Web or in local repositories. This seems especially true for long-established scientific fields with various theoretical and historical branches, such as linguistics, where terminology is sometimes used far from consistently across documents of different origins. In this short paper, we report on the early stages of a project that aims to re-design grammis, an existing domain-specific knowledge organization system (KOS) for grammatical content. In particular, we deal with the terminological part of grammis and present the current state of this online resource as well as the key re-design principles. Further, we raise questions about the implications of the Linked Open Data and Semantic Web approaches for our re-design decisions.