Refine
Year of publication
Document Type
- Conference Proceeding (112) (remove)
Has Fulltext
- yes (112)
Keywords
- Korpus <Linguistik> (46)
- Deutsch (32)
- Annotation (15)
- Auszeichnungssprache (10)
- Computerlinguistik (8)
- Computerunterstützte Lexikographie (8)
- Head-driven phrase structure grammar (8)
- Gesprochene Sprache (6)
- HPSG (6)
- Polnisch (6)
Publicationstate
- Veröffentlichungsversion (112) (remove)
Reviewstate
- (Verlags)-Lektorat (112) (remove)
Publisher
- European Language Resources Association (ELRA) (12)
- Association for Computational Linguistics (5)
- CSLI Publications (5)
- Nisaba (5)
- Extreme Markup Languages Conference (4)
- European Language Resources Association (3)
- Ivane Javakhishvili Tbilisi State University (3)
- University of Birmingham (3)
- University of Illinois (3)
- University of Oulu (3)
This study investigates cross-language differences in pitch range and variation in four languages from two language groups: English and German (Germanic) and Bulgarian and Polish (Slavic). The analysis is based on large multi-speaker corpora (48 speakers for Polish, 60 for each of the other three languages). Linear mixed models were computed that include various distributional measures of pitch level, span and variation, revealing characteristic differences across languages and between language groups. A classification experiment based on the relevant parameter measures (span, kurtosis and skewness values for pitch distributions for each speaker) succeeded in separating the language groups.
Based on specific linguistic landmarks in the speech signal, this study investigates pitch level and pitch span differences in English, German, Bulgarian and Polish. The analysis is based on 22 speakers per language (11 males and 11 females). Linear mixed models were computed that include various linguistic measures of pitch level and span, revealing characteristic differences across languages and between language groups. Pitch level appeared to have significantly higher values for the female speakers in the Slavic than the Germanic group. The male speakers showed slightly different results, with only the Polish speakers displaying significantly higher mean values for pitch level than the German males. Overall, the results show that the Slavic speakers tend to have a wider pitch span than the German speakers. But for the linguistic measure, namely for span between the initial peaks and the non-prominent valleys, we only find the difference between Polish and German speakers. We found a flatter intonation contour in German than in Polish, Bulgarian and English male and female speakers and differences in the frequency of the landmarks between languages. Concerning “speaker liveliness” we found that the speakers from the Slavic group are significantly livelier than the speakers from the Germanic group.
Der Beitrag befasst sich zunächst mit der Satzklammer des Deutschen aus der Perspektive der Informationsverteilung. Nachdem gezeigt ist, dass sie als Informationsklammer fungiert, wird ihre Interaktion mit den Teilen gespaltener Nominalphrasen untersucht. Dabei zeigen sich zwei interessante Befunde:
• die Satzklammer und die NP-Teile unterstützen sich bei der Informationsklammerbildung; insbesondere können die Spalt-NP-Teile Akzent tragen;
• die Spalt-NP-Teile können alleine die Rolle einer Informationsklammer spielen, wodurch eine Topikalisierung des Partizips II möglich wird.
We present an approach on how to investigate what kind of semantic information is regularly associated with the structural markup of scientific articles. This approach addresses the need for an explicit formal description of the semantics of text-oriented XML-documents. The domain of our investigation is a corpus of scientific articles from psychology and linguistics from both English and German online available journals. For our analyses, we provide XML-markup representing two kinds of semantic levels: the thematic level (i.e. topics in the text world that the article is about) and the functional or rhetorical level. Our hypothesis is that these semantic levels correlate with the articles’ document structure also represented in XML. Articles have been annotated with the appropriate information. Each of the three informational levels is modelled in a separate XML document, since in our domain, the different description levels might conflict so that it is impossible to model them within a single XML document. For comparing and mining the resulting multi-layered XML annotations of one article, a Prolog-based approach is used. It focusses on the comparison of XML markup that is distributed among different documents. Prolog predicates have been defined for inferring relations between levels of information that are modelled in separate XML documents. We demonstrate how the Prolog tool is applied in our corpus analyses.
The present article describes the first stage of the KorAP project, launched recently at the Institut für Deutsche Sprache (IDS) in Mannheim, Germany. The aim of this project is to develop an innovative corpus analysis platform to tackle the increasing demands of modern linguistic research. The platform will facilitate new linguistic findings by making it possible to manage and analyse primary data and annotations in the petabyte range, while at the same time allowing an undistorted view of the primary linguistic data, and thus fully satisfying the demands of a scientific tool. An additional important aim of the project is to make corpus data as openly accessible as possible in light of unavoidable legal restrictions, for instance through support for distributed virtual corpora, user-defined annotations and adaptable user interfaces, as well as interfaces and sandboxes for user-supplied analysis applications. We discuss our motivation for undertaking this endeavour and the challenges that face it. Next, we outline our software implementation plan and describe development to-date.
XML-based technologies offer powerful resources for open source applications in the field of e-learning. The paper describes a model of hypertext as interlinked structures that can be intertwined by cross-annotation linking. This infrastructure integrates multiple perspectives and allows creating a personal learning environment. We exemplify the approach in a case study: the Hamlet project. In the course of this project, several German translations of William Shakespeare’s Hamlet have been collected and annotated. Two different annotation layers are used to achieve a cross-linking reference between the various German translations. We will describe the theoretical background of cross-annotation linking and the actual technological implementation of the system. Additionally, we will use the personas method to gain insights into the potential benefit of the system as a personal learning environment.