Einleitung
(2012)
Starting from the insight that word meanings (sememes) can be understood as structured complexes of semantic features (SM, or semes), various methods for determining and describing word meaning have been proposed in recent years. The following discusses, both in principle and by example, which possibilities and limits currently emerge for the lexicographic use of semantic feature or componential analyses (SMA) when explaining meanings in general-purpose dictionaries of contemporary German.
We discovered several recurring errors in the current version of the Europarl Corpus originating both from the web site of the European Parliament and the corpus compilation based thereon. The most frequent error was incompletely extracted metadata leaving non-textual fragments within the textual parts of the corpus files. This is, on average, the case for every second speaker change. We not only cleaned the Europarl Corpus by correcting several kinds of errors, but also aligned the speakers’ contributions of all available languages and compiled everything into a new XML-structured corpus. This facilitates a more sophisticated selection of data, e.g. querying the corpus for speeches by speakers of a particular political group or in particular language combinations.
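The kind of selection described above (speeches by political group, restricted to particular language combinations) can be sketched as follows. This is only an illustration: the element and attribute names (`speech`, `text`, `group`, `lang`), the sample data and the helper `speeches_by_group` are assumptions made for this example and do not reflect the actual schema of the cleaned corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a speaker-aligned corpus. The element and
# attribute names are illustrative assumptions, not the real corpus schema.
SAMPLE = """
<session date="2009-01-12">
  <speech speaker="A. Example" group="PPE-DE">
    <text lang="de">Guten Morgen.</text>
    <text lang="en">Good morning.</text>
  </speech>
  <speech speaker="B. Sample" group="PSE">
    <text lang="en">Thank you, President.</text>
  </speech>
  <speech speaker="C. Demo" group="PPE-DE">
    <text lang="fr">Merci.</text>
    <text lang="en">Thank you.</text>
    <text lang="de">Danke.</text>
  </speech>
</session>
"""


def speeches_by_group(root, group, languages):
    """Return (speaker, {lang: text}) pairs for speeches by members of
    `group` that are available in every language listed in `languages`."""
    results = []
    for speech in root.iter("speech"):
        if speech.get("group") != group:
            continue
        # Collect the aligned language versions of this contribution.
        texts = {t.get("lang"): (t.text or "").strip()
                 for t in speech.findall("text")}
        if all(lang in texts for lang in languages):
            results.append((speech.get("speaker"),
                            {lang: texts[lang] for lang in languages}))
    return results


root = ET.fromstring(SAMPLE)
hits = speeches_by_group(root, "PPE-DE", ["de", "en"])
for speaker, versions in hits:
    print(speaker, versions)
```

Because every contribution carries its speaker metadata and all aligned language versions in one place, a filter like this needs no join across per-language files.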
Based on specific linguistic landmarks in the speech signal, this study investigates pitch level and pitch span differences in English, German, Bulgarian and Polish. The analysis is based on 22 speakers per language (11 males and 11 females). Linear mixed models were computed that include various linguistic measures of pitch level and span, revealing characteristic differences across languages and between language groups. Pitch level was significantly higher for the female speakers in the Slavic group than for those in the Germanic group. The male speakers showed slightly different results, with only the Polish speakers displaying significantly higher mean values for pitch level than the German males. Overall, the results show that the Slavic speakers tend to have a wider pitch span than the German speakers. For one linguistic measure, however, namely the span between the initial peaks and the non-prominent valleys, we find a difference only between Polish and German speakers. We found a flatter intonation contour for German speakers than for Polish, Bulgarian and English speakers, both male and female, as well as differences in the frequency of the landmarks between languages. Concerning “speaker liveliness”, we found that the speakers from the Slavic group are significantly livelier than the speakers from the Germanic group.
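A landmark-based span measure such as the one between initial peaks and non-prominent valleys could be computed roughly as below. The abstract does not state the unit or the exact procedure, so the semitone scale, the averaging over landmarks and the function names are assumptions made for this sketch.

```python
import math


def semitones(f_high_hz, f_low_hz):
    """Interval between two frequencies in semitones (12 per octave)."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


def landmark_span(peaks_hz, valleys_hz):
    """Span between the mean of one landmark type (e.g. initial peaks) and
    the mean of another (e.g. non-prominent valleys), in semitones.
    Averaging per landmark type is an assumption of this sketch."""
    mean_peak = sum(peaks_hz) / len(peaks_hz)
    mean_valley = sum(valleys_hz) / len(valleys_hz)
    return semitones(mean_peak, mean_valley)


# Invented f0 values in Hz for one speaker, for illustration only.
span = landmark_span([220.0, 180.0], [100.0, 100.0])
print(round(span, 2))
```

A logarithmic scale is a common choice here because it makes spans comparable across speakers with different pitch levels, for example male and female voices.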
This study investigates cross-language differences in pitch range and variation in four languages from two language groups: English and German (Germanic) and Bulgarian and Polish (Slavic). The analysis is based on large multi-speaker corpora (48 speakers for Polish, 60 for each of the other three languages). Linear mixed models were computed that include various distributional measures of pitch level, span and variation, revealing characteristic differences across languages and between language groups. A classification experiment based on the relevant parameter measures (span, kurtosis and skewness values for pitch distributions for each speaker) succeeded in separating the language groups.
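The per-speaker distributional parameters named above (span, skewness and kurtosis of the pitch distribution) can be illustrated with a short sketch. The conversion to semitones relative to 100 Hz, the use of the full range as the span and the population-moment formulas are assumptions made here; the study's exact definitions are not given in the abstract, and the classification step itself is not reproduced.

```python
import math


def distribution_measures(f0_hz, ref_hz=100.0):
    """Span, skewness and excess kurtosis of one speaker's f0 values.

    Assumptions of this sketch: f0 is converted to semitones relative to
    `ref_hz`, span is the full range (max minus min), and skewness and
    kurtosis are computed from population moments.
    """
    st = [12.0 * math.log2(f / ref_hz) for f in f0_hz]
    n = len(st)
    mean = sum(st) / n
    m2 = sum((x - mean) ** 2 for x in st) / n  # variance
    m3 = sum((x - mean) ** 3 for x in st) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in st) / n  # fourth central moment
    return {
        "span": max(st) - min(st),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis, 0 for a normal distribution
    }


# Invented f0 values lying one semitone below, at, and above 100 Hz.
sample = [100.0 * 2 ** (-1 / 12), 100.0, 100.0 * 2 ** (1 / 12)]
measures = distribution_measures(sample)
print(measures)
```

Each speaker is thus reduced to a small feature vector, which is the kind of input a classifier separating the language groups would operate on.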
Zur Kontextualisierung von sozialen Kategorien und Stereotypen in der sprachlichen Interaktion
(1995)
Wortbegriff und Orthographie
(1980)
TEI Feature Structures as a Representation Format for Multiple Annotation and Generic XML Documents
(2009)
Feature structures are mathematical entities (rooted labeled directed acyclic graphs) that can be represented as graph displays, attribute value matrices or as XML adhering to the constraints of a specialized TEI tag set. We demonstrate that this latter ISO-standardized format can be used as an integrative storage and exchange format for sets of multiple annotation XML documents. This specific domain of application is rooted in the approach of multiple annotations, which marks a possible solution for XML-compliant markup in scenarios with conflicting annotation hierarchies. A more extreme proposal consists in the possible use as a meta-representation format for generic XML documents. For both scenarios our strategy concerning pertinent feature structure representations is grounded on the XDM (XQuery 1.0 and XPath 2.0 Data Model). The ubiquitous hierarchical and sequential relationships within XML documents are represented by specific features that take ordered list values. The mapping to the TEI feature structure format has been implemented in the form of an XSLT 2.0 stylesheet. It can be characterized as exploiting aspects of both the push and pull processing paradigm as appropriate. An indexing mechanism is provided with regard to the multiple annotation documents scenario. Hence, implicit links concerning identical primary data are made explicit in the result format. In comparison to alternative representations, the TEI-based format does well in many respects, since it is both integrative and well-formed XML. However, the result documents tend to grow very large depending on the size of the input documents and their respective markup structure. This may also be considered as a downside regarding the proposed use for generic XML documents. On the positive side, it may be possible to achieve a hookup to methods and applications that have been developed for feature structure representations in the fields of (computational) linguistics and knowledge representation.
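The idea of representing a generic XML document as a feature structure, with hierarchy and sequence captured by a feature that takes an ordered list value, can be sketched as follows. The paper's mapping is an XSLT 2.0 stylesheet; this Python version is a simplified stand-in, and the feature names (`name`, `attr-…`, `children`) and the function `element_to_fs` are assumptions made for illustration. Only the TEI elements `fs`, `f`, `string` and `vColl` with `org="list"` come from the TEI tag set.

```python
import xml.etree.ElementTree as ET


def element_to_fs(elem):
    """Map an XML element to a TEI-style feature structure (<fs>).

    Child elements and text nodes are kept in document order inside a
    list-valued feature. Feature names are hypothetical.
    """
    fs = ET.Element("fs", {"type": "element"})

    # The element name becomes a string-valued feature.
    name = ET.SubElement(fs, "f", {"name": "name"})
    ET.SubElement(name, "string").text = elem.tag

    # Each XML attribute becomes its own string-valued feature.
    for key, value in elem.attrib.items():
        f = ET.SubElement(fs, "f", {"name": "attr-" + key})
        ET.SubElement(f, "string").text = value

    # Ordered content: an ordered collection of strings and nested <fs>.
    children = ET.SubElement(fs, "f", {"name": "children"})
    coll = ET.SubElement(children, "vColl", {"org": "list"})
    if elem.text and elem.text.strip():
        ET.SubElement(coll, "string").text = elem.text.strip()
    for child in elem:
        coll.append(element_to_fs(child))
        if child.tail and child.tail.strip():
            ET.SubElement(coll, "string").text = child.tail.strip()
    return fs


source = ET.fromstring('<s id="s1">The <w pos="NN">cat</w> sleeps</s>')
fs = element_to_fs(source)
print(ET.tostring(fs, encoding="unicode"))
```

Even for this one-line input the output is several times larger than the source, which illustrates the size growth noted above as a downside of the format.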