Refine
Year of publication
Document Type
- Conference Proceeding (112) (remove)
Has Fulltext
- yes (112)
Keywords
- Korpus <Linguistik> (46)
- Deutsch (32)
- Annotation (15)
- Auszeichnungssprache (10)
- Computerlinguistik (8)
- Computerunterstützte Lexikographie (8)
- Head-driven phrase structure grammar (8)
- Gesprochene Sprache (6)
- HPSG (6)
- Polnisch (6)
Publicationstate
- Veröffentlichungsversion (112) (remove)
Reviewstate
- (Verlags)-Lektorat (112) (remove)
Publisher
- European Language Resources Association (ELRA) (12)
- Association for Computational Linguistics (5)
- CSLI Publications (5)
- Nisaba (5)
- Extreme Markup Languages Conference (4)
- European Language Resources Association (3)
- Ivane Javakhishvili Tbilisi State University (3)
- University of Birmingham (3)
- University of Illinois (3)
- University of Oulu (3)
Strategische Kommunikation wird in verschiedenen Bereichen der menschlichen Interaktion verwendet, um eine bestimmte Zielgruppe zu beeinflussen. Sie befindet sich an der Schnittstelle mannigfaltiger Disziplinen, wie z.B. Kommunikations- und Politikwissenschaft, Psychologie, Management und Marketing. Strategische Kommunikation bezieht sich sowohl auf öffentliche und private Kommunikation, professionelle und unprofessionelle Kommunikantinnen und Kommunikanten als auch auf unterschiedliche Kommunikationskanäle.
Der Beitrag befasst sich zunächst mit der Satzklammer des Deutschen aus der Perspektive der Informationsverteilung. Nachdem gezeigt ist, dass sie als Informationsklammer fungiert, wird ihre Interaktion mit den Teilen gespaltener Nominalphrasen untersucht. Dabei zeigen sich zwei interessante Befunde:
• die Satzklammer und die NP-Teile unterstützen sich bei der Informationsklammerbildung; insbesondere können die Spalt-NP-Teile Akzent tragen;
• die Spalt-NP-Teile können alleine die Rolle einer Informationsklammer spielen, wodurch eine Topikalisierung des Partizips II möglich wird.
The Stuttgart-Tübingen Tagset (STTS) is a widely used POS annotation scheme for German which provides 54 different tags for the analysis on the part of speech level. The tagset, however, does not distinguish between adverbs and different types of particles used for expressing modality, intensity, graduation, or to mark the focus of the sentence. In the paper, we present an extension to the STTS which provides tags for a more fine-grained analysis of modification, based on a syntactic perspective on parts of speech. We argue that the new classification not only enables us to do corpus-based linguistic studies on modification, but also improves statistical parsing. We give proof of concept by training a data-driven dependency parser on data from the TiGer treebank, providing the parser a) with the original STTS tags and b) with the new tags. Results show an improved labelled accuracy for the new, syntactically motivated classification.
The present paper reports the first results of the compilation and annotation of a blog corpus for German. The main aim of the project is the representation of the blog discourse structure and relations between its elements (blog posts, comments) and participants (bloggers, commentators). The data included in the corpus were manually collected from the scientific blog portal SciLogs. The feature catalogue for the corpus annotation includes three types of information which is directly or indirectly provided in the blog or can be construed by means of statistical analysis or computational tools. At this point, only directly available information (e.g. title of the blog post, name of the blogger etc.) has been annotated. We believe, our blog corpus can be of interest for the general study of blog structure or related research questions as well as for the development of NLP methods and techniques (e.g. for authorship detection).
In this paper we present an approach to faceted search in large language resource repositories. This kind of search which enables users to browse through the repository by choosing their personal sequence of facets heavily relies on the availability of descriptive metadata for the objects in the repository. This approach therefore informs the collection of a minimal set of metatdata for language resources. The work described in this paper has been funded by the EC within the ESFRI infrastructure project CLARIN.
This paper presents an extension to the Stuttgart-Tübingen TagSet, the standard part-of-speech tag set for German, for the annotation of spoken language. The additional tags deal with hesitations, backchannel signals, interruptions, onomatopoeia and uninterpretable material. They allow one to capture phenomena specific to spoken language while, at the same time, preserving inter-operability with already existing corpora of written language.
Here we will present a graphical software tool called Morph Moulder (MoMo) for teaching the formal foundations of a language with a denotation in a domain of relational typed feature structures as used in Head-Driven Phrase Structure Grammar. With MoMo, students learn the properties of totally well-typed, sort resolved relational feature structures, the use of formal languages to describe typed feature structures and the notions of constraint satisfaction and models of grammars written in a formal language. MoMo was realized and conceived within the context of a set of courses in the format of web-based training, that focuses on the concept of typed feature structures in a curriculum in grammar formalisms and parsing. The formal language of MoMo amends the constraint language of TRALE (an implementation platform for HPSG grammars based on ALE) to accommodate the expressive power of HPSG.
Ein Defizit der lexikographischen Methodologie liegt in der fehlenden Berücksichtigung der historischen, sozialen und politischen Gebundenheit von Wörterbüchern vor, obwohl die Wörterbuchkritik seit dem 19. Jh. immer wieder darauf aufmerksam gemacht hat. In der Perspektive der Benutzer besitzen Wörterbücher eine aspektenreiche kulturelle Semiotik, die mit dem hermeneutischen Charakter lexikologisch-lexikographischen Arbeitens zusammenhängt. Ausgehend vom Modell der Hermeneutik wird dafür plädiert, »Verstehenskompetenz« anstelle von »Sprachkompetenz« (des Linguisten) als Kategorie in die Theorie der Lexikographie einzuführen.
In this contribution, we report on an effort to annotate German data with information relevant to opinion inference. Such information has previously been referred to as effect or couched in terms of eventevaluation functors. We extend the theory and present an extensive scheme that combines both approaches and thus extends the set of inference-relevant predicates. Using these guidelines to annotate 726 German synsets, we achieve good inter-annotator agreement.
This paper provides a new generation of a markup language by introducing the Freestyle Markup Language (FML). Demands placed on the language are elaborated, considering current standards and discussions. Conception, a grammatical definition, a corresponding object graph and the bi-directional unambiguous transformation between these two congruent representation forms are set up. The result of this paper is a fundamental definition of a completely new markup language, consolidating many deficiency-discourses and experiences into one particular implementation concept, encouraging the evolution of markup.
We discovered several recurring errors in the current version of the Europarl Corpus originating both from the web site of the European Parliament and the corpus compilation based thereon. The most frequent error was incompletely extracted metadata leaving non-textual fragments within the textual parts of the corpus files. This is, on average, the case for every second speaker change. We not only cleaned the Europarl Corpus by correcting several kinds of errors, but also aligned the speakers’ contributions of all available languages and compiled every- thing into a new XML-structured corpus. This facilitates a more sophisticated selection of data, e.g. querying the corpus for speeches by speakers of a particular political group or in particular language combinations.
Based on specific linguistic landmarks in the speech signal, this study investigates pitch level and pitch span differences in English, German, Bulgarian and Polish. The analysis is based on 22 speakers per language (11 males and 11 females). Linear mixed models were computed that include various linguistic measures of pitch level and span, revealing characteristic differences across languages and between language groups. Pitch level appeared to have significantly higher values for the female speakers in the Slavic than the Germanic group. The male speakers showed slightly different results, with only the Polish speakers displaying significantly higher mean values for pitch level than the German males. Overall, the results show that the Slavic speakers tend to have a wider pitch span than the German speakers. But for the linguistic measure, namely for span between the initial peaks and the non-prominent valleys, we only find the difference between Polish and German speakers. We found a flatter intonation contour in German than in Polish, Bulgarian and English male and female speakers and differences in the frequency of the landmarks between languages. Concerning “speaker liveliness” we found that the speakers from the Slavic group are significantly livelier than the speakers from the Germanic group.
This study investigates cross-language differences in pitch range and variation in four languages from two language groups: English and German (Germanic) and Bulgarian and Polish (Slavic). The analysis is based on large multi-speaker corpora (48 speakers for Polish, 60 for each of the other three languages). Linear mixed models were computed that include various distributional measures of pitch level, span and variation, revealing characteristic differences across languages and between language groups. A classification experiment based on the relevant parameter measures (span, kurtosis and skewness values for pitch distributions for each speaker) succeeded in separating the language groups.
TEI Feature Structures as a Representation Format for Multiple Annotation and Generic XML Documents
(2009)
Feature structures are mathematical entities (rooted labeled directed acyclic graphs) that can be represented as graph displays, attribute value matrices or as XML adhering to the constraints of a specialized TEI tag set. We demonstrate that this latter ISO-standardized format can be used as an integrative storage and exchange format for sets of multiple annotation XML documents. This specific domain of application is rooted in the approach of multiple annotations, which marks a possible solution for XML-compliant markup in scenarios with conflicting annotation hierarchies. A more extreme proposal consists in the possible use as a meta-representation format for generic XML documents. For both scenarios our strategy concerning pertinent feature structure representations is grounded on the XDM (XQuery 1.0 and XPath 2.0 Data Model). The ubiquitous hierarchical and sequential relationships within XML documents are represented by specific features that take ordered list values. The mapping to the TEI feature structure format has been implemented in the form of an XSLT 2.0 stylesheet. It can be characterized as exploiting aspects of both the push and pull processing paradigm as appropriate. An indexing mechanism is provided with regard to the multiple annotation documents scenario. Hence, implicit links concerning identical primary data are made explicit in the result format. In comparison to alternative representations, the TEI-based format does well in many respects, since it is both integrative and well-formed XML. However, the result documents tend to grow very large depending on the size of the input documents and their respective markup structure. This may also be considered as a downside regarding the proposed use for generic XML documents. On the positive side, it may be possible to achieve a hookup to methods and applications that have been developed for feature structure representations in the fields of (computational) linguistics and knowledge representation.
In this paper, we provide an analysis of temporality in Hausa (Chadic, Afro-Asiatic). By testing the hypothesis of covert tense (Matthewson 2006) against empirical data, we show that Hausa is genuinely tenseless in the sense that the grammar does not restrict the relation between reference time and utterance time. Rather, temporal reference is pragmatically inferred from aspectual and contextual information. We also argue that future time reference in Hausa is realized as a combination of a modal operator and a prospective aspect, thus involving the modal meaning components of intention and prediction as well as event time shifting.
This paper provides a lexicalist formal description of preposition-pronoun contraction (PPC) in Polish, using the theoretical framework of HPSG. Considering the behaviour of PPC with respect to the prosodic, categorial, syntactic and semantic properties, the assumption can be made that each PPC is a morphological unit with prepositional status. The crucial difference between a PPC and a typical preposition consists, besides the phonological form, in the valence properties. While a typical preposition realizes its complement externally via general constraints on phrase structure, the realization of a PPC argument is effected internally by virtue of its lexical entry. Here, we will provide the appropriate implicational lexical constraints that license both typical Ps and PPCs.
This paper presents a thorough examination of the validity of three evaluation measures on parser output. We assess parser performance of an unlexicalised probabilistic parser trained on two German treebanks with different annotation schemes and evaluate parsing results using the PARSEVAL metric, the Leaf-Ancestor metric and a dependency-based evaluation. We reject the claim that the TüBa-D/Z annotation scheme is more adequate then the TIGER scheme for PCFG parsing and show that PARSEVAL should not be used to compare parser performance for parsers trained on treebanks with different annotation schemes. An analysis of specific error types indicates that the dependency-based evaluation is most appropriate to reflect parse quality.
This paper provides a treatment of Polish Plural Comitative Constructions in the paradigm of HPSG in the tradition of Pollard and Sag (1994). Plural Comitative Constructions (PCCs) have previously been treated in terms of coordination, complementation and adjunction. The objective of this paper is to show that PCCs are neither instances of typical coordinate structures nor of typical complement or adjunct structures. It thus appears difficult to properly describe them by means of the standard principles of syntax and semantics. The analysis proposed in this paper accounts for the syntactic and semantic properties of PCCs in Polish by assuming an adjunction-based syntactic structure for PCCs, and by treating the indexical information provided by PCCs not as subject to any inheritance or composition, but as a result of applying a set of principles on number, gender and person resolution that also hold for ordinary coordinate structures.
In this paper, semantic aspects of P1N1P2 word sequences will be discussed. Based on syntactic analysis of Trawinski (2003), which assumes prepositions heading P1N1P2NP combinations to be able to raise and realize syntactically complements of their arguments, we will investigate whether semantic representation of these expressions can be considered as an instance of the combinatorics semantics. We will investigate three German PPs involving expressions under consideration with respect to two criteria of internal semantic regularity adopted from Sailer (2000) and we will observe that the discussed expressions are not uniform with regard to the semantic properties. While the logical form of some of them can be computed by means of ordinary translations and a set of standard derivational operations, the other require additional handling methods. However, there are approaches available within the HPSG paradigm that are suited to account for these data. Here, we will briefly present the external selection approach of Soehn (2003) and the phrasal lexical entries approach of Sailer (2000) and we will show how they interact with the syntactic approach of Trawinski (2003).
Many modern languages commonly use expressions that seem unpredictable regarding standard grammar regularities. Among these expressions, sequences consisting of a preposition, a noun, another preposition, and another noun are particularly frequent. The issue of these expressions, usually termed in linguistic literature as "complex prepositions", "phrasal prepositions" or "preposition-like word formations", can certainly be considered to be a cross-linguistic problem (On "complex prepositions" in German and in other languages see (Benes 1974), (Buscha 1984)}, (Lindqvist 1994), (Meibauer 1995), (Quirk and Mulholland 1964), (Wollmann 1996). In this paper, I will focus exclusively on German data, because they provide very explicit and convincing linguistic evidence which motivates and supports my approach. However, I assert that the analysis proposed here for German can also be applied to other languages such as Polish or English.
One of the most popular techniques used in HPSG-based studies to describe linguistic phenomena is the raising mechanism. Besides ordinary raising verbs or adjectives, this tool has been applied for handling verbal complexes and discontinuous constituents, among other phenomena. In this paper, a new application for raising within the HPSG paradigm will be discussed, thereby investigating data from the prepositional domain. We will analyze linguistic properties of word combinations in German consisting of a preposition, a noun, and another preposition (such as auf Grund von (‘by virtue of’)), thus arguing that raising is the most appropriate method for satisfactorily describing the crucial syntactic features which are typical for those expressions. The objective of this paper is thus to demonstrate the efficiency of the raising mechanism as used in HPSG, and therefore, to emphasize the importance of designing a satisfactory uniform theory of raising within this grammar framework.
In this paper, we will investigate a cross-linguistic phenomenon referred to as complex prepositions (CPs), which is a frequent type of multiword expressions (MWEs) in many languages. Based on empirical data, we will point out the problems of the traditional treatment of CPs as complex lexical categories, and, thus, propose an analysis using the formal paradigm of the HPSG in the tradition of (Pollard and Sag, 1994). Our objective is to provide an approach to CPs which (1) convincingly explains empirical data, (2) is consistent with the underlying formal framework and does not require any extensions or modification of the existing description apparatus, (3) is computationally tractable.
This paper focuses on aspects of the licensing of adverbial noun phrases (AdvNPs) in the HPSG grammar framework. In the first part, empirical issues will be discussed. A number of AdvNPs will be examined with respect to various linguistic phenomena in order to find out to what extent AdvNPs share syntactic and semantic properties with non-adverbial NPs. Based on empirical generalizations, a lexical constraint for licensing both AdvNPs and non-adverbial NPs will be provided. Further on, problems of structural licensing of phrases containing AdvNPs that arise within the standard HPSG framework of Pollard and Sag (1994) will be pointed out, and a possible solution will be proposed. The objective is to provide a constraint-based treatment of NPs which describes non-redundantly both their adverbial and non-adverbial usages. The analysis proposed in this paper applies lexical and phrasal implicational constraints and does not require any radical modifications or extensions of the standard HPSG geometry of Pollard and Sag (1994).
Since adverbial NPs have particularly high frequency and a wide spectrum of uses in inflectional languages such as Polish, we will take Polish data into consideration.
Recent studies focussed on the question whether less-configurational languages like German are harder to parse than English, or whether the lower parsing scores are an artefact of treebank encoding schemes and data structures, as claimed by Kübler et al. (2006). This claim is based on the assumption that PARSEVAL metrics fully reflect parse quality across treebank encoding schemes. In this paper we present new experiments to test this claim. We use the PARSEVAL metric, the Leaf-Ancestor metric as well as a dependency-based evaluation, and present novel approaches measuring the effect of controlled error insertion on treebank trees and parser output. We also provide extensive past-parsing crosstreebank conversion. The results of the experiments show that, contrary to Kübler et al. (2006), the question whether or not German is harder to parse than English remains undecided.
This paper discusses the behaviour of German particle verbs formed by two-way prepositions in combination with pleonastic PPs including the verb particle as a preposition. These particle verbs have a characteristic feature: some of them license directional prepositional phrases in the accusative, some only allow for locative PPs in the dative, and some particle verbs can occur with PPs in the accusative and in the dative. Directional particle verbs together with directional PPs present an additional problem: the particle and the preposition in the PP seem to provide redundant information. The paper gives an overview of the semantic verb classes influencing this phenomenon, based on corpus data, and explains the underlying reasons for the behaviour of the particle verbs. We also show how the restrictions on particle verbs and pleonastic PPs can be expressed in a grammar theory like Lexical Functional Grammar (LFG).
How to Compare Treebanks
(2008)
Recent years have seen an increasing interest in developing standards for linguistic annotation, with a focus on the interoperability of the resources. This effort, however, requires a profound knowledge of the advantages and disadvantages of linguistic annotation schemes in order to avoid importing the flaws and weaknesses of existing encoding schemes into the new standards. This paper addresses the question how to compare syntactically annotated corpora and gain insights into the usefulness of specific design decisions. We present an exhaustive evaluation of two German treebanks with crucially different encoding schemes. We evaluate three different parsers trained on the two treebanks and compare results using EVALB, the Leaf-Ancestor metric, and a dependency-based evaluation. Furthermore, we present TePaCoC, a new testsuite for the evaluation of parsers on complex German grammatical constructions. The testsuite provides a well thought-out error classification, which enables us to compare parser output for parsers trained on treebanks with different encoding schemes and provides interesting insights into the impact of treebank annotation schemes on specific constructions like PP attachment or non-constituent coordination.
We present data-driven methods for the acquisition of LFG resources from two German treebanks. We discuss problems specific to semi-free word order languages as well as problems arising from the data structures determined by the design of the different treebanks. We compare two ways of encoding semi-free word order, as done in the two German treebanks, and argue that the design of the TiGer treebank is more adequate for the acquisition of LFG resources. Furthermore, we describe an architecture for LFG grammar acquisition for German, based on the two German treebanks, and compare our results with a hand-crafted German LFG grammar.
Linguistic query systems are special purpose IR applications. We present a novel state-of-the-art approach for the efficient exploitation of very large linguistic corpora, combining the advantages of relational database management systems (RDBMS) with the functional MapReduce programming model. Our implementation uses the German DEREKO reference corpus with multi-layer linguistic annotations and several types of text-specific metadata, but the proposed strategy is language-independent and adaptable to large-scale multilingual corpora.
Der Beitrag beschäftigt sich mit den verschiedenen Such-, Auffindungs- und Auswahlsprozessen, die für die fremdsprachige Produktion notwendig sind und von DICONALE-online, einem onomasiologisch-konzeptuell ausgerichteten, zweisprachig-bilateral konzipierten Verbwörterbuch der spanischen und deutschen Gegenwartsspache, besonders berücksichtigt werden. Der Ausgangspunkt von DICONALE ist ein unbefriedigendes Informationsangebot in den bestehenden ein- und zweisprachigen Lernerwörterbüchern für den L2-output und bestätigt das Projektteam in der Notwendigkeit, ein neuartiges benutzer- und situationsdefiniertes online-Nachschlagewerk zu erstellen. Zwei Bezugsrahmen bilden die Grundlage für einen komplexen, konzeptuell und framegeleiteten Zugriffspfad, der dem Benutzer bei der Suche und Auswahl von Ausdrucksmöglichkeiten und der adäquaten Anwendung behilflich sein soll. Das Novum dieses Wörterbuchprojekts besteht hauptsachlich darin, eine onomasiologisch-konzeptuelle Perspektive für den fremdsprachigen Produktionsprozess nutzbar zu machen und mit einem semasiologischen Zugriff zu verbinden, durch den es möglich ist, die inter- und intralingualen Unterschiede zwischen den Lexemen eines lexikalisch-semantischen (Sub)Paradigmas hervorzuheben. Ziel des Beitrages ist es daher, den Ausgangspunkt, sowie die theoretischen und methodologischen Grundlagen von DICONALE-online unter der speziellen Perspektive der Benutzer- und Situationsorientiertheit zur Diskussion zu stellen, die einzelnen Zugriffspfade für den Such- und Auffindungsprozess vorzustellen und das Angebot zur Auswahl und zum adäquaten Gebrauch aus inter- und intralingualer Perspektive zu präsentieren.
Präposition-Substantiv-Verbindungen mit rekurrentem Nullartikel in adverbialer Verwendung – z.B. nach Belieben, auf Knopfdruck, ohne Ende oder bei Nacht – sind ein in der Mehrwortforschung bisher eher vernachlässigter Typ. Sie sind Untersuchungsgegenstand des laufenden Forschungsprojekts „Präpositionale Wortverbindungen kontrastiv“ (beteiligte Institutionen: IDS Mannheim, Universität Santiago de Compostela, Universität Trnava), in das wir in unserem Vortrag einen Einblick vermitteln. Es wird skizziert, wie sich solche Wortverbindungen sowie abstraktere präpositionale Wortverbindungsmuster vom Typ [in + SUBX-Zeit(en) (z.B. in Echtzeit, in Krisenzeiten) aus kontrastiver Sicht (Deutsch – Spanisch – Slowakisch) korpusbasiert untersuchen und lexikografisch beschreiben lassen. Von großem Interesse – gerade auch für Fremdsprachenlerner – sind dabei insbesondere die semantisch-funktionalen Restriktionen, denen solche Entitäten unterliegen. Basierend auf den theoretischen und empirischen Grundannahmen des am IDS entwickelten Modells „Usuelle Wortverbindungen“ (vgl. Steyer 2013) werden im Projekt zunächst Kollokations- und Kotextmuster für die binären deutschen Mehrworteinheiten induktiv in sehr großen Korpora ermittelt; im Anschluss werden sie einem systematischen Vergleich mit dem Spanischen und Slowakischen unterzogen. Methodisch greifen wir – in allen drei Sprachen – u.a. auf Kookkurrenzprofile zu den Wortverbindungen sowie auf Slotanalysen zu definierten Suchmustern zurück. Ziel des Projekts ist u.a. die Entwicklung eines neuartigen Prototyps für eine multilinguale Aufbereitung des Untersuchungsgegentands (speziell für Fremdsprachenlerner).
This paper presents the first release of the KiezDeutsch Korpus (KiDKo), a new language resource with multiparty spoken dialogues of Kiezdeutsch, a newly emerging language variety spoken by adolescents from multi-ethnic urban areas in Germany. The first release of the corpus includes the transcriptions of the data as well as a normalisation layer and part-of-speech annotations. In the paper, we describe the main features of the new resource and then focus on automatic POS tagging of informal spoken language. Our tagger achieves an accuracy of nearly 97% on KiDKo. While we did not succeed in further improving the tagger using ensemble tagging, we present our approach to using the tagger ensembles for identifying error patterns in the automatically tagged data.
Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks
(2016)
In discourse relation annotation, there is currently a variety of different frameworks being used, and most of them have been developed and employed mostly on written data. This raises a number of questions regarding interoperability of discourse relation annotation schemes, as well as regarding differences in discourse annotation for written vs. spoken domains. In this paper, we describe ouron annotating two spoken domains from the SPICE Ireland corpus (telephone conversations and broadcast interviews) according todifferent discourse annotation schemes, PDTB 3.0 and CCR. We show that annotations in the two schemes can largely be mappedone another, and discuss differences in operationalisations of discourse relation schemes which present a challenge to automatic mapping. We also observe systematic differences in the prevalence of implicit discourse relations in spoken data compared to written texts,find that there are also differences in the types of causal relations between the domains. Finally, we find that PDTB 3.0 addresses many shortcomings of PDTB 2.0 wrt. the annotation of spoken discourse, and suggest further extensions. The new corpus has roughly theof the CoNLL 2015 Shared Task test set, and we hence hope that it will be a valuable resource for the evaluation of automatic discourse relation labellers.
The wdlpOst dictionary writing system to be presented in this paper has been developed for the specific purposes of a lexicographical project on German loanwords in the East Slavic languages Russian, Belarusian, and Ukrainian. The project’s main objectives are (i) to document those loanwords for which a cognate lexical borrowing from German is known in Polish and (ii) to establish possible borrowing pathways for these lexical items. In the first phase of the project, the collaborative client/server architecture of the wdlpOst system has been used for excerpting detailed lexicographical information from a large range of historical and contemporary East Slavic dictionaries, taking the entries in a large dictionary of German loanwords in Polish as a common frame of reference. For the project’s second phase, the wdlpOst system provides innovative tooling for compiling entries of the East Slavic loanwords. Most importantly, the numerous word sense definitions for a set of cognate loanwords, as excerpted from different lexicographical sources, are mapped onto a system of newly defined cross-language word senses; in a similar vein, the phonemic and graphemic variation in the loanwords and their derivatives is captured through a tool that abstracts from dictionary-specific idiosyncrasies.
Lexicography of Language Contact: An Internet Dictionary of Words of German Origin in Tok Pisin
(2016)
The paper presents an ongoing project in the domain of lexicography of language contact, namely, the “Internet Dictionary of Words of German Origin in Tok Pisin”. The German influence onto the lexicon of the main pidgin language of Papua New Guinea has its roots in the German colonial empire, where Tok Pisin played an important role as a lingua franca in the colony of German New Guinea. Tok Pisin also served as an intermediate language for many borrowing processes; that is, German loans entered many languages in the South Pacific via Tok Pisin. The Internet Dictionary of Words of German Origin in Tok Pisin is based on all available lexicographical sources from the early 20th century up to now. These sources are systematically evaluated within our project; the results will be documented in the dictionary. The microstructure of the dictionary will be presented with respect to its major features: documentation of sources, examples for word usage, audio files, and lexicographic comment.
The Online Bibliography of Electronic Lexicography (OBELEXmeta) is a bibliographic database which is developed for researchers working in the field of dictionary research. The platform is hosted at the Institute for the German Language (Institut für Deutsche Sprache, IDS) in Mannheim. The poster presentation aims at presenting the current status of the ongoing project.