The paper reports the results of the curation project ChatCorpus2CLARIN. The goal of the project was to develop a workflow and resources for the integration of an existing chat corpus into the CLARIN-D research infrastructure for language resources and tools in the Humanities and the Social Sciences (http://clarin-d.de). The paper presents an overview of the resources and practices developed in the project, describes the added value of the resource after its integration and discusses, as an outlook, to what extent these practices can be considered best practices which may be useful for the annotation and representation of other CMC and social media corpora.
This paper describes the lexical database tool LOLA (Linguistic-Oriented Lexical database Approach), which has been developed for the construction and maintenance of lexicons for the machine translation system LMT. First, the requirements such a tool should meet are discussed; then LMT, the lexical information it requires, and some issues concerning vocabulary acquisition are presented. Afterwards, the architecture and components of the LOLA system are described, and it is shown how we tried to meet the requirements worked out earlier. Although LOLA was originally designed and implemented for the German-English LMT prototype, it aimed from the beginning at a representation of lexical data that can be reused for other LMT or MT prototypes or even other NLP applications. A special point of discussion is therefore the adaptability of the tool and its components, as well as the reusability of the lexical data stored in the database for lexicon development for LMT or for other applications.
The paper presents best practices and results from projects dedicated to the creation of corpora of computer-mediated communication and social media interactions (CMC) from four different countries. Even though there are still many open issues related to building and annotating corpora of this type, there already exists a range of tested solutions which may serve as a starting point for a comprehensive discussion on how future standards for CMC corpora could (and should) be shaped.
The paper reports on the results of a scientific colloquium dedicated to the creation of standards and best practices which are needed to facilitate the integration of language resources for CMC stemming from different origins and the linguistic analysis of CMC phenomena in different languages and genres. The key issue to be solved is that of interoperability with respect to the structural representation of CMC genres, linguistic annotations, metadata, and anonymization/pseudonymization schemas. The objective of the paper is to convince more projects to take part in a discussion about standards for CMC corpora and about the creation of a CMC corpus infrastructure across languages and genres. In view of the broad range of corpus projects currently underway all over Europe, there is a great window of opportunity for the creation of standards in a bottom-up approach.
Converting and Representing Social Media Corpora into TEI: Schema and best practices from CLARIN-D
(2016)
The paper presents results from a curation project within CLARIN-D, in which an existing 1M-word corpus of German chat communication has been integrated into the DEREKO and DWDS corpus infrastructures of the CLARIN-D centres at the Institute for the German Language (IDS, Mannheim) and at the Berlin-Brandenburg Academy of Sciences (BBAW, Berlin). The focus is on the solutions developed for converting and representing the corpus in a TEI format.
GrammIs is a multimedia information system on German grammar that has been under development at the Institut für deutsche Sprache (IDS) since mid-1993. This article first outlines the architecture of the information system and discusses the advantages of such a system over the traditional book format. It then shows how the design and development of the prototype GrammIs-1 aimed at an easy-to-use hypermedia application that, compared with the grammatical source text, actually offers the much-invoked "informational added value". Three means were used to this end: a methodically reflected conversion of the source text, intuitively accessible user metaphors, and navigation options that adapt flexibly to the differing computer experience of different users.