Refine
Year of publication
Document Type
- Part of a Book (35)
- Article (17)
- Conference Proceeding (13)
- Other (2)
Keywords
- Deutsch (18)
- Korpus <Linguistik> (16)
- Hypertext (12)
- Computerunterstützte Kommunikation (10)
- Computerunterstützte Lexikografie (8)
- Internet (8)
- Computerunterstütztes Informationssystem (7)
- Chatten <Kommunikation> (6)
- Neue Medien (6)
- Textlinguistik (6)
Publicationstate
- Veröffentlichungsversion (39)
- Zweitveröffentlichung (10)
- Postprint (7)
- Preprint (2)
Reviewstate
- (Verlags)-Lektorat (31)
- Peer-Review (23)
- Peer-review (2)
- Review-Status-unbekannt (1)
Publisher
- de Gruyter (9)
- Niemeyer (8)
- De Gruyter (3)
- Narr (3)
- Springer (3)
- Westdeutscher Verlag (3)
- Erich Schmidt (2)
- IKS e.V. (2)
- Lang (2)
- Olms (2)
Das Kommunizieren in Sozialen Medien und der Umgang mit Hypertexten ist im Jahr 2020 kein Randphänomen mehr. Die sprachlichen Besonderheiten internetbasierter Kommunikation und Sozialer Medien sind mittlerweile auch gut erforscht und beschrieben, allerdings werden diese bislang in deutschen Grammatiken, mit Ausnahme von Hoffmann (2014), allenfalls am Rande behandelt. Selbst neuere Ansätze zur Textanalyse, z. B. Ágel (2017), konzentrieren sich auf gestaltstabile, linear organisierte Schrifttexte. Dasselbe gilt für Ansätze, die primär für die Bewertung von Schreibprodukten in Bildungskontexten entwickelt wurden.
Tagset und Richtlinie für das PoSTagging von Sprachdaten aus Genres internetbasierter Kommunikation
(2015)
The paper presents best practices and results from projects in four countries dedicated to the creation of corpora of computer-mediated communication and social media interactions (CMC). Even though there are still many open issues related to building and annotating corpora of that type, there already exists a range of accessible solutions which have been tested in projects and which may serve as a starting point for a more precise discussion of how future standards for CMC corpora may (and should) be shaped like.
The paper presents best practices and results from projects in four countries dedicated to the creation of corpora of computer-mediated communication and social media interactions (CMC). Even though there are still many open issues related to building and annotating corpora of that type, there already exists a range of accessible solutions which have been tested in projects and which may serve as a starting point for a more precise discussion of how future standards for CMC corpora may (and should) be shaped like.
The paper presents best practices and results from projects dedicated to the creation of corpora of computer-mediated communication and social media interactions (CMC) from four different countries. Even though there are still many open issues related to building and annotating corpora of this type, there already exists a range of tested solutions which may serve as a starting point for a comprehensive discussion on how future standards for CMC corpora could (and should) be shaped like.
Converting and Representing Social Media Corpora into TEI: Schema and best practices from CLARIN-D
(2016)
The paper presents results from a curation project within CLARIN-D, in which an existing lMWord corpus of German chat communication has been integrated into the DEREKO and DWDS corpus infrastructures of the CLARIN-D centres at the Institute for the German Language (IDS, Mannheim) and at the Berlin-Brandenburg Academy of Sciences (BBAW, Berlin). The focus is on the solutions developed for converting and representing the corpus in a TEI format.
The paper reports the results of the curation project ChatCorpus2CLARIN. The goal of the project was to develop a workflow and resources for the integration of an existing chat corpus into the CLARIN-D research infrastructure for language resources and tools in the Humanities and the Social Sciences (http://clarin-d.de). The paper presents an overview of the resources and practices developed in the project, describes the added value of the resource after its integration and discusses, as an outlook, to what extent these practices can be considered best practices which may be useful for the annotation and representation of other CMC and social media corpora.
The paper presents an XML schema for the representation of genres of computer-mediated communication (CMC) that is compliant with the encoding framework defined by the TEI. It was designed for the annotation of CMC documents in the project Deutsches Referenzkorpus zur internetbasierten Kommunikation (DeRiK), which aims at building a corpus on language use in the most popular CMC genres on the German-speaking Internet. The focus of the schema is on those CMC genres which are written and dialogic―such as forums, bulletin boards, chats, instant messaging, wiki and weblog discussions, microblogging on Twitter, and conversation on “social network” sites.
The schema provides a representation format for the main structural features of CMC discourse as well as elements for the annotation of those units regarded as “typical” for language use on the Internet. The schema introduces an element <posting>, which describes stretches of text that are sent to the server by a user at a certain point in time. Postings are the main constituting elements of threads and logfiles, which, in our schema, are the two main types of CMC macrostructures. For the microlevel of CMC documents (that is, the structure of the <posting> content), the schema introduces elements for selected features of Internet jargon such as emoticons, interaction words and addressing terms. It allows for easy anonymization of CMC data for purposes in which the annotated data are made publicly available and includes metadata which are necessary for referencing random excerpts from the data as references in dictionary entries or as results of corpus queries.
Documentation of the schema as well as encoding examples can be retrieved from the web at http://www.empirikom.net/bin/view/Themen/CmcTEI. The schema is meant to be a core model for representing CMC that can be modified and extended by others according to their own specific perspectives on CMC data. It could be a first step towards an integration of features for the representation of CMC genres into a future new version of the TEI Guidelines.
Generierung von Linkangeboten zur Rekonstruktion terminologiebedingter Wissensvoraussetzungen
(2002)
Dieser Beitrag skizziert Strategien zur (semi-)automatischen Annotation von definitorischen Textsegmenten und Termverwendungsinstanzen auf der Grundlage grammatisch annotierter Korpora. Ziel unserer Überlegungen ist es, bei der selektiven Rezeption von Fachtexten in einer Hypertextumgebung die je spezifischen Wissensvoraussetzungen, die der Verwendung von Fachtermini unterliegen und die für das Textverständnis eine entscheidende Rolle spielen, über automatisch generierte Linkangebote rekonstruierbar zu machen.
The paper reports on the results of a scientific colloquium dedicated to the creation of standards and best practices which are needed to facilitate the integration of language resources for CMC stemming from different origins and the linguistic analysis of CMC phenomena in different languages and genres. The key issue to be solved is that of interoperability – with respect to the structural representation of CMC genres, linguistic annotations metadata, and anonymization/pseudonymization schemas. The objective of the paper is to convince more projects to partake in a discussion about standards for CMC corpora and for the creation of a CMC corpus infrastructure across languages and genres. In view of the broad range of corpus projects which are currently underway all over Europe, there is a great window of opportunity for the creation of standards in a bottom-up approach.
This paper describes the lexical database tool LOLA (Linguistic-Oriented Lexical database Approach) which has been developed for the construction and maintenance of lexicons for the machine translation system LMT. First, the requirements such a tool should meet are discussed, then LMT and the lexical information it requires, and some issues concerning vocabulary acquisition are presented. Afterwards the architecture and the components of the LOLA system are described and it is shown how we tried to meet the requirements worked out earlier. Although LOLA originally has been designed and implemented for the German-English LMT prototype, it aimed from the beginning at a representation of lexical data that can be reused for other LMT or MT prototypes or even other NLP applications. A special point of discussion will therefore be the adaptability of the tool and its components as well as the reusability of the lexical data stored in the database for the lexicon development for LMT or for other applications.
Internetwörterbücher können viele Informationstypen auf neuartige Weise vereinigen und nutzeradaptiv präsentieren. Sie bilden in vernetzter Form als „Megawörterbücher“ große Wörterbuchportale und verschmelzen mit Korpora, multimedialen Erweiterungen und automatischen Sprachanalysetools zu Wortschatzinformationssystemen neuer Art. Es ist daher schwierig geworden, zwischen einen Wörterbuch einem Korpus, einem Atlas und einer Frequenzliste zu unterscheiden. Die Autoren versuchen, ein wenig Licht in das Dunkel der verschiedenen Typen von Wörterbüchern, Wörterbuchportalen und Wortschatzinformationssystemen zu bringen, und dabei auch zeigen, dass sich die Unordnung, die eine „Schlöraffe“ in die Klassifikation des Tierreichs bringt, am Ende durchaus auszahlen kann.
Einführung
(1998)