A key difference between traditional humanities research and the emerging field of digital humanities is that the latter aims to complement qualitative methods with quantitative data. In linguistics, this means working with large text corpora, which are usually annotated automatically using natural language processing tools. For historical texts, however, such tools do not exist, so scholars have to work with unannotated data. We have developed a system for the systematic, iterative exploration and annotation of historical text corpora, which relies on an XML database (BaseX) and, in particular, on the Full Text and Update facilities of XQuery.
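As an illustration of the general approach, combining XQuery Full Text retrieval with an XQuery Update write-back might be sketched as follows. This is a minimal sketch only: the collection name, the <w> token element, and the @pos attribute are assumptions for illustration, not taken from the system described in the abstract.

```xquery
(: Hedged sketch: find token elements matching a full-text query
   and record an annotation decision directly in the database.
   Collection name, <w> element, and @pos value are assumed. :)
for $w in collection("historical-corpus")//w
where $w contains text "vnd" using case insensitive
return insert node attribute pos { "KON" } into $w
```

Each such update persists its decisions in the database, so every pass shrinks the remaining unannotated material, which is what makes an iterative explore-and-annotate loop practical.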
The article analyzes the metaphorical structure of Polish discourses on the end of state communism. The analysis is based on a database of 1,008 metaphors drawn from press texts published in 1999 to commemorate the important events of 1989. As it turns out, the metaphorical structures of the various discourses express and entrench ideologically shaped interpretations of history. Two phenomena were examined in detail: the behavior of representatives of the government and of the opposition at the Round Table, and the question of historical continuity. These two phenomena, whose conceptualization plays an important role in defining the Polish self-stereotype in the Third Republic, are interpreted by means of different kinds of metaphors. The metaphorical understanding of historical continuity can be analyzed within the so-called "conceptual metaphor theory" of Lakoff and Johnson. The behavior of the communists and the oppositionists, by contrast, is interpreted through intertextual metaphors, which are constructed not on the basis of bodily experience but of experience specific to a given culture. It thus appears that the shaping of different kinds of concepts in discourse activates different areas of the experiential base.
Our paper outlines a proposal for the consistent modeling of heterogeneous lexical structures in semasiological dictionaries, based on the element structures described in detail in chapter 9 (Dictionaries) of the TEI Guidelines. The core of our proposal is a system of relatively autonomous lexical “crystals” that can, within the constraints of the relevant element’s definition, be combined into complex structures for the description of morphological form, grammatical information, etymology, word formation, and meaning of a lexical item.
The encoding structures we suggest guarantee sustainability and support re-usability and interoperability of data. This paper presents case studies of encoding dictionary entries in order to illustrate our concepts and test their usability.
We comment on encoding issues involving <entry>, <form>, <etym>, and on refinements to the internal content of <sense>.
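To make the element inventory concrete, a minimal dictionary entry combining the elements named above might be encoded as follows. The lemma and all content values are invented for illustration; only the element inventory follows chapter 9 of the TEI Guidelines, not necessarily the refinements proposed in the paper.

```xml
<entry xml:id="haus">
  <form type="lemma">
    <orth>Haus</orth>
  </form>
  <gramGrp>
    <pos>noun</pos>
    <gen>neuter</gen>
  </gramGrp>
  <etym>From Middle High German <mentioned>hūs</mentioned>.</etym>
  <sense n="1">
    <def>building serving as living quarters</def>
  </sense>
</entry>
```

Each top-level child here (<form>, <gramGrp>, <etym>, <sense>) can be read as one of the relatively autonomous "crystals": each is well-formed on its own and combines with the others under <entry>.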
Although most of the relevant dictionary productions of the recent past have relied on digital data and methods, there is little consensus on formats and standards. The Institute for Corpus Linguistics and Text Technology (ICLTT) of the Austrian Academy of Sciences has been conducting a number of varied lexicographic projects, both digitising print dictionaries and creating genuinely digital lexicographic data. These data were designed to serve varying purposes: machine-readability was only one of them; a second goal was interoperability with digital NLP tools. To achieve this end, a uniform encoding system applicable across all the projects was developed. The paper describes the constraints imposed on the content models of the various elements of the TEI dictionary module and argues for TEI P5 as an encoding system suited not only to representing digitised print dictionaries but also to NLP purposes.
The paper presents an XML schema for the representation of genres of computer-mediated communication (CMC) that is compliant with the encoding framework defined by the TEI. It was designed for the annotation of CMC documents in the project Deutsches Referenzkorpus zur internetbasierten Kommunikation (DeRiK), which aims at building a corpus on language use in the most popular CMC genres on the German-speaking Internet. The focus of the schema is on those CMC genres which are written and dialogic, such as forums, bulletin boards, chats, instant messaging, wiki and weblog discussions, microblogging on Twitter, and conversation on “social network” sites.
The schema provides a representation format for the main structural features of CMC discourse as well as elements for the annotation of those units regarded as “typical” of language use on the Internet. The schema introduces an element <posting>, which describes a stretch of text sent to the server by a user at a certain point in time. Postings are the main constituting elements of threads and logfiles, which, in our schema, are the two main types of CMC macrostructures. For the microlevel of CMC documents (that is, the structure of the <posting> content), the schema introduces elements for selected features of Internet jargon such as emoticons, interaction words, and addressing terms. It allows for easy anonymization of CMC data when the annotated data are to be made publicly available, and it includes the metadata necessary for referencing random excerpts from the data, whether as references in dictionary entries or as results of corpus queries.
Documentation of the schema as well as encoding examples can be retrieved from http://www.empirikom.net/bin/view/Themen/CmcTEI. The schema is meant to be a core model for representing CMC that can be modified and extended by others according to their own specific perspectives on CMC data. It could be a first step towards integrating features for the representation of CMC genres into a future version of the TEI Guidelines.
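A thread in such a scheme might be encoded roughly as follows. Apart from <posting>, which the abstract names explicitly, every element and attribute in this sketch (the thread wrapper, @who, @synch, <emoticon>) is an illustrative assumption; the authoritative encoding examples are in the schema documentation cited above.

```xml
<div type="thread">
  <posting who="#A01" synch="#t0001">
    <p>Has anyone tried the new corpus interface?</p>
  </posting>
  <posting who="#A02" synch="#t0002">
    <p>Yes, works fine for me <emoticon>:-)</emoticon></p>
  </posting>
</div>
```

Pointer-valued attributes of this kind (user and timestamp references rather than literal names) are one plausible way to support the anonymization and excerpt-referencing requirements mentioned in the abstract.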