410 Linguistik
Refine
Document Type
- Conference Proceeding (2)
- Article (1)
- Other (1)
Has Fulltext
- yes (4)
Keywords
- Korpus <Linguistik> (3)
- Deutsch (2)
- Annotation (1)
- Automatische Spracherkennung (1)
- Automatische Sprachverarbeitung (1)
- Computational linguistics (1)
- Computerlinguistik (1)
- Corpora (Linguistics) (1)
- Emotion (1)
- Englisch (1)
Publicationstate
- Veröffentlichungsversion (3)
- Postprint (1)
Reviewstate
- Review-Status-unbekannt (4) (remove)
Tagset und Richtlinie für das PoSTagging von Sprachdaten aus Genres internetbasierter Kommunikation
(2015)
Centering on German self-motion verbs, this paper demonstrates the advantages of free-sorting over creating and delineating word fields with more traditional methods. In particular, I draw a comparison to Snell-Hornby’s (1983) work on German descriptive verbs, which produces lexical fields with the help of dictionary entries, a thesaurus, a small corpus of written text and limited speaker feedback. While these methods have benefits, they are limited in their ability to represent the average organization of semantic fields in the mind of everyday speakers. Freesorting, by contrast, does not rely on academic resources, corpora or singular speaker judgments. In sorting, a group of informants creates visible sets of items according to perceived similarity. Psycholinguists have used the method to quantitatively explore the perception of color terms across cultures (c.f. Roberson et al. 2005). With a sufficiently large number of informants, one can generate lexical sorting data that is apt for cluster analysis, the results of which are represented by dendrograms. The experiment I conducted involved 33 school children from a middle class neighborhood in Braunschweig, Northern Germany. My experiment shows that Snell-Hornby’s (1983) representation of the self-motion field can be improved by integrating further dimensions of meaning, such as body-space relations and sound, that young speakers find salient in the grouping procedure.
The task-oriented and format-driven development of corpus query systems has led to the creation of numerous corpus query languages (QLs) that vary strongly in expressiveness and syntax. This is a severe impediment for the interoperability of corpus analysis systems, which lack a common protocol. In this paper, we present KoralQuery, a JSON-LD based general corpus query protocol, aiming to be independent of particular QLs, tasks and corpus formats. In addition to describing the system of types and operations that Koral- Query is built on, we exemplify the representation of corpus queries in the serialized format and illustrate use cases in the KorAP project.