Uncertain about Uncertainty: Different ways of processing fuzziness in digital humanities data
(2014)
The GeoBib project is constructing a georeferenced online bibliography of early Holocaust and camp literature published between 1933 and 1949 (Entrup et al. 2013a). Our immediate objectives include identifying the texts of interest in the first place, composing abstracts for them, researching their history, and annotating relevant places and times. Relations between persons, texts, and places will be visualized with digital maps and GIS software as an integral part of the resulting GeoBib information portal. Combining diverse data from varying sources not only enriches our knowledge of these otherwise largely forgotten texts; it also confronts us with vague, uncertain, or even conflicting information. This situation poses challenges for all researchers involved: historians, literary scholars, geographers, and computer scientists alike. While the project operates at the intersection of historical and literary studies, the computer scientists involved are in charge of providing a working environment (Entrup et al. 2013b) and of processing the collected information in a way that is formalized yet capable of dealing with inevitable vagueness, uncertainty, and contradictions. In this paper we focus on the problems and opportunities of encoding and processing fuzzy data.
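One way to formalize fuzzy temporal information of the kind the abstract describes is to record explicit interval bounds instead of a single value. The following Python sketch is purely illustrative: the class and field names are our own, not part of the GeoBib data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzyDate:
    """A date known only to lie somewhere between two year bounds."""
    earliest: int
    latest: int

    def overlaps(self, other: "FuzzyDate") -> bool:
        # Two fuzzy dates may refer to the same time if their ranges intersect.
        return self.earliest <= other.latest and other.earliest <= self.latest


# A text published "sometime between 1943 and 1945",
# checked against the project's 1933-1949 scope:
publication = FuzzyDate(1943, 1945)
scope = FuzzyDate(1933, 1949)
print(publication.overlaps(scope))  # True
```

Keeping uncertainty explicit in the data model, rather than collapsing it to a single guessed year, allows later queries (e.g., map or timeline filters) to decide how to treat borderline cases.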
The present paper reports the first results of the compilation and annotation of a blog corpus for German. The main aim of the project is to represent the discourse structure of blogs and the relations between their elements (blog posts, comments) and participants (bloggers, commentators). The data included in the corpus were manually collected from the scientific blog portal SciLogs. The feature catalogue for the corpus annotation covers three types of information that are provided directly or indirectly in the blog or can be derived by means of statistical analysis or computational tools. At this point, only directly available information (e.g., the title of a blog post, the name of the blogger) has been annotated. We believe our blog corpus can be of interest for the general study of blog structure and related research questions, as well as for the development of NLP methods and techniques (e.g., for authorship detection).
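The directly available annotation layer mentioned above (post title, blogger name, attached comments) can be sketched as a simple record structure. This is a hypothetical illustration in Python; the field names and example data are ours, not the corpus's actual annotation schema.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    """A comment attached to a blog post."""
    commentator: str
    text: str


@dataclass
class BlogPost:
    """Directly available metadata of one post, plus its comments."""
    title: str
    blogger: str
    text: str
    comments: list = field(default_factory=list)


# Hypothetical example record:
post = BlogPost(
    title="Example post",
    blogger="Jane Doe",
    text="...",
    comments=[Comment("John Roe", "...")],
)
print(len(post.comments))  # 1
```

Representing posts and comments as linked records makes the blog's discourse relations (which comment replies to which post, who wrote what) directly queryable for later statistical or NLP processing.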