Data Mining with Shallow vs. Linguistic Features to Study Diversification of Scientific Registers

  • We present a methodology for analyzing the linguistic evolution of scientific registers with data mining techniques, comparing the insights gained from shallow features with those gained from linguistic features. The focus is on selected scientific disciplines at the boundary with computer science (computational linguistics, bioinformatics, digital construction, microelectronics). The data come from the English Scientific Text Corpus (SCITEX), which covers roughly thirty years (1970s/80s to early 2000s) (Degaetano-Ortlieb et al., 2013; Teich and Fankhauser, 2010). In particular, we investigate the diversification of scientific registers over time. Our theoretical basis is Systemic Functional Linguistics (SFL) and its specific incarnation of register theory (Halliday and Hasan, 1985). Methodologically, we combine corpus-based feature extraction with data mining techniques.

Metadata
Author:Stefania Degaetano-Ortlieb, Peter Fankhauser, Hannah Kermes, Ekaterina Lapshinova-Koltunski, Noam Ordan, Elke Teich
URN:urn:nbn:de:bsz:mh39-26178
Parent Title (English):Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC '14)
Publisher:European Language Resources Association (ELRA)
Place of publication:Reykjavik
Document Type:Conference Proceeding
Language:English
Year of first Publication:2014
Date of Publication (online):2014/06/13
Tag:Data Mining; Register; Text Classification
GND Keyword:Corpus <Linguistics>
First Page:1327
Last Page:1334
Dewey Decimal Classification:400 Language / 410 Linguistics
Leibniz-Classification:Language, Linguistics
Linguistics-Classification:Corpus linguistics
Open Access?:Yes
Licence:German copyright law (UrhG) applies