TY  - CHAP
U1  - Conference publication
A1  - Degaetano-Ortlieb, Stefania
A1  - Fankhauser, Peter
A1  - Kermes, Hannah
A1  - Lapshinova-Koltunski, Ekaterina
A1  - Ordan, Noam
A1  - Teich, Elke
T1  - Data Mining with Shallow vs. Linguistic Features to Study Diversification of Scientific Registers
T2  - Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC '14)
N2  - We present a methodology for analyzing the linguistic evolution of scientific registers with data mining techniques, comparing the insights gained from shallow vs. linguistic features. The focus is on selected scientific disciplines at the boundaries of computer science (computational linguistics, bioinformatics, digital construction, microelectronics). The data basis is the English Scientific Text Corpus (SCITEX), which covers roughly thirty years (from the 1970s/80s to the early 2000s) (Degaetano-Ortlieb et al., 2013; Teich and Fankhauser, 2010). In particular, we investigate the diversification of scientific registers over time. Our theoretical basis is Systemic Functional Linguistics (SFL) and its specific incarnation of register theory (Halliday and Hasan, 1985). In terms of methods, we combine corpus-based feature extraction with data mining techniques.
KW  - Data Mining
KW  - Text Classification
KW  - Register
KW  - Corpus
Y1  - 2014
U6  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-26178
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-26178
SP  - 1327
EP  - 1334
PB  - European Language Resources Association (ELRA)
CY  - Reykjavik
ER  - 