The linguistic construal of disciplinarity: A data-mining approach using register features

  • We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question of whether these disciplines develop a distinctive language use, both individually and collectively, over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and the early 2000s. Our theoretical basis is register theory. Methodologically, we combine corpus-based feature extraction (aggregated part-of-speech-based features, n-grams, and lexico-grammatical patterns) with automatic text classification. The results are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for natural language processing (NLP) tasks such as authorship attribution, text mining, and the training of NLP tools.
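To make the combination of n-gram feature extraction and automatic text classification described in the abstract more concrete, here is a minimal Python sketch. It is not the authors' pipeline: the nearest-centroid classifier with cosine similarity, the whitespace tokenizer, the toy sentences, and all function names (`ngram_features`, `train`, `classify`) are assumptions chosen only for illustration. The abstract does not state which classifier or feature weighting was used.

```python
from collections import Counter
import math


def ngram_features(text, n=1):
    """Relative frequencies of word n-grams (naive whitespace tokenization)."""
    tokens = text.lower().split()
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values()) or 1  # avoid division by zero on short texts
    return {gram: count / total for gram, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse feature dictionaries."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average a list of sparse feature dictionaries."""
    summed = Counter()
    for vector in vectors:
        for key, value in vector.items():
            summed[key] += value
    return {key: value / len(vectors) for key, value in summed.items()}


def train(labelled_texts, n=1):
    """Build one centroid per discipline label from (label, text) pairs."""
    by_label = {}
    for label, text in labelled_texts:
        by_label.setdefault(label, []).append(ngram_features(text, n))
    return {label: centroid(vectors) for label, vectors in by_label.items()}


def classify(model, text, n=1):
    """Return the label whose centroid is most similar to the text."""
    features = ngram_features(text, n)
    return max(model, key=lambda label: cosine(features, model[label]))


# Hypothetical toy data, not drawn from the corpus described above.
TOY_TEXTS = [
    ("computational_linguistics", "the parser assigns a syntactic tree to each sentence"),
    ("computational_linguistics", "the tagger labels each word in the sentence"),
    ("bioinformatics", "the alignment compares each protein sequence in the genome"),
    ("bioinformatics", "the gene sequence is aligned against the genome"),
]

model = train(TOY_TEXTS)
print(classify(model, "a parser for each sentence"))
print(classify(model, "protein sequence in the genome"))
```

In a realistic setting, the surface word n-grams used here would be complemented by part-of-speech-based and lexico-grammatical features, and classification accuracy per discipline and time period would indicate how distinctive each discipline's language use is.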

Metadata
Author:Elke Teich, Stefania Degaetano-Ortlieb, Peter Fankhauser, Hannah Kermes, Ekaterina Lapshinova-Koltunski
URN:urn:nbn:de:bsz:mh39-44369
DOI:https://doi.org/10.1002/asi.23457
ISSN:1097-4571
Parent Title (English):Journal of the Association for Information Science and Technology
Document Type:Article
Language:English
Year of first Publication:2015
Date of Publication (online):2015/05/05
Publication state:Postprint
Review state:Peer-Reviewed
Tag:automatic classification; data mining
Edition:Early View (Online Version of Record published before inclusion in an issue)
First Page:1
Last Page:11
Dewey Decimal Classification:400 Language / 410 Linguistics
Open Access?:No
Licence (German):German copyright law (UrhG) applies