Extracting specialized terminology from linguistic corpora
- In this paper, we present our approach to automatically extracting German terminology in the domain of grammar, using texts from the online information system grammis as our corpus. We analyze existing repositories of German grammatical terminology and develop part-of-speech patterns for our extraction, thereby showing the importance of unigrams in this domain. We contrast the results of the automatic extraction with a manually extracted standard. By comparing the performance of well-known statistical measures, we show that measures based on corpus comparison outperform alternative methods.
Author: | Christian Lang, Roman Schneider, Karolina Suchowolec |
---|---|
URN: | urn:nbn:de:bsz:mh39-74760 |
DOI: | https://doi.org/10.17885/heiup.361.509 |
ISBN: | 978-3-946054-82-5 |
Parent Title (English): | Grammar and corpora 2016 |
Publisher: | Heidelberg University Publishing |
Place of publication: | Heidelberg |
Editor: | Eric Fuß, Marek Konopka, Beata Trawiński, Ulrich Hermann Waßner |
Document Type: | Part of a Book |
Language: | English |
Year of first Publication: | 2018 |
Date of Publication (online): | 2018/05/23 |
Publication state: | Published version |
Review state: | Peer-reviewed |
Tag: | automatic term extraction; grammatical information system; grammatical terminology; terminological structurer |
GND Keyword: | Automatische Sprachverarbeitung; Deutsch; Grammatik; Grammis; Terminologie |
First Page: | 425 |
Last Page: | 434 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Computational linguistics |
Program areas: | Grammar |
Program areas: | Digital linguistics |