Extracting specialized terminology from linguistic corpora

In this paper, we present our approach to automatically extracting German terminology in the domain of grammar, using texts from the online information system grammis as our corpus. We analyze existing repositories of German grammatical terminology and develop part-of-speech patterns for our extraction, thereby showing the importance of unigrams in this domain. We contrast the results of the automatic extraction with a manually extracted standard. By comparing the performance of well-known statistical measures, we show that measures based on corpus comparison outperform alternative methods.

Metadata
Author:	Christian Lang, Roman Schneider, Karolina Suchowolec
URN:	urn:nbn:de:bsz:mh39-74760
DOI:	https://doi.org/10.17885/heiup.361.509
ISBN:	978-3-946054-82-5
Parent Title (English):	Grammar and corpora 2016
Publisher:	Heidelberg University Publishing
Place of publication:	Heidelberg
Editor:	Eric Fuß, Marek Konopka, Beata Trawiński, Ulrich Hermann Waßner
Document Type:	Part of a Book
Language:	English
Year of first Publication:	2018
Date of Publication (online):	2018/05/23
Publication state:	Published version
Review state:	Peer review
Tag:	automatic term extraction; grammatical information system; grammatical terminology; terminological structurer
GND Keyword:	Natural language processing; German; Grammar; Grammis; Terminology
First Page:	425
Last Page:	434
DDC classes:	400 Language / 400 Language, Linguistics
Open Access?:	yes
Leibniz-Classification:	Language, Linguistics
Linguistics-Classification:	Computational linguistics
Program areas:	Grammar
Program areas:	Digital Linguistics
Licence:	Creative Commons Attribution-ShareAlike 4.0 International
