To BERT or not to BERT – Comparing contextual embeddings in a deep learning architecture for the automatic recognition of four types of speech, thought and writing representation

  • We present recognizers for four very different types of speech, thought and writing representation (STWR) for German texts. The implementation is based on deep learning with two different customized contextual embeddings, namely FLAIR embeddings and BERT embeddings. This paper gives an evaluation of our recognizers with a particular focus on the differences in performance we observed between those two embeddings. FLAIR performed best for direct STWR (F1=0.85), BERT for indirect (F1=0.76) and free indirect (F1=0.59) STWR. For reported STWR, the comparison was inconclusive, but BERT gave the best average results and best individual model (F1=0.60). Our best recognizers, our customized language embeddings and most of our test and training data are freely available and can be found via www.redewiedergabe.de or at github.com/redewiedergabe.

Metadata
Author:Annelen Brunner, Ngoc Duyen Tanja Tu, Lukas Weimer, Fotis Jannidis
URN:urn:nbn:de:bsz:mh39-115617
URL:https://ceur-ws.org/Vol-2624/paper5.pdf
ISSN:1613-0073
Parent Title (English):Proceedings of the 5th Swiss Text Analytics Conference (SwissText) & 16th Conference on Natural Language Processing (KONVENS)
Publisher:CEUR-WS
Place of publication:Aachen
Editor:Sarah Ebling, Don Tuggener, Manuela Hürlimann, Mark Cieliebak, Martin Volk
Document Type:Conference Proceeding
Language:English
Year of first Publication:2020
Date of Publication (online):2023/03/23
Publishing Institution:Leibniz-Institut für Deutsche Sprache (IDS)
Publicationstate:Published version
Reviewstate:Peer-Review
GND Keyword:Deutsch; Einbettung <Linguistik>; Testdaten; Textanalyse
Page Number:11
DDC classes:400 Language / 400 Language, Linguistics
Open Access?:yes
Leibniz-Classification:Language, Linguistics
Linguistics-Classification:Corpus linguistics
Program areas:G2: Sprachinformationssysteme
Program areas:L2: Lexikalische Syntagmatik
Licence (English):Creative Commons - Attribution 4.0 International