Detecting the boundaries of sentence-like units on spoken German
- Automatic division of spoken language transcripts into sentence-like units is a challenging problem, caused by disfluencies, ungrammatical structures and the lack of punctuation. We present experiments on dividing up German spoken dialogues where we investigate the impact of task setup and data representation, encoding of context information as well as different model architectures for this task.
Author: | Josef Ruppenhofer, Ines Rehbein |
---|---|
URN: | urn:nbn:de:bsz:mh39-93174 |
URL: | https://corpora.linguistik.uni-erlangen.de/data/konvens/proceedings/Preliminary_proceedings_of_the_15th_Conference_on_Natural_Language_Processing_KONVENS_2019.pdf |
URL: | https://corpora.linguistik.uni-erlangen.de/data/konvens/proceedings/papers/KONVENS2019_paper_32.pdf |
Parent Title (English): | Preliminary proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019), October 9 – 11, 2019 at Friedrich-Alexander-Universität Erlangen-Nürnberg |
Publisher: | German Society for Computational Linguistics & Language Technology and Friedrich-Alexander-Universität Erlangen-Nürnberg |
Place of publication: | München [et al.] |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2019 |
Date of Publication (online): | 2019/10/15 |
Publication state: | Published version |
Review state: | Peer review |
GND Keyword: | Automatische Sprachanalyse; Deutsch; Gesprochene Sprache; Satz; Segmentierung |
First Page: | 130 |
Last Page: | 139 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Computational linguistics |