Querying Interaction Structure: Approaches to Overlap in Spoken Language Corpora

  • In this paper, we address two problems in indexing and querying spoken language corpora with overlapping speaker contributions. First, we examine how token distance and token precedence can be measured when multiple primary data streams are available and when transcriptions are tokenized but not synchronized with the sound at the level of individual tokens. We propose and experiment with a speaker-based search mode in which any speaker’s transcription tier can serve as the basic tokenization layer, with the contributions of other speakers mapped onto that tier. Second, we address two distinct methods of capturing speaker overlaps in the TEI-based ISO Standard for Spoken Language Transcriptions (ISO 24624:2016) and show how they can be queried with MTAS, an open-source Lucene-based search engine for querying text with multilevel annotations. We illustrate the problems, introduce possible solutions, and discuss their benefits and drawbacks.

Metadata
Author:Elena Frick, Henrike Helmer, Thomas Schmidt
URN:urn:nbn:de:bsz:mh39-111054
URL:http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.75.pdf
Parent Title (English):Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022). Marseille, 20-25 June 2022
Publisher:European Language Resources Association (ELRA)
Place of publication:Paris
Document Type:Part of a Book
Language:English
Year of first Publication:2022
Date of Publication (online):2022/06/29
Publishing Institution:Leibniz-Institut für Deutsche Sprache (IDS)
Publicationstate:Published version
Reviewstate:Peer-Review
Tag:MTAS; corpus search engine; multi-turn conversations; oral corpora; query language; spoken language corpora; spoken language data
GND Keyword:Query language; German; Spoken language; Corpus <Linguistics>; Turn-taking; Search engine; Token <Linguistics>
First Page:715
Last Page:722
DDC classes:400 Language / 400 Language, Linguistics / 400 Language
Open Access?:yes
Leibniz-Classification:Language, Linguistics
Linguistics-Classification:Conversation research / Spoken language
Linguistics-Classification:Corpus linguistics
Program areas:P1: Interaction
Licence (German):Creative Commons - CC BY-NC - Attribution - NonCommercial 4.0 International