TY - CPAPER
U1 - Conference publication
A1 - Gorisch, Jan
A1 - Schmidt, Thomas
ED - Calzolari, Nicoletta
ED - Kan, Min-Yen
ED - Hoste, Veronique
ED - Lenci, Alessandro
ED - Sakti, Sakriani
ED - Xue, Nianwen
T1 - Evaluating Workflows for Creating Orthographic Transcripts for Oral Corpora by Transcribing from Scratch or Correcting ASR-Output
T2 - Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
N2 - Research projects incorporating spoken data either draw on a selection of existing speech corpora or plan to record new data. In both cases, recordings need to be transcribed to make them accessible to analysis. Underestimating the effort of transcribing can be risky. Automatic Speech Recognition (ASR) holds the promise to considerably reduce transcription effort. However, few studies have so far attempted to evaluate this potential. The present paper compares efforts for manual transcription vs. correction of ASR-output. We took recordings from corpora of varying settings (interview, colloquial talk, dialectal, historic) and (i) compared two methods for creating orthographic transcripts, transcribing from scratch vs. correcting automatically created transcripts, and (ii) evaluated the influence of the corpus characteristics on correction efficiency. Results suggest that for the selected data and transcription conventions, transcribing and correcting still take equally long, at 7 times real-time on average. The more complex the primary data, the more time has to be spent on corrections. Despite the impressive latest developments in speech technology, to be a real help for conversation analysts or dialectologists, ASR systems seem to require even more improvement, or we need sufficient and appropriate data for training such systems.
KW - German
KW - Corpus
KW - Spoken language
KW - oral corpora
KW - automatic transcription
KW - ASR-correction
KW - corpus curation
KW - spoken German
KW - Automatic speech recognition
Y1 - 2024
U6 - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-126955
UN - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-126955
UR - https://aclanthology.org/2024.lrec-main.0/
SP - 6564
EP - 6574
PB - European Language Resources Association (ELRA)
CY - Paris
ER -