Towards a multilingual dictionary of discourse markers. Automatic extraction of units from parallel corpus
This paper presents a project for a multilingual dictionary of discourse markers (DMs). In the first stage, which consisted of collecting the list of headwords, we used a parallel corpus to automatically extract units from texts written in Spanish, Catalan, English, French and German. We also applied a method for creating a taxonomy that automatically organises the markers into clusters. As a result, we obtained an extensive, corpus-driven list of headwords. We present a prototype of the dictionary's microstructure in the form of a standard XML database and describe the procedure for automatically filling in most of its fields (e.g., the type of DM and the equivalents in other languages) before human intervention.
Author: | Irene Renau, Rogelio Nazar |
---|---|
URN: | urn:nbn:de:bsz:mh39-111830 |
URL: | https://euralex2022.ids-mannheim.de/wp-content/uploads/2022/07/Proceedings_11.07.2022.pdf |
DOI: | https://doi.org/10.14618/ids-pub-11183 |
ISBN: | 978-3-937241-87-6 |
Parent Title (English): | Dictionaries and Society. Proceedings of the XX EURALEX International Congress, 12-16 July 2022, Mannheim, Germany |
Publisher: | IDS-Verlag |
Place of publication: | Mannheim |
Editor: | Annette Klosa-Kückelhaus, Stefan Engelberg, Christine Möhrs, Petra Storjohann |
Document Type: | Part of a Book |
Language: | English |
Year of first Publication: | 2022 |
Date of Publication (online): | 2022/08/18 |
Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) |
Publication state: | Published version |
Review state: | Peer-reviewed |
Tag: | Computational lexicography; corpus-driven lexicography; discourse markers; multilingual lexicography |
GND Keyword: | Diskursmarker; Elektronisches Wörterbuch; Korpus <Linguistik>; Lexikographie; Mehrsprachiges Wörterbuch |
First Page: | 262 |
Last Page: | 272 |
DDC classes: | 400 Language / 420 English |
Open Access?: | yes |
Linguistics-Classification: | Lexicography |
Conferences, Workshops: | Dictionaries and Society. Proceedings of the XX EURALEX International Congress, 12-16 July 2022, Mannheim, Germany |
Licence: | Creative Commons - CC BY-SA - Attribution - ShareAlike 4.0 International |