Using Wiktionary revision history to uncover lexical innovations related to topical events: application to Covid-19 neologisms

In the present contribution, I investigate whether and how the English and French editions of the Wiktionary collaborative dictionary can be used as a corpus for real-time neology watch. This option is envisaged as a stopgap, when no satisfactory corpus is available. Wiktionary can also prove useful in addition to standard corpus analysis, to minimize the risk of overlooking new coinages and new senses. Since the collaborative dictionary’s quest for exhaustiveness makes manual inspection of the new additions unreasonable (more than 31,000 English lemmas and 11,000 French lemmas entered the nomenclature in 2020), identifying the possibly relevant headwords is an issue. The solution proposed here is to use Wiktionary revision history to detect the (new or existing) entries that received the greatest number of modifications. The underlying hypothesis is that the most heavily edited pages can help identify the vocabulary related to “hot topics”, assuming that, in 2020, the pandemic-related vocabulary ranks high. I used two measures introduced by Lih (2004), whose aim was to estimate the quality of Wikipedia articles: the so-called rigour (number of edits per page) and diversity (number of unique contributors per page). In the present study, I propose to adapt the rigour and diversity metrics to Wiktionary in order to identify the pages that generated a particular stir, rather than to estimate the quality of the articles. I do not subscribe to the idea that – in Wiktionary – more revisions necessarily produce quality articles (more revisions often produce complete articles). I therefore adopt Lih’s notion of diversity to refer to the number of distinct contributors, but leave out the name rigour when it comes to the number of revisions. Wolfer and Müller-Spitzer (2016) used the two metrics to describe the dynamics of the German and English editions of Wiktionary. One of their findings was that the number of edits per page is correlated with corpus word frequencies.
The variation in the number of page edits should therefore reflect to some extent the variation in corpus word frequencies. Renouf (2013) established a relationship between the fluctuation of word frequencies in a diachronic corpus and various neological processes. In particular, she illustrated how specific events generate sudden frequency spikes for words previously unseen in the corpus. For instance, Eyjafjallajökull, the – existing – name of an Icelandic glacier, appeared in the corpus when the underlying volcano erupted in 2010 and disrupted air traffic in Europe. To check whether the same phenomenon occurs when using Wiktionary edits instead of corpus frequencies, I manually annotated the most frequently revised entries (according to various ranking scores) with a binary tag: “related to Covid-19” (yes/no). The annotations were then used to test the ability of various configurations to detect relevant headwords from the English and French Wiktionary, namely Covid-19 neologisms and related existing words that deserve updates.
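
The two per-page measures described above (number of edits and number of distinct contributors) can be illustrated with a minimal sketch. It is not the author's implementation: the Revision record, the function names edit_metrics and rank_pages, and the toy revision list are all assumptions made for illustration, and the chapter's actual ranking scores are not specified in this abstract.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Revision(NamedTuple):
    """Hypothetical, simplified revision record (not Wiktionary's real schema)."""
    page: str          # headword of the edited entry
    contributor: str   # user name or IP of the editor
    year: int          # year in which the revision was saved


def edit_metrics(revisions: Iterable[Revision], year: int) -> Dict[str, Tuple[int, int]]:
    """Return {page: (number of edits, number of distinct contributors)} for one year."""
    edits: Dict[str, int] = defaultdict(int)
    contributors: Dict[str, set] = defaultdict(set)
    for rev in revisions:
        if rev.year != year:
            continue  # only count revisions made in the year of interest
        edits[rev.page] += 1
        contributors[rev.page].add(rev.contributor)
    return {page: (edits[page], len(contributors[page])) for page in edits}


def rank_pages(metrics: Dict[str, Tuple[int, int]], by: str = "edits") -> List[str]:
    """Rank pages by edit count ("edits") or distinct contributors ("diversity")."""
    index = 0 if by == "edits" else 1
    # Sort by descending score; ties are broken alphabetically for determinism.
    return sorted(metrics, key=lambda page: (-metrics[page][index], page))


# Toy data, invented purely for illustration.
toy_revisions = [
    Revision("covid", "A", 2020),
    Revision("covid", "B", 2020),
    Revision("covid", "A", 2020),
    Revision("lockdown", "C", 2020),
    Revision("lockdown", "C", 2020),
    Revision("glacier", "A", 2019),
]

toy_metrics = edit_metrics(toy_revisions, 2020)
toy_ranking = rank_pages(toy_metrics, by="edits")
```

In such a setup, the top of the ranking would be the candidate list handed to a human annotator for the "related to Covid-19" (yes/no) tag.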

Metadata
Author: Franck Sajous
URN: urn:nbn:de:bsz:mh39-114438
DOI: https://doi.org/10.1515/9783110798081-014
ISBN: 978-3-11-079808-1
ISSN: 0175-9264
Parent Title (English): Lexicography of coronavirus-related neologisms
Series (Serial Number): Lexicographica : series maior (163)
Publisher: de Gruyter
Place of publication: Berlin/Boston
Editor: Annette Klosa-Kückelhaus, Ilan Kernerman
Document Type: Part of a Book
Language: English
Year of first Publication: 2022
Date of Publication (online): 2023/01/09
Publishing Institution: Leibniz-Institut für Deutsche Sprache (IDS)
Publication state: Published version
Review state: (Publisher's) editorial review
Tag: Wiktionary; Wiktionary revision history; collaborative dictionary; headword; lexical innovation; topical event
GND Keyword: COVID-19; English; French; Lexicography; Neologism; Online dictionary; Pandemic; Social Media; Dictionary
First Page: 275
Last Page: 306
DDC classes: 400 Language / 400 Language, Linguistics
Open Access?: yes
Linguistics-Classification: Lexicography
Licence (English): Creative Commons - Attribution 4.0 International