Named entity recognition and entity linking
- Named Entity Recognition (NER) and Entity Linking (EL) are crucial tasks in Natural Language Processing (NLP) that enable the extraction and disambiguation of named entities in text data. NER involves identifying and labelling named entities, such as persons, locations, and organizations, while EL links these entities to entries in external knowledge bases, providing additional context and information. NER and EL systems employ a range of techniques, from statistical methods to state-of-the-art neural approaches such as transformer-based architectures, to recognize, disambiguate, and link entities. NER and EL have numerous use cases in automatic text analysis and information retrieval, and are generally useful for working with text data in the digital humanities. This chapter provides an overview of NER and EL, including their tasks, techniques, relevant data formats, and applications, as well as some guidance on how to apply these technologies to raw data sets.
| Author: | Pia Schwarz |
|---|---|
| URN: | urn:nbn:de:bsz:mh39-134787 |
| DOI: | https://doi.org/10.1515/9783112208212 |
| ISBN: | 978-3-11-220821-2 |
| ISSN: | 2751-1286 |
| Parent Title (English): | Harmonizing language data. Standards for linguistic resources |
| Series (Serial Number): | Digital Linguistics (4) |
| Publisher: | De Gruyter |
| Place of publication: | Berlin/Boston |
| Editor: | Piotr Bański, Ulrich Heid, Laura Herzberg |
| Document Type: | Part of a Book |
| Language: | German |
| Year of first Publication: | 2025 |
| Date of Publication (online): | 2025/10/02 |
| Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) |
| Publication state: | Published version |
| Review state: | (Publisher's) editorial review |
| Tag: | Data formats; Entity linking; Knowledge base; Language models; Named entity recognition |
| GND Keyword: | Computerlinguistik; Digital Humanities; Sprachdaten |
| First Page: | 89 |
| Last Page: | 114 |
| DDC classes: | 400 Language / 400 Language, Linguistics |
| Open Access?: | yes |
| Linguistics-Classification: | Computational linguistics |
| Program areas: | Digital Linguistics |
| Licence (English): | Creative Commons - Attribution 4.0 International |


