Team seminar
Carlos-Emiliano González-Gallardo, associate professor at LIFAT, will give a talk entitled “Yes but.. Can Large Language Models Identify Entities in Historical Documents?”.
Here is the abstract: The efficacy of large language models (LLMs) has greatly impacted the field of natural language processing, achieving state-of-the-art performance across various tasks, including named entity recognition (NER) for contemporary texts. However, the use of LLMs for NER in historical collections, such as newspapers and classical commentaries, remains underexplored. This gap presents significant challenges for Digital Humanities research, as historical texts often suffer from noise due to suboptimal storage conditions, errors in optical character recognition, and variations in spelling. During this talk, I will share findings and insights from an empirical evaluation that compares different Instruct variants of both closed and open models. This study aims to improve the understanding and application of NER in historical collections and its relevance in digital libraries. To achieve this, we employed prompt engineering through both deductive (guidelines provided) and inductive (guidelines absent) methodologies, using publicly available historical collections in English, French, and German, along with code-switching in Ancient Greek.