GPT for medical entity recognition in Spanish

April 29, 2024


Article


Published in: Multimedia Tools and Applications, 2024. DOI: 10.1007/s11042-024-19209-5

Authors: García-Barragán Á; González Calatayud A; Solarte-Pabón O; Provencio M; Menasalvas E; Robles V

Affiliations

Hospital Universitario Puerta de Hierro Majadahonda - Author

Universidad del Valle, Cali - Author

Universidad Politécnica de Madrid - Author

Abstract

In recent years, there has been a remarkable surge in the development of Natural Language Processing (NLP) models, particularly for Named Entity Recognition (NER). Models such as BERT have demonstrated exceptional performance, leveraging annotated corpora for accurate entity identification. This raises a question: can newer Large Language Models (LLMs) such as GPT extract entities directly, without the need for extensive annotation? In this study, we explore this question by comparing fine-tuning techniques with prompting methods to assess the potential of GPT for identifying medical entities in Spanish electronic health records (EHRs). Using a dataset of Spanish EHRs related to breast cancer, we implemented two approaches to structure the data: a traditional NER method based on BERT, and a contemporary LLM-driven approach using GPT that combines few-shot learning with the integration of external knowledge. Both methods were run within a common, comprehensive analysis pipeline and evaluated with key performance metrics such as precision, recall, and F-score. This comparison aimed to highlight the strengths and limitations of each method for structuring Spanish EHRs efficiently and accurately. The comparative analysis shows that the traditional BERT-based NER method and the few-shot LLM-driven approach augmented with external knowledge achieve comparable precision, recall, and F-score on Spanish EHRs. Contrary to expectations, the LLM-driven approach, which requires minimal data annotation, matches BERT's ability to discern complex medical terminology and contextual nuances within the EHRs.
The results of this study mark a notable advance in NER for Spanish EHRs, with the LLM-driven few-shot approach enhanced by external knowledge slightly edging out the traditional BERT-based method in overall effectiveness. GPT's higher F-score and its minimal reliance on extensive data annotation underscore its potential for medical data processing.
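The abstract evaluates both methods with precision, recall, and F-score. As a minimal sketch of how these metrics are commonly computed for NER, the Python below scores a set of predicted entities against annotated ones. It assumes strict matching, where a prediction counts as correct only if both its character span and its label match an annotated entity. The record here does not state the paper's exact matching criterion. The `prf1` function, the `(start, end, label)` tuple representation, and the entity labels are all hypothetical.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (start_char, end_char, label).
Entity = Tuple[int, int, str]


def prf1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Strict entity-level precision, recall and F1.

    A predicted entity is a true positive only if an identical
    (start, end, label) tuple exists in the gold annotations.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical annotations for a single EHR sentence; labels are illustrative only.
    gold = {(0, 9, "TUMOR"), (20, 31, "DRUG"), (40, 46, "STAGE")}
    predicted = {(0, 9, "TUMOR"), (20, 31, "DRUG"), (50, 55, "DRUG")}
    p, r, f = prf1(gold, predicted)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Running it prints `precision=0.667 recall=0.667 f1=0.667`. Two of the three predictions match the gold annotations, and one gold entity is missed. Strict matching is only one option. Partial-overlap schemes credit predictions whose boundaries differ slightly, and they typically yield higher scores, so results are comparable only under the same criterion.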

Keywords

BERT; Breast cancer; EHR; GPT; Information extraction; LLM; NER

Quality index

Bibliometric impact. Analysis of the contribution and dissemination channel

The work was published in the journal Multimedia Tools and Applications which, according to Scopus (SJR), has become a reference in its field owing to its growth and impact in recent years. No indicators had yet been calculated for 2024, the year of publication, but in 2023 the journal was in position , which placed it in Q1 (first quartile) in the Media Technology category.

Regardless of the expected impact determined by the dissemination channel, it is important to highlight the actual observed impact of the contribution itself.

According to the various indexing agencies, the number of citations accumulated by this publication as of 2025-08-22 is:

Google Scholar: 8
Scopus: 12

Impact and social visibility

Leadership analysis of institutional authors

This work was carried out in international collaboration with researchers from Colombia.

The institution has a significant leadership presence, as its authors appear as first and last signatories: First Author (GARCIA BARRAGAN, ALVARO) and Last Author (ROBLES FORCADA, VICTOR).
