
Indexed in
License and use

GPT for medical entity recognition in Spanish
Publicated to:Multimedia Tools And Applications. - 2024-01-01 (), DOI: 10.1007/s11042-024-19209-5
Authors: García-Barragán Á; González Calatayud A; Solarte-Pabón O; Provencio M; Menasalvas E; Robles V
Affiliations
Abstract
In recent years, there has been a remarkable surge in the development of Natural Language Processing (NLP) models, particularly in the realm of Named Entity Recognition (NER). Models such as BERT have demonstrated exceptional performance, leveraging annotated corpora for accurate entity identification. However, the question arises: Can newer Large Language Models (LLMs) like GPT be utilized without the need for extensive annotation, thereby enabling direct entity extraction? In this study, we explore this issue, comparing the efficacy of fine-tuning techniques with prompting methods to elucidate the potential of GPT in the identification of medical entities within Spanish electronic health records (EHR). This study utilized a dataset of Spanish EHRs related to breast cancer and implemented both a traditional NER method using BERT, and a contemporary approach that combines few shot learning and integration of external knowledge, driven by LLMs using GPT, to structure the data. The analysis involved a comprehensive pipeline that included these methods. Key performance metrics, such as precision, recall, and F-score, were used to evaluate the effectiveness of each method. This comparative approach aimed to highlight the strengths and limitations of each method in the context of structuring Spanish EHRs efficiently and accurately.The comparative analysis undertaken in this article demonstrates that both the traditional BERT-based NER method and the few-shot LLM-driven approach, augmented with external knowledge, provide comparable levels of precision in metrics such as precision, recall, and F score when applied to Spanish EHR. Contrary to expectations, the LLM-driven approach, which necessitates minimal data annotation, performs on par with BERT’s capability to discern complex medical terminologies and contextual nuances within the EHRs. The results of this study highlight a notable advance in the field of NER for Spanish EHRs, with the few shot approach driven by LLM, enhanced by external knowledge, slightly edging out the traditional BERT-based method in overall effectiveness. GPT’s superiority in F-score and its minimal reliance on extensive data annotation underscore its potential in medical data processing.
Keywords
Quality index
Bibliometric impact. Analysis of the contribution and dissemination channel
The work has been published in the journal Multimedia Tools And Applications due to its progression and the good impact it has achieved in recent years, according to the agency Scopus (SJR), it has become a reference in its field. In the year of publication of the work, 2024 there are still no calculated indicators, but in 2023, it was in position , thus managing to position itself as a Q1 (Primer Cuartil), in the category Media Technology.
Independientemente del impacto esperado determinado por el canal de difusión, es importante destacar el impacto real observado de la propia aportación.
Según las diferentes agencias de indexación, el número de citas acumuladas por esta publicación hasta la fecha 2025-06-11:
- Google Scholar: 8
- Scopus: 12
Impact and social visibility
Leadership analysis of institutional authors
This work has been carried out with international collaboration, specifically with researchers from: Colombia.
There is a significant leadership presence as some of the institution’s authors appear as the first or last signer, detailed as follows: First Author (GARCIA BARRAGAN, ALVARO) and Last Author (ROBLES FORCADA, VICTOR).