May 1, 2023
Widaug. Data augmentation for named entity recognition using Wikidata

Published in: Procesamiento de Lenguaje Natural (70): 145-155, 2023-03-01. DOI: 10.26342/2023-70-12

Authors:

Calleja, P; Sánchez, A; Corcho, O

Affiliations

Univ Politecn Madrid, Ontol Engn Grp, Madrid, Spain - Author
Universidad Politécnica de Madrid - Author

Abstract

Current state-of-the-art Natural Language Processing models rely on large amounts of training data: the more, the better. This requirement is a significant limitation when creating datasets for specific tasks such as Named Entity Recognition, which requires one or more annotators to read, understand and annotate the required named entities throughout a corpus. Many good general-domain corpora currently exist for English. However, particular domains or scenarios, and languages other than English, remain underrepresented in the research community. Data augmentation techniques are therefore explored to create synthetic data similar to the originals and enrich model training. Knowledge graphs, meanwhile, contain a great deal of valuable information that is not yet being used to help in the data augmentation process. This work proposes a data augmentation method based on the Wikidata knowledge graph, which is tested on a Spanish corpus for a Named Entity Recognition challenge.

Keywords

Data augmentation; Named entity recognition; Wikidata

Quality index

Bibliometric impact. Analysis of the contribution and dissemination channel

The work was published in the journal Procesamiento de Lenguaje Natural, which, owing to its progression and the impact it has achieved in recent years according to Scopus (SJR), has become a reference in its field. In the year of publication, 2023, the journal ranked in Q1 (first quartile) in the category Linguistics and Language.

Regardless of the expected impact determined by the dissemination channel, it is important to highlight the actual observed impact of the contribution itself.

According to the different indexing agencies, the number of citations accumulated by this publication as of 2026-04-27 is:

  • WoS: 1
  • Scopus: 1

Impact and social visibility

From the perspective of influence or social adoption, and based on metrics associated with mentions and interactions provided by agencies specializing in calculating the so-called "Alternative or Social Metrics," we can highlight as of 2026-04-27:

  • The use of this contribution in bookmarks, code forks, additions to favorite lists for recurrent reading, as well as general views, indicates that someone is using the publication as a basis for their current work. This may be a notable indicator of future more formal and academic citations. This claim is supported by the result of the "Capture" indicator, which yields a total of: 3 (PlumX).

It is essential to present evidence supporting full alignment with institutional principles and guidelines on Open Science and the Conservation and Dissemination of Intellectual Heritage. A clear example of this is:

  • Assignment of a Handle/URN as an identifier within the deposit in the Institutional Repository: https://oa.upm.es/86404/

Publication of the work in the institutional repository has yielded usage statistics that reflect its impact. In terms of dissemination, the repository reports:

  • Views: 121
  • Downloads: 108

Leadership analysis of institutional authors

There is a significant leadership presence, as authors from the institution appear as first and last author: First Author (Calleja Ibañez, Pablo) and Last Author (Corcho Garcia, Oscar).
