

Grant support
The article is a result of the project "DISNET (Creation and analysis of disease networks for drug repurposing from heterogeneous data sources applied to rare diseases)" with grant number "RTI2018-094576-A-I00" from the Spanish Ministerio de Ciencia, Innovacion y Universidades. Gerardo Lagunes-Garcia's work is supported by the Mexican Consejo Nacional de Ciencia y Tecnologia (CONACYT) (CVU: 340523) under the programme "291114 - BECAS CONACYT AL EXTRANJERO". Lucia Prieto-Santamaria's work is supported by "Programa de fomento de la investigacion y la innovacion (Doctorados Industriales)" from Comunidad de Madrid (grant IND2019/TIC-17159). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Analysis of institutional authors
- Menasalvas Ruiz, Ernestina (Author)
- Rodriguez Gonzalez, Alejandro (Author)
- Prieto Santamaria, Lucia (Author)
- Lagunes-Garcia, G (Author)
- Del Valle E.p.g. (Author)
- Zanin, M (Author)
DISNET: a framework for extracting phenotypic disease information from public sources
Published in: PeerJ 8: e8580, 2020-02-17. DOI: 10.7717/peerj.8580
Authors: Lagunes-Garcia, Gerardo; Rodriguez-Gonzalez, Alejandro; Prieto-Santamaria, Lucia; Garcia del Valle, Eduardo P; Zanin, Massimiliano; Menasalvas-Ruiz, Ernestina
Abstract
Background. Within the global endeavour of improving population health, one major challenge is the identification and integration of medical knowledge spread through several information sources. The creation of a comprehensive dataset of diseases and their clinical manifestations based on information from public sources is an interesting approach that allows one not only to complement and merge medical knowledge but also to increase it and thereby to interconnect existing data and analyse and relate diseases to each other. In this paper, we present DISNET (http://disnet.ctb.upm.es/), a web-based system designed to periodically extract the knowledge from signs and symptoms retrieved from medical databases, and to enable the creation of customisable disease networks.
Methods. We here present the main features of the DISNET system. We describe how information on diseases and their phenotypic manifestations is extracted from Wikipedia and PubMed websites; specifically, texts from these sources are processed through a combination of text mining and natural language processing techniques.
Results. We further present the validation of our system on Wikipedia and PubMed texts, obtaining the relevant accuracy. The final output includes the creation of a comprehensive symptoms-disease dataset, shared (free access) through the system's API. We finally describe, with some simple use cases, how a user can interact with it and extract information that could be used for subsequent analyses.
Discussion. DISNET allows retrieving knowledge about the signs, symptoms and diagnostic tests associated with a disease. It is not limited to a specific category (all the categories that the selected sources of information offer us) and clinical diagnosis terms. It further allows tracking the evolution of those terms through time, and thus offers an opportunity to analyse and observe the progress of human knowledge on diseases. We further discussed the validation of the system, suggesting that it is good enough to be used to extract diseases and diagnostically-relevant terms. At the same time, the evaluation also revealed that improvements could be introduced to enhance the system's reliability.
Quality index
Bibliometric impact. Analysis of the contribution and dissemination channel
The work was published in the journal PeerJ, which, owing to its progression and the good impact it has achieved in recent years according to Scopus (SJR), has become a reference in its field. In 2020, the year of publication, the journal ranked in the first quartile (Q1) in the category Agricultural and Biological Sciences (Miscellaneous).
From a relative perspective, based on the normalized impact indicator calculated from Scopus (Elsevier) world citations, the work has a Field-Weighted Citation Impact of 3.02, which indicates that it is cited above the average for works in the same discipline and the same year of publication. (source consulted: ESI Nov 14, 2024)
This information is reinforced by other indicators of the same type, which, although dynamic over time and dependent on the set of average global citations at the time of their calculation, consistently position the work at some point among the top 50% most cited in its field:
- Field Citation Ratio (FCR) from Dimensions: 10.61 (source consulted: Dimensions Jul 2025)
Specifically, according to different indexing agencies, as of 2025-07-05 this work has accumulated the following number of citations:
- WoS: 27
- Scopus: 29
- Europe PMC: 5
- Google Scholar: 36
Leadership analysis of institutional authors
There is a significant leadership presence, as some of the institution's authors appear as first or last signer: First Author (LAGUNES GARCÍA, GERARDO).