Analysis of institutional authors

Liz López, Helena (Adapter); Panizo-Lledot, A (Corresponding Author); Camacho, D (Author)

October 2, 2023

Testing the performance, adequacy, and applicability of an artificial intelligence model for pediatric pneumonia diagnosis

Published in: Computer Methods And Programs In Biomedicine, vol. 242, article 107765, 2023-12-01. DOI: 10.1016/j.cmpb.2023.107765

Authors:

Domínguez-Rodríguez, S; Liz-López, H; Panizo-LLedot, A; Ballesteros, A; Dagan, R; Greenberg, D; Gutiérrez, L; Rojo, P; Otheo, E; Galán, JC; Villanueva, S; Garcia, S; Mosquera, P; Tagarro, A; Moraleda, C; Camacho, D

Affiliations

Ben Gurion Univ Negev, Fac Hlth Sci, Beer Sheva, Israel - Author
Ben-Gurion University of the Negev - Author
Fdn Invest Biomed Hosp Octubre 12, Inst Invest Sanitaria Hosp 12 Octubre imas12, Pediat Res & Clin Trials Unit UPIC, Madrid, Spain - Author
Hosp Univ 12 Octubre, Dept Pediat, Pediat Infect Dis Unit, Madrid, Spain - Author
Hosp Univ Henares, Madrid, Spain - Author
Hosp Univ Infanta Sofia, Fdn Invest Innovac Biomed, Madrid, Spain - Author
Hosp Univ Ramon Y Cajal, Microbiol Dept, Madrid, Spain - Author
Hosp Univ Ramon Y Cajal, Pediat Dept, Madrid, Spain - Author
Hospital Universitario 12 de octubre - Author
Hospital Universitario Infanta Sofía , Universidad Europea de Madrid , Hospital Universitario 12 de Octubre - Author
Hospital Universitario Ramón y Cajal - Author
Soroka Univ, Med Ctr, Beer Sheva, Israel - Author
Soroka University Medical Center , Ben-Gurion University of the Negev - Author
Univ Europea Madrid, Pediat Res Grp, Pediat, Madrid, Spain - Author
Univ Politecn Madrid, Comp Syst Engn Dept, Madrid, Spain - Author
Universidad Politécnica de Madrid - Author

Abstract

Background: Community-acquired Pneumonia (CAP) is a common childhood infectious disease. Deep learning models show promise in X-ray interpretation and diagnosis, but their validation should be extended due to limitations in the current validation workflow. To extend the standard validation workflow, we propose a pilot test with the following characteristics. First, the assumption of a perfect ground truth (100% sensitive and specific) is unrealistic, as high intra- and inter-observer variability has been reported. To address this, we propose using Bayesian latent class models (BLCA) to estimate accuracy during the pilot. Additionally, assessing only the performance of a model, without considering its applicability and acceptance by physicians, is insufficient if we hope to integrate AI systems into day-to-day clinical practice. Therefore, we propose employing explainable artificial intelligence (XAI) methods during the pilot test to involve physicians and to evaluate how well a deep learning model is accepted and how helpful it is for routine decisions, as well as to analyze its limitations by assessing the etiology. This study aims to apply the proposed pilot to test a deep Convolutional Neural Network (CNN)-based model for identifying consolidation in pediatric chest X-ray (CXR) images that has already been validated using the standard workflow.

Methods: For the standard validation workflow, a total of 5856 public CXRs and 950 private CXRs were used to train and validate the performance of the CNN model. The performance of the model was estimated assuming a perfect ground truth. For the pilot test proposed in this article, a total of 190 pediatric CXR images were used to test the CNN model support decision tool (SDT). The performance of the model on the pilot test was estimated using extensions of the two-test Bayesian Latent-Class model (BLCA). The sensitivity, specificity, and accuracy of the model were also assessed. The clinical characteristics of the patients were compared according to the model performance. The adequacy and applicability of the SDT were tested using XAI techniques. Adequacy was assessed by asking two senior physicians to report their rate of agreement with the SDT. Applicability was tested by asking three medical residents for their assessment before and after using the SDT, and the agreement with the experts was calculated using the kappa index.

Results: The CXRs of the pilot test were labeled by the panel of experts as consolidation (124/176, 70.4%) or no-consolidation/other infiltrates (52/176, 29.5%). A total of 31/176 (17.6%) discrepancies were found between the model and the panel of experts, with a kappa index of 0.6. The sensitivity and specificity reached a median of 90.9 (95% Credible Interval (CrI), 81.2–99.9) and 77.7 (95% CrI, 63.3–98.1), respectively. The senior physicians reported a high agreement rate (70%) with the system in identifying logical consolidation patterns. The three medical residents reached higher agreement with the experts when using the SDT than when working alone (0.66±0.1 alone vs. 0.75±0.2 with the SDT).

Conclusions: Through the pilot test, we have verified that the performance of the deep learning model was underestimated when a perfect ground truth was assumed. Furthermore, the adequacy and applicability tests indicate that the model is able to identify logical patterns within the CXRs, and that augmenting clinicians with automated preliminary read assistants could accelerate their workflows and enhance accuracy in identifying consolidation in pediatric CXR images.
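The abstract reports sensitivity, specificity, and a kappa index for agreement between the model and the expert panel. As a minimal sketch of how such figures are derived from a 2x2 table, the following Python snippet computes them under the usual perfect-reference assumption (not the BLCA approach used in the pilot). The class and function names are hypothetical, and the counts are illustrative only; they are not the study's data, since the abstract does not give the full cross-tabulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionTable:
    """2x2 table of model output versus reference (expert panel) labels."""
    tp: int  # model positive, reference positive
    fn: int  # model negative, reference positive
    fp: int  # model positive, reference negative
    tn: int  # model negative, reference negative


def agreement_metrics(t: ConfusionTable) -> dict:
    """Return sensitivity, specificity, accuracy, and Cohen's kappa."""
    n = t.tp + t.fn + t.fp + t.tn
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    observed = (t.tp + t.tn) / n  # observed agreement (accuracy)
    # Agreement expected by chance, from the marginal positive rates.
    model_pos = (t.tp + t.fp) / n
    ref_pos = (t.tp + t.fn) / n
    expected = model_pos * ref_pos + (1 - model_pos) * (1 - ref_pos)
    kappa = (observed - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": observed,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Illustrative counts only (assumption), not taken from the publication.
    example = ConfusionTable(tp=40, fn=10, fp=5, tn=45)
    for name, value in agreement_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is typically lower than accuracy when one class dominates.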

Keywords

absence; chest x-ray; cnns; deep-learning; reliability; screening-test; variability; Artificial intelligence; Bayes theorem; Chest radiographs; Chest x-ray; Child; Cnns; Deep learning; Deep-learning; Humans; Lung diseases; Neural networks, computer; Pneumonia

Quality index

Bibliometric impact. Analysis of the contribution and dissemination channel

The work was published in the journal Computer Methods And Programs In Biomedicine, which, owing to its progression and the good impact it has achieved in recent years according to the agency WoS (JCR), has become a reference in its field. In the year of publication, 2023, the journal ranked 20/144, placing it in Q1 (first quartile) in the category Computer Science, Theory & Methods.

Regardless of the expected impact determined by the dissemination channel, it is important to highlight the actual observed impact of the contribution itself.

According to the different indexing agencies, the number of citations accumulated by this publication as of 2025-12-21 is:

  • WoS: 3
  • Scopus: 7

Impact and social visibility

From the perspective of influence or social adoption, and based on metrics associated with mentions and interactions provided by agencies specializing in calculating the so-called "Alternative or Social Metrics," we can highlight as of 2025-12-21:

  • Academic use, as evidenced by the Altmetric indicator for additions to the Mendeley personal bibliographic manager, gives a total of: 41.
  • Use of this contribution in bookmarks, code forks, additions to favorites lists for recurrent reading, and general views suggests that someone is using the publication as a basis for their current work. This may be a notable indicator of future, more formal academic citations. The claim is supported by the "Capture" indicator, which yields a total of: 41 (PlumX).

With a more dissemination-oriented intent and targeting more general audiences, we can observe other more global scores such as:

  • The Total Score from Altmetric: 3.
  • The number of mentions on the social network X (formerly Twitter): 5 (Altmetric).

Leadership analysis of institutional authors

This work has been carried out with international collaboration, specifically with researchers from: Israel.

There is a significant leadership presence as some of the institution’s authors appear as the first or last signer, detailed as follows: Last Author (CAMACHO FERNANDEZ, DAVID).

The corresponding author is PANIZO LLEDOT, ANGEL.