APC: 1,750.00 dollars (DOAJ)

License and Use

Open Access

Analysis of institutional authors

Mateos-Caballero, Alfonso (Author)

January 26, 2025
Time Series Classification of Raw Voice Waveforms for Parkinson's Disease Detection Using Generative Adversarial Network-Driven Data Augmentation

Published in: IEEE Open Journal of the Computer Society, vol. 6, pp. 72-84, 2025-01-01. DOI: 10.1109/OJCS.2024.3504864

Authors:

Rey-Paredes, Marta; Perez, Carlos J; Mateos-Caballero, Alfonso

Affiliations

Univ Extremadura, Dept Matemat, Caceres 10003, Spain - Author
Univ Politecn Madrid, Dept Inteligencia Artificial, ETSIINF, Madrid 28660, Spain - Author

Abstract

Parkinson's disease (PD) is a neurodegenerative disorder that affects more than 10 million people worldwide. Despite its prevalence, detecting PD remains a complicated task, as no gold standard test has yet been developed to provide an accurate diagnosis. In this context, many recent studies have focused on the automatic detection and progression tracking of PD from voice-related characteristics, with feature engineering being the most common approach. This work addresses an existing research gap by introducing a novel strategy that analyzes raw voice waveforms. Despite recent advancements, one significant hurdle is still the lack of extensive and diverse datasets, so this article also implements a data augmentation solution: Big Vocoder Slicing Adversarial Network (BigVSAN) is used to generate synthetic voice data that mimics the characteristics of real patients and healthy subjects. For the PD detection task, deep learning models such as ResNet, LSTM-FCN, InceptionTime, and CDIL-CNN are used. The experiments were performed using the sustained vowel /a/ speech task in the PC-GITA database, which contains recordings of healthy and PD subjects. CDIL-CNN achieves the best results, improving accuracy by 15.87% over the same model trained without augmented data and by 8.96% over the best method found in the literature that uses voice waveforms. The results of this study indicate that models trained with raw waveforms show modest but promising performance, underscoring the potential of audio analysis to improve the early detection of PD by providing a non-invasive and potentially remotely applicable method.
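As a rough illustration of the kind of operation behind the best-performing model, the sketch below implements a single circular dilated 1D convolution in plain Python. It assumes that CDIL-CNN refers to a circular dilated convolutional network (the abstract does not expand the acronym), and it does not reproduce the architecture, kernel sizes, or training setup of the paper. The function name `circular_dilated_conv1d` and the toy signal are hypothetical.

```python
def circular_dilated_conv1d(x, kernel, dilation):
    """Apply one circular dilated convolution to a 1D sequence.

    Each output sample combines input samples spaced `dilation` steps
    apart, and indices wrap around the ends of the sequence (modulo its
    length), so the output has the same length as the input without
    zero padding.
    """
    n = len(x)
    center = len(kernel) // 2  # tap aligned with the current position
    out = []
    for t in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Offset of tap j relative to position t, scaled by the dilation
            idx = (t + (j - center) * dilation) % n
            acc += w * x[idx]
        out.append(acc)
    return out


# Toy "waveform" (hypothetical values, not real voice data)
signal = [0, 1, 2, 3, 4, 5, 6, 7]

# A kernel that keeps only the left tap shifts the signal by `dilation`
# positions, with wrap-around at the boundary.
shifted = circular_dilated_conv1d(signal, [1, 0, 0], dilation=2)

# An averaging kernel leaves a constant signal unchanged.
smoothed = circular_dilated_conv1d([1.0] * 8, [0.25, 0.5, 0.25], dilation=4)
```

Stacking such layers with growing dilation lets every output position depend on a long stretch of the raw waveform, which is one plausible reason this family of models suits long sequences.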

Keywords

Cepstral analysis; Data augmentation; Data models; Databases; Deep learning; Diseases; Feature extraction; Generative adversarial networks; Parkinson's disease; Recording; Spectrogram; Time series analysis; Vocal signal analysis

Quality index

Bibliometric impact. Analysis of the contribution and dissemination channel

The work was published in IEEE Open Journal of the Computer Society, a journal that, owing to its progression and the impact it has achieved in recent years according to WoS (JCR), has become a reference in its field. In 2025, the year of publication, it ranked 14/258 in the category Computer Science, Information Systems, which places it in Q1 (first quartile) and above the 90th percentile.
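The quartile and percentile follow directly from the 14/258 rank. The short sketch below checks the arithmetic with a simple convention (percentile as the share of journals ranked below, quartile as the quarter of the list containing the rank). The function names are hypothetical, and JCR may use a slightly different formula, so the exact decimals are only indicative.

```python
import math


def rank_percentile(rank, total):
    """Percentage of journals ranked below the given position."""
    return (1 - rank / total) * 100


def rank_quartile(rank, total):
    """Quartile (1 to 4) of the ranked list that contains the position."""
    return math.ceil(4 * rank / total)


percentile = rank_percentile(14, 258)
quartile = rank_quartile(14, 258)
print(f"percentile: {percentile:.1f}, quartile: Q{quartile}")
```

Under this convention the journal sits at roughly the 94.6th percentile, consistent with the Q1 classification and the "above the 90th percentile" statement.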

Regardless of the expected impact determined by the dissemination channel, it is important to highlight the actual impact observed for the contribution itself.

According to the different indexing agencies, the number of citations accumulated by this publication as of 2026-04-27 is:

  • WoS: 5
  • Scopus: 7

Impact and social visibility

It is essential to present evidence supporting full alignment with institutional principles and guidelines on Open Science and the Conservation and Dissemination of Intellectual Heritage. A clear example of this is:

  • The work has been submitted to a journal whose editorial policy allows Open Access publication.
  • Assignment of a Handle/URN as an identifier within the deposit in the Institutional Repository: https://oa.upm.es/92509/

As a result of the publication of the work in the institutional repository, statistical usage data has been obtained that reflects its impact. In terms of dissemination, we can state that, as of

  • Views: 45
  • Downloads: 43

Leadership analysis of institutional authors

Institutional authors hold leadership positions in this work, appearing as first or last author: First Author (Rey-Paredes, Marta) and Last Author (Mateos-Caballero, Alfonso).


Project objectives

The objectives of this contribution focus on advancing the automatic detection of Parkinson's disease from voice signals. They are: to analyze raw voice waveforms in order to overcome the limitations of the traditional feature-engineering approach; to implement a data augmentation solution using the Big Vocoder Slicing Adversarial Network (BigVSAN) to generate representative synthetic data; to evaluate the performance of deep learning models such as ResNet, LSTM-FCN, InceptionTime, and CDIL-CNN in voice classification; to compare the accuracy obtained with and without data augmentation, highlighting a 15.87% accuracy improvement with CDIL-CNN; and to demonstrate the potential of audio analysis for the early, non-invasive, and remote detection of Parkinson's disease.

Most relevant results

The most relevant results of this study concern the detection of Parkinson's disease through the analysis of raw voice waveforms and the generation of synthetic data. First, Big Vocoder Slicing Adversarial Network (BigVSAN) was implemented for data augmentation, generating synthetic voices that mimic the characteristics of real patients and healthy subjects. Second, deep learning models such as ResNet, LSTM-FCN, InceptionTime, and CDIL-CNN were evaluated for voice classification on the PC-GITA database. Finally, CDIL-CNN achieved the best performance: with augmented data, its accuracy improved by 15.87% over the same model trained without augmentation and by 8.96% over the best previously reported method based on voice waveforms.

Awards linked to the item

This work was supported in part by the R&D&I projects under Grant PID2021-122209OB-C31 and Grant PID2021-122209OB-C32 and in part by the MICIU/AEI/10.13039/501100011033/ FEDER, UE.