Comparative Analysis of A3C and PPO Algorithms in Reinforcement Learning: A Survey on General Environments

Citations

Cited 13 times in Scopus

Cited 1 time in Web of Science

Grant support

This work was supported in part by the Horizon Europe CODECO Project under Grant 101092696, in part by the Horizon Europe NEMO Project under Grant 101070118, and in part by the UNICO-5G I+D (B5GEMINI-AIUC) Project funded by the Ministry of Economic Affairs and Digital Transformation of the Spanish Government and the NextGeneration EU (Recovery, Transformation and Resilience Plan-PRTR) under Grant TSI063000-2021-79.

Analysis of institutional authors

Del Rio, Alberto (Corresponding Author); Jimenez D. (Author); Jimenez, David (Author); Serrano, Javier (Author)

October 29, 2024

Article

Published in: IEEE Access, 12, 146795-146806 (2024). DOI: 10.1109/ACCESS.2024.3472473

Authors: del Rio, A; Jimenez, D; Serrano, J

Affiliations

Univ Politecn Madrid, Escuela Tecn Super Ingn Sistemas Informat ETSISI, Informat Syst Dept, Madrid 28031, Spain - Author

Univ Politecn Madrid, Escuela Tecn Super Ingn Telecomunicac ETSIT, Phys Elect Elect Engn & Appl Phys Dept, Madrid 28040, Spain - Author

Univ Politecn Madrid, Escuela Tecn Super Ingn Telecomunicac ETSIT, Signals Syst & Radiocommun Dept, Madrid 28040, Spain - Author

Abstract

This research article presents a comparison between two mainstream Deep Reinforcement Learning (DRL) algorithms, Asynchronous Advantage Actor-Critic (A3C) and Proximal Policy Optimization (PPO), in two different environments: CartPole and Lunar Lander. DRL algorithms are widely known for their effectiveness in training agents to navigate complex environments and learn optimal policies. Nevertheless, a systematic assessment of their performance across different settings is essential to understand their strengths and weaknesses. In this study, we conduct experiments on the CartPole and Lunar Lander environments using both A3C and PPO and compare their performance in terms of convergence speed and stability. Our results indicate that A3C typically trains faster but exhibits greater instability in reward values, whereas PPO shows a more stable training process at the cost of longer execution times. Algorithm selection should therefore take the environment and the specific application needs into account, balancing training time against stability: A3C is well suited to applications requiring rapid training, while PPO is better suited to those prioritizing training stability.
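The abstract compares A3C and PPO by convergence speed and reward stability. As a hedged illustration only, the sketch below shows one common way to derive such quantities from a per-episode reward log. The metric definitions (the episode at which a moving average first reaches a threshold, and the mean rolling standard deviation of rewards), the function names `episodes_to_threshold`, `rolling_std` and `summarize`, the window size and the threshold are all assumptions for illustration, not the authors' evaluation procedure. The reward curves in the demo are synthetic and are not results from the paper.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def episodes_to_threshold(
    rewards: Sequence[float], threshold: float, window: int = 100
) -> Optional[int]:
    """Return the 1-based episode at which the moving average of the last
    `window` episode rewards first reaches `threshold`, or None if it never does.
    (Assumed convergence-speed proxy, not the paper's definition.)"""
    for end in range(window, len(rewards) + 1):
        if mean(rewards[end - window:end]) >= threshold:
            return end
    return None


def rolling_std(rewards: Sequence[float], window: int = 100) -> list:
    """Population standard deviation of episode rewards over each full window.
    (Assumed stability proxy: lower values mean a smoother reward curve.)"""
    return [pstdev(rewards[i - window:i]) for i in range(window, len(rewards) + 1)]


def summarize(rewards: Sequence[float], threshold: float, window: int = 100) -> dict:
    """Bundle both hypothetical proxies for a single training run."""
    stds = rolling_std(rewards, window)
    return {
        "episodes_to_threshold": episodes_to_threshold(rewards, threshold, window),
        # mean() raises on an empty list, so report None if no full window exists.
        "mean_rolling_std": mean(stds) if stds else None,
    }


if __name__ == "__main__":
    # Synthetic curves for illustration only; these are NOT data from the paper.
    fast_noisy = [min(500.0, 10.0 * i) + (60.0 if i % 2 else -60.0) for i in range(200)]
    slow_smooth = [min(500.0, 4.0 * i) for i in range(200)]
    for name, curve in [("fast_noisy", fast_noisy), ("slow_smooth", slow_smooth)]:
        print(name, summarize(curve, threshold=450.0, window=20))
```

A moving average is used instead of the first single episode above the threshold because individual episode rewards are noisy; the appropriate threshold and window depend on the environment and are left to the caller.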

Keywords

A3C; CartPole; Comparison; Convergence; Environment complexity; Heuristic algorithms; Lunar Lander; Moon; Performance analysis; PPO; Prediction algorithms; Reinforcement learning; Reliability; Sample efficiency; Software algorithms; Space vehicles; Stability; Stability analysis; Surveys; Training

Quality index

Bibliometric impact. Analysis of the contribution and dissemination channel

The work has been published in the journal IEEE Access, which, owing to its progression and the impact it has achieved in recent years, has become a reference in its field according to the Scopus (SJR) agency. For 2024, the year of publication, no indicators have been calculated yet; in 2023, however, the journal was ranked in position , placing it in Q1 (first quartile) in the category Engineering (Miscellaneous).

Regardless of the expected impact determined by the dissemination channel, it is important to highlight the actual observed impact of the contribution itself.

According to the different indexing agencies, the number of citations accumulated by this publication as of 2025-07-24 is:

WoS: 1
Scopus: 13

Leadership analysis of institutional authors

There is a significant leadership presence, as institutional authors appear as first and last signatories: First Author (DEL RIO PONCE, ALBERTO) and Last Author (SERRANO ROMERO, JAVIER).

The corresponding author is DEL RIO PONCE, ALBERTO.
