Automated Extraction of Research Software Installation Instructions from README Files: An Initial Analysis

Grant support

This work is supported by the Madrid Government (Comunidad de Madrid - Spain) under the Multiannual Agreement with Universidad Politecnica de Madrid in the line "Support for R&D projects for Beatriz Galindo researchers", in the context of the VPRICIT, and through the call "Research Grants for Young Investigators" from Universidad Politecnica de Madrid. The authors would also like to acknowledge the European Union's Horizon Europe Programme under GA 101129744 - EVERSE - HORIZON-INFRA-2023-EOSC-01-02.

Institutional authorship analysis

Utrilla Guerrero, Carlos: Author (corresponding)
Corcho, Oscar: Author or co-author
Garijo, Daniel: Author or co-author

Publications

Published conference paper

Published in: Quality-Of-Service Degradation In Distributed Instrumentation Systems Through Poisoning Of 5g Beamforming Algorithms. Vol. 14770, pp. 114-133, 2024-01-01. DOI: 10.1007/978-3-031-65794-8_8

Authors: Guerrero, CU; Corcho, O; Garijo, D

Affiliations

Delft Univ Technol, Res Data & Software RDS Dept, Delft, Netherlands - Author or co-author

Univ Politecn Madrid, Ontol Engn Grp, Madrid, Spain - Author or co-author

Abstract

Research software code projects are typically described with README files, which often contain the steps to set up, test and run the code contained in them. Installation instructions are written in a human-readable manner and are therefore difficult for intelligent assistants, designed to help other researchers set up a code repository, to interpret. In this paper we explore this gap by assessing whether Large Language Models (LLMs) are able to extract installation instruction plans from README files. In particular, we define a methodology to extract alternate installation plans, an evaluation framework to assess the effectiveness of each result and an initial quantitative evaluation based on state-of-the-art LLMs (llama-2-70b-chat and Mixtral-8x7b-Instruct-v0.1). Our results show that while LLMs are a promising approach for finding installation instructions, they present important limitations when these instructions are not sequential or mandatory.

Keywords

Automated extraction; Codes (symbols); Computer software selection and evaluation; Data mining; Information extraction; Information retrieval; Input output programs; Knowledge graph; Knowledge graphs; Language model; Language processing; Modeling languages; Natural language processing systems; Natural scientific language processing; Research/scientific knowledge graph; Research/scientific knowledge graphs; Scientific knowledge; Scientific language; Software installations; Software testing

Leadership analysis of institutional authors

This work was carried out with international collaboration, specifically with researchers from the Netherlands.

There is significant leadership, since some of the authors belonging to the institution appear as first or last author, as detailed here: first author (UTRILLA GUERRERO, CARLOS) and last author (GARIJO VERDEJO, DANIEL).

The corresponding author is UTRILLA GUERRERO, CARLOS.
