Citations

1

Grant support

This work is supported by the Madrid Government (Comunidad de Madrid - Spain) under the Multiannual Agreement with Universidad Politecnica de Madrid in the line Support for R&D projects for Beatriz Galindo researchers, in the context of the VPRICIT, and through the call Research Grants for Young Investigators from Universidad Politecnica de Madrid. The authors would also like to acknowledge European Union's Horizon Europe Programme under GA 101129744 - EVERSE - HORIZON-INFRA-2023-EOSC-01-02.

Analysis of institutional authors

  • Utrilla Guerrero, Carlos (Corresponding Author)
  • Corcho, Oscar (Author)
  • Garijo, Daniel (Author)

November 5, 2024
Proceedings Paper

Automated Extraction of Research Software Installation Instructions from README Files: An Initial Analysis

Published in: Instant Or Distant: A Temporal Network Tale Of Two Interaction Platforms And Their Influence On Collaboration, vol. 14770, pp. 114-133, 2024-01-01. DOI: 10.1007/978-3-031-65794-8_8

Authors: Guerrero, CU; Corcho, O; Garijo, D

Affiliations

Delft Univ Technol, Res Data & Software RDS Dept, Delft, Netherlands - Author
Univ Politecn Madrid, Ontol Engn Grp, Madrid, Spain - Author

Abstract

Research Software code projects are typically described with README files, which often contain the steps to set up, test and run the code contained in them. Installation instructions are written in a human-readable manner and are therefore difficult to interpret by intelligent assistants designed to help other researchers set up a code repository. In this paper we explore this gap by assessing whether Large Language Models (LLMs) are able to extract installation instruction plans from README files. In particular, we define a methodology to extract alternate installation plans, an evaluation framework to assess the effectiveness of each result, and an initial quantitative evaluation based on state-of-the-art LLMs (llama-2-70b-chat and Mixtral-8x7b-Instruct-v0.1). Our results show that while LLMs are a promising approach for finding installation instructions, they present important limitations when these instructions are not sequential or mandatory.

Keywords

Automated extraction; Codes (symbols); Computer software selection and evaluation; Data mining; Information extraction; Information retrieval; Input output programs; Knowledge graph; Knowledge graphs; Language model; Language processing; Modeling languages; Natural language processing systems; Natural scientific language processing; Research/scientific knowledge graph; Research/scientific knowledge graphs; Scientific knowledge; Scientific language; Software installations; Software testing

Quality index

Bibliometric impact. Analysis of the contribution and dissemination channel

Regardless of the expected impact determined by the dissemination channel, it is important to highlight the actual observed impact of the contribution itself.

According to the different indexing agencies, the number of citations accumulated by this publication as of 2025-07-09:

  • Scopus: 1

Impact and social visibility

From the perspective of influence or social adoption, and based on metrics associated with mentions and interactions provided by agencies specializing in calculating the so-called "Alternative or Social Metrics," we can highlight as of 2025-07-09:

  • The use of this contribution in bookmarks, code forks, additions to favorite lists for recurrent reading, as well as general views, indicates that someone is using the publication as a basis for their current work. This may be a notable indicator of future more formal and academic citations. This claim is supported by the result of the "Capture" indicator, which yields a total of: 3 (PlumX).

Leadership analysis of institutional authors

This work has been carried out with international collaboration, specifically with researchers from: Netherlands.

There is a significant leadership presence as some of the institution’s authors appear as the first or last signer, detailed as follows: First Author (UTRILLA GUERRERO, CARLOS) and Last Author (GARIJO VERDEJO, DANIEL).

The author responsible for correspondence has been UTRILLA GUERRERO, CARLOS.