
APC

1 973,00 Euros
Springer
Transformative agreement with library

License and Use

Open Access


Analysis of institutional authors

Rodriguez, David (Author); Del Alamo, Jose M (Corresponding Author)


September 15, 2024

Large language models: a new approach for privacy policy analysis at scale

Published in: COMPUTING. 106 (12): 3879-3903 - 2024-12-01, DOI: 10.1007/s00607-024-01331-9

Authors:

Rodriguez, D; Yang, I; Del Alamo, JM; Sadeh, N

Affiliations

Carnegie Mellon Univ, Sch Comp Sci, Forbes Ave, Pittsburgh, PA 15213 USA - Author
Univ Politecn Madrid, ETSI Telecomunicac, Madrid, Spain - Author

Abstract

The number and dynamic nature of web sites and mobile applications present regulators and app store operators with significant challenges when it comes to enforcing compliance with applicable privacy and data protection laws. Over the past several years, people have turned to Natural Language Processing (NLP) techniques to automate privacy compliance analysis (e.g., comparing statements in privacy policies with analysis of the code and behavior of mobile apps) and to answer people's privacy questions. Traditionally, these NLP techniques have relied on labor-intensive and potentially error-prone manual annotation processes to build the corpora necessary to train them. This article explores and evaluates the use of Large Language Models (LLMs) as an alternative for effectively and efficiently identifying and categorizing a variety of data practice disclosures found in the text of privacy policies. Specifically, we report on the performance of ChatGPT and Llama 2, two particularly popular LLM-based tools. This includes engineering prompts and evaluating different configurations of these LLM techniques. Evaluation of the resulting techniques on well-known corpora of privacy policy annotations yields an F1 score exceeding 93%. This score is higher than scores reported earlier in the literature on these benchmarks. This performance is obtained at minimal marginal cost (excluding the cost required to train the foundational models themselves). These results, which are consistent with those reported in other domains, suggest that LLMs offer a particularly promising approach to automated privacy policy analysis at scale.

Keywords

68M11; 68M14; 68M15; 68M25; 68P27; 68T50; 68U15; Data protection; Feature extraction; Large language models; Natural language processing; Privacy; Privacy policies

Quality index

Bibliometric impact. Analysis of the contribution and dissemination channel

The work was published in the journal COMPUTING, which, owing to its progression and the good impact it has achieved in recent years according to Scopus (SJR), has become a reference in its field. No indicators have yet been calculated for 2024, the year of publication, but in 2023 the journal was ranked Q1 (first quartile) in the category Computational Theory and Mathematics.

Regardless of the expected impact determined by the dissemination channel, it is important to highlight the actual observed impact of the contribution itself.

According to the different indexing agencies, the number of citations accumulated by this publication as of 2026-04-27:

  • Google Scholar: 57
  • WoS: 12
  • Scopus: 25

Impact and social visibility

From the perspective of influence or social adoption, and based on metrics associated with mentions and interactions provided by agencies specializing in calculating the so-called "Alternative or Social Metrics," we can highlight as of 2026-04-27:

  • Academic use, as evidenced by the Altmetric indicator that counts additions to the personal reference manager Mendeley: 70.

With a more dissemination-oriented intent and targeting more general audiences, we can observe other more global scores such as:

  • The Total Score from Altmetric: 5.
  • The number of mentions on the social network X (formerly Twitter): 2 (Altmetric).
  • The number of mentions on Wikipedia: 1 (Altmetric).

It is essential to present evidence supporting full alignment with institutional principles and guidelines on Open Science and the Conservation and Dissemination of Intellectual Heritage. A clear example of this is:

  • The work has been submitted to a journal whose editorial policy allows Open Access publication.
  • Assignment of a Handle/URN as an identifier within the deposit in the Institutional Repository: https://oa.upm.es/87559/

As a result of the publication of the work in the institutional repository, statistical usage data has been obtained that reflects its impact. In terms of dissemination:

  • Views: 162
  • Downloads: 41

Leadership analysis of institutional authors

This work has been carried out with international collaboration, specifically with researchers from: United States of America.

There is a significant leadership presence, as some of the institution's authors appear as first or last signer, detailed as follows: First Author (RODRIGUEZ TORRADO, DAVID).

The author responsible for correspondence is ALAMO RAMIRO, JOSE MARIA DEL.


Awards linked to the item

This work has been partially supported by the TED2021-130455A-I00 project funded by MCIN/AEI/10.13039/501100011033 and by the European Union "NextGenerationEU"/PRTR. Jose M. del Alamo has received a grant from the Spanish "Ministerio de Universidades" through the "Movilidad" sub-programme of the "Programa Estatal para Desarrollar, Atraer y Retener Talento", within the "Plan Estatal de Investigacion Cientifica, Tecnica y de Innovacion 2021-2023". This research has also been partially supported by the National Science Foundation under its Security and Trustworthy Computing Program (grant CNS-1914486).