License and use

Open Access

Grant support

This work has been funded by the project PCI2022-134990-2 (MARTINI) of the CHISTERA IV Cofund 2021 program, funded by MCIN/AEI/10.13039/501100011033 and by the "European Union NextGenerationEU/PRTR"; by the Spanish Ministry of Science and Innovation under FightDIS (PID2020-117263GB-I00); by MCIN/AEI/10.13039/501100011033 and European Union NextGenerationEU/PRTR for the XAI-Disinfodemics (PLEC2021-007681) grant; by the European Commission under IBERIFIER Plus, Iberian Digital Media Observatory (DIGITAL-2023-DEPLOY-04-EDMO-HUBS 101158511); by "Convenio Plurianual with the Universidad Politecnica de Madrid in the actuation line of Programa de Excelencia para el Profesorado Universitario"; and by EMIF, managed by the Calouste Gulbenkian Foundation, in the project MuseAI.

Analysis of institutional authors

  • Huertas-Tato, Javier (Corresponding Author)
  • Martin, Alejandro (Author)
  • Camacho, David (Author)

June 18, 2024
Understanding writing style in social media with a supervised contrastively pre-trained transformer

Published in: Knowledge-Based Systems, vol. 296, article 111867, 2024-07-19. DOI: 10.1016/j.knosys.2024.111867

Authors: Huertas-Tato, J; Martín, A; Camacho, D

Affiliations

Univ Politecn Madrid, Dept Sistemas Informat, Calle Alan Turing S-N, Madrid 28031, Spain - Author

Abstract

We introduce the Style Transformer for Authorship Representations (STAR) to detect and characterize writing style in social media. The model is trained on a large heterogeneous corpus derived from public sources, with 4.5 · 10^6 authored texts from 70k authors, leveraging Supervised Contrastive Loss to minimize the distance between texts authored by the same individual. This pretext pre-training task yields competitive zero-shot performance on PAN challenges on attribution and clustering. We attain promising results on PAN verification challenges using STAR as a feature extractor. Finally, we present results from our test partition on Reddit, where, using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1616 authors with at least 80% accuracy. We share our pre-trained model at huggingface AIDA-UPM/star and our code is available at jahuerta92/star.
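The abstract names Supervised Contrastive Loss but does not give its formula. As a rough illustration only, the sketch below follows the commonly used formulation of that loss, in which each text embedding is pulled toward the other embeddings by the same author and pushed away from everything else in the batch. The function names, the temperature value of 0.1, and the toy 2-dimensional embeddings are assumptions made for this example; this is not the authors' implementation.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, author_ids, temperature=0.1):
    """Mean supervised contrastive loss over anchors that have a positive.

    embeddings: list of equal-length float lists, one per text.
    author_ids: list of labels; texts sharing a label are positives.
    temperature: assumed value, chosen only for illustration.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n)
                     if p != i and author_ids[p] == author_ids[i]]
        if not positives:
            continue  # an anchor with no same-author text contributes nothing
        # Scaled cosine similarity between the anchor and every other text.
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        # Log-sum-exp with max subtraction for numerical stability.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two texts per author, embeddings already grouped by author.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
matching_loss = supervised_contrastive_loss(toy_embeddings, ["a", "a", "b", "b"])
shuffled_loss = supervised_contrastive_loss(toy_embeddings, ["a", "b", "a", "b"])
```

In this toy batch the loss is lower when the labels agree with the geometry of the embeddings (`matching_loss`) than when they cut across it (`shuffled_loss`), which is the behavior the training objective rewards.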

Keywords

Authorship attribution; Competitive performance; Contrastive learning; Language processing; Large corpora; Natural language processing; Natural language processing systems; Natural languages; Pre-training; Social media; Social networking (online); Stars; Writing style; Zero-shot learning

Quality index

Bibliometric impact. Analysis of the contribution and dissemination channel

The work was published in the journal Knowledge-Based Systems, which, given its progression and the impact it has achieved in recent years according to the agency WoS (JCR), has become a reference in its field. No indicators have yet been calculated for 2024, the year of publication, but in 2023 the journal ranked 26/204 in the category Computer Science, Artificial Intelligence, placing it in Q1 (first quartile).

Regardless of the expected impact determined by the dissemination channel, it is important to highlight the actual observed impact of the contribution itself.

According to the different indexing agencies, the number of citations accumulated by this publication as of 2025-07-23 is:

  • Google Scholar: 10
  • WoS: 2
  • Scopus: 6

Impact and social visibility

From the perspective of influence or social adoption, and based on metrics associated with mentions and interactions provided by agencies specializing in calculating the so-called "Alternative or Social Metrics," we can highlight as of 2025-07-23:

  • The use of this contribution in bookmarks, code forks, and additions to favorite lists for recurrent reading, as well as general views, indicates that someone is using the publication as a basis for their current work. This may be a notable indicator of future, more formal academic citations. This claim is supported by the result of the "Capture" indicator, which yields a total of 18 (PlumX).

It is essential to present evidence supporting full alignment with institutional principles and guidelines on Open Science and the Conservation and Dissemination of Intellectual Heritage. A clear example of this is:

  • The work has been submitted to a journal whose editorial policy allows Open Access publication.
  • Assignment of a Handle/URN as an identifier within the deposit in the Institutional Repository: https://oa.upm.es/84340/

As a result of the publication of the work in the institutional repository, statistical usage data has been obtained that reflects its impact. In terms of dissemination, we can state that, as of

  • Views: 53
  • Downloads: 20

Leadership analysis of institutional authors

Institutional authors hold leadership positions in this work, appearing as both first and last signer: First Author (HUERTAS TATO, JAVIER) and Last Author (CAMACHO FERNANDEZ, DAVID).

The corresponding author is HUERTAS TATO, JAVIER.