

Grant support
This work has been funded by the project PCI2022-134990-2 (MARTINI) of the CHISTERA IV Cofund 2021 program, funded by MCIN/AEI/10.13039/501100011033 and by the "European Union NextGenerationEU/PRTR"; by the Spanish Ministry of Science and Innovation under FightDIS (PID2020-117263GB-I00); by MCIN/AEI/10.13039/501100011033 and European Union NextGenerationEU/PRTR under the XAI-Disinfodemics grant (PLEC2021-007681); by the European Commission under IBERIFIER Plus Iberian Digital Media Observatory (DIGITAL-2023-DEPLOY-04-EDMO-HUBS 101158511); by the "Convenio Plurianual with the Universidad Politecnica de Madrid in the actuation line of Programa de Excelencia para el Profesorado Universitario"; and by EMIF, managed by the Calouste Gulbenkian Foundation, in the project MuseAI.
Analysis of institutional authors
Huertas-Tato, Javier (Corresponding Author)
Martin, Alejandro (Author)
Camacho, David (Author)
Understanding writing style in social media with a supervised contrastively pre-trained transformer
Published in: Knowledge-Based Systems, vol. 296, 111867, 2024-07-19. DOI: 10.1016/j.knosys.2024.111867
Authors: Huertas-Tato, J; Martín, A; Camacho, D
Abstract
We introduce the Style Transformer for Authorship Representations (STAR) to detect and characterize writing style in social media. The model is trained on a large heterogeneous corpus derived from public sources, with 4.5 · 10^6 authored texts from 70k authors, leveraging Supervised Contrastive Loss to minimize the distance between texts authored by the same individual. This pretext pre-training task yields competitive zero-shot performance on PAN challenges on attribution and clustering. We attain promising results on PAN verification challenges using STAR as a feature extractor. Finally, we present results from our test partition on Reddit, where using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1616 authors with at least 80% accuracy. We share our pre-trained model at huggingface AIDA-UPM/star and our code is available at jahuerta92/star.
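The training objective named in the abstract can be illustrated with a small sketch. What follows is a minimal pure-Python rendering of the general supervised contrastive loss formulation, not the authors' implementation (their code is at jahuerta92/star). The function names `l2_normalize` and `supcon_loss`, the temperature value, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supcon_loss(embeddings, author_ids, temperature=0.1):
    """Supervised contrastive loss over a batch of text embeddings.

    Texts sharing an author id are treated as positives for each other;
    every other text in the batch acts as a negative. The temperature
    default is an illustrative assumption, not a value from the paper.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n)
                     if p != i and author_ids[p] == author_ids[i]]
        if not positives:
            continue  # an anchor with no same-author text contributes nothing
        # Temperature-scaled similarity of the anchor to every other text.
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of picking each positive.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two hypothetical authors, two texts each.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_grouped = supcon_loss(toy_embeddings, [0, 0, 1, 1])  # same-author texts are close
loss_mixed = supcon_loss(toy_embeddings, [0, 1, 0, 1])    # same-author texts are far apart
```

In this toy batch the loss is small when texts by the same author already sit close together and large when they do not, which is the pressure that pulls same-author representations together during pre-training.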
Quality index
Bibliometric impact. Analysis of the contribution and dissemination channel
The work was published in the journal Knowledge-Based Systems. Given its trajectory and the impact it has achieved in recent years according to WoS (JCR), the journal has become a reference in its field. No indicators have yet been calculated for 2024, the year of publication, but in 2023 the journal ranked 26/204 in the category Computer Science, Artificial Intelligence, placing it in Q1 (first quartile).
Regardless of the expected impact determined by the dissemination channel, it is important to highlight the actual observed impact of the contribution itself.
According to the different indexing agencies, the number of citations accumulated by this publication as of 2025-07-23 is:
- Google Scholar: 10
- WoS: 2
- Scopus: 6
Leadership analysis of institutional authors
The institution has a significant leadership presence in this work, as its authors appear in the first and last author positions: First Author (HUERTAS TATO, JAVIER) and Last Author (CAMACHO FERNANDEZ, DAVID).
The corresponding author is HUERTAS TATO, JAVIER.