March 10, 2026
Article
Hybrid Gold

A comprehensive study on contrastive pre-training and fine tuning of vision and text transformers for video memorability prediction

Published in: MULTIMEDIA TOOLS AND APPLICATIONS, 85(1), 2026-01-01. DOI: 10.1007/s11042-026-21260-3

Authors:

Martín-Fernández I; Esteban-Romero S; Gil-Martín M; Fernández-Martínez F

Affiliations

Information Processing and Telecommunications Center (IPTC) - Author

Abstract

Video memorability prediction has emerged as a key challenge for improving information retrieval, content design, and user engagement. Prior work has shown that semantic cues play a crucial role in determining memorability, with recent studies leveraging Contrastive Language-Image Pre-training (CLIP) encoders to incorporate semantic information. However, the specific improvements attributable to CLIP models remain unclear, as few studies systematically compare their performance against equivalent unimodal encoders or explore fine-tuning strategies. This work addresses that gap through a comprehensive, controlled evaluation of CLIP-based and unimodal encoders for video memorability prediction. We propose FCLIP, a domain-adapted extension of CLIP that undergoes additional contrastive pre-training on memorability-specific image-text pairs. Our experiments assess both feature extraction and supervised fine-tuning, ensuring fair comparisons across models with matched architecture and parameter count. Results show that FCLIP image encoders achieve a Spearman Rank Correlation Coefficient (SRCC) of 0.672 on the Memento10k dataset, significantly outperforming unimodal Vision Transformers. FCLIP text encoders similarly outperform unimodal baselines, reaching an SRCC of 0.632. These findings demonstrate that contrastive learning and domain adaptation substantially improve memorability prediction, highlighting the importance of semantic and multimodal pre-training in developing advanced content analysis systems.
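The abstract reports results as a Spearman Rank Correlation Coefficient (SRCC) between predicted and annotated memorability scores. As a minimal sketch of how such a coefficient can be computed, the snippet below ranks both score lists (tied values share their average rank) and takes the Pearson correlation of the ranks. The function names and the five example scores are hypothetical illustrations, not the authors' evaluation code or data from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_srcc(predicted, target):
    """Spearman correlation: Pearson correlation of the two rank vectors.

    Undefined (division by zero) if either input is constant.
    """
    rp, rt = average_ranks(predicted), average_ranks(target)
    mp, mt = mean(rp), mean(rt)
    cov = sum((a - mp) * (b - mt) for a, b in zip(rp, rt))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_t = sum((b - mt) ** 2 for b in rt)
    return cov / (var_p * var_t) ** 0.5


# Hypothetical scores for five videos (illustrative only, not from the paper).
predicted = [0.91, 0.75, 0.80, 0.62, 0.70]
annotated = [0.95, 0.70, 0.85, 0.60, 0.78]
print(round(spearman_srcc(predicted, annotated), 3))
```

Because SRCC depends only on rank order, a model is rewarded for ordering videos correctly by memorability even when its raw scores are on a different scale from the annotations.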

Keywords

Contrastive Language-Image Pre-training (CLIP); Multimodal content analysis; Semantic knowledge integration; Video memorability prediction

Quality index

Bibliometric impact. Analysis of the contribution and dissemination channel

The work was published in the journal MULTIMEDIA TOOLS AND APPLICATIONS, which, owing to its progression and the impact it has achieved in recent years according to the agency Scopus (SJR), has become a reference in its field. In 2026, the year of publication, the journal ranked in Q1 (first quartile) in the category Media Technology.


Impact and social visibility

From the perspective of influence or social adoption, and based on metrics associated with mentions and interactions provided by agencies specializing in calculating the so-called "Alternative or Social Metrics," we can highlight as of 2026-04-26:

  • The use of this contribution in bookmarks, code forks, additions to favorite lists for recurrent reading, as well as general views, indicates that someone is using the publication as a basis for their current work. This may be a notable indicator of future more formal and academic citations. This claim is supported by the result of the "Capture" indicator, which yields a total of: 2 (PlumX).

It is essential to present evidence supporting full alignment with institutional principles and guidelines on Open Science and the Conservation and Dissemination of Intellectual Heritage. A clear example of this is:

  • The work has been submitted to a journal whose editorial policy allows Open Access publication.
  • Assignment of a Handle/URN as an identifier within the deposit in the Institutional Repository: https://oa.upm.es/94134/

As a result of the publication of the work in the institutional repository, statistical usage data reflecting its impact has been obtained. In terms of dissemination, the repository records:

  • Views: 23
  • Downloads: 12

Leadership analysis of institutional authors

There is a significant leadership presence, as authors from the institution appear as first and last author: First Author (MARTIN FERNANDEZ, IVAN) and Last Author (FERNANDEZ MARTINEZ, FERNANDO).
