Evaluating emotional and subjective responses in synthetic art-related dialogues: A multi-stage framework with large language models

APC

3 030,00 Euros

Elsevier

Transformative agreement with library

Citations

Cited 6 times in Scopus

Cited 6 times in Web of Science

Cited 4 times in Google Scholar

Analysis of institutional authors

Luna-Jimenez, Cristina (Corresponding Author); Gil-Martin, Manuel (Author); D'Haro, Luis Fernando (Author); Fernandez-Martinez, Fernando (Author); San-Segundo, Ruben (Author)

July 21, 2024

Publications

Article

Hybrid Gold

Published in: EXPERT SYSTEMS WITH APPLICATIONS, vol. 255, article 124524, 2024-12-01. DOI: 10.1016/j.eswa.2024.124524

Authors:

Luna-Jimenez, Cristina; Gil-Martin, Manuel; D'Haro, Luis Fernando; Fernandez-Martinez, Fernando; San-Segundo, Ruben

Affiliations

Univ Politecn Madrid, Grp Tecnol Habla & Aprendizaje Automat THAU Grp, Informat Proc & Telecommun Ctr, ETSI Telecomunicac, Av Complutense 30, Madrid 28040, Spain - Author

Abstract

The appearance of Large Language Models (LLMs) has brought a qualitative step forward in the performance of conversational agents, and even in the generation of creative texts. However, previous applications of these models to dialogue generation neglected the impact of 'hallucinations' on the synthetic dialogues they produced, and so omitted this central aspect from their evaluations. For this reason, we propose an open-source and flexible framework called GenEvalGPT: a comprehensive multi-stage evaluation strategy utilizing diverse metrics. The objective is two-fold: first, to assess the extent to which synthetic dialogues between a chatbot and a human align with the specified commands, determining whether these dialogues were successfully created according to the provided specifications; and second, to evaluate various aspects of emotional and subjective responses. Assuming that the dialogues to be evaluated were synthetically produced from specific profiles, the first evaluation stage utilizes LLMs to reconstruct the original templates employed in dialogue creation. The success of this reconstruction is then assessed in a second stage using lexical and semantic objective metrics. In addition, crafting a chatbot's behaviors demands careful consideration to encompass the diverse range of interactions it is meant to engage in. Synthetic dialogues play a pivotal role in this context, as they can be deliberately synthesized to emulate various behaviors. This is precisely the objective of the third stage: evaluating whether the generated dialogues adhere to the required aspects concerning emotional and subjective responses. To validate the capabilities of the proposed framework, we applied it to recognize whether the chatbot exhibited one of two distinct behaviors in the synthetically generated dialogues: being emotional and providing subjective responses, or remaining neutral. This evaluation encompasses traditional metrics and automatic metrics generated by the LLM. In our use case of art-related dialogues, our findings reveal that templates or profiles are recovered more effectively for information or profile items that are objective and factual than for those related to mental states or subjective facts. For the emotional and subjective behavior assessment, rule-based metrics achieved 79% accuracy in detecting emotions or subjectivity (anthropic), and the LLM automatic metrics achieved 82%. The combination of these metrics and stages could help to decide which of the generated dialogues should be kept depending on the applied policy, which could range from preserving 57% to 93% of the initial dialogues.
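As a rough illustration of the second stage and the final retention decision described above, the following minimal Python sketch compares an original template with its LLM reconstruction and then applies a keep/discard policy. It is not the GenEvalGPT implementation: the abstract does not name the lexical metric, so token-overlap F1 is used here as a stand-in, and the function names (`token_f1`, `template_score`, `keep_dialogue`), the 0.5 threshold, and the example data are all assumptions made for illustration.

```python
from collections import Counter


def token_f1(reference: str, reconstruction: str) -> float:
    """Token-overlap F1 between an original template item and its reconstruction."""
    ref_tokens = reference.lower().split()
    rec_tokens = reconstruction.lower().split()
    if not ref_tokens or not rec_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(ref_tokens) & Counter(rec_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(rec_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def template_score(original: dict[str, str], reconstructed: dict[str, str]) -> float:
    """Mean token F1 over the items of the original template; missing items score 0."""
    if not original:
        return 0.0
    scores = [token_f1(value, reconstructed.get(key, "")) for key, value in original.items()]
    return sum(scores) / len(scores)


def keep_dialogue(score: float, behavior_ok: bool, threshold: float = 0.5) -> bool:
    """Hypothetical retention policy: keep a dialogue only if both checks pass."""
    return behavior_ok and score >= threshold


# Hypothetical example data, not taken from the paper.
original = {"artwork": "The Starry Night", "chatbot_behavior": "neutral"}
reconstructed = {"artwork": "Starry Night", "chatbot_behavior": "neutral"}
score = template_score(original, reconstructed)
print(round(score, 2), keep_dialogue(score, behavior_ok=True))
```

A stricter or looser policy would correspond to raising or lowering `threshold`, or to dropping the `behavior_ok` requirement, which is one way to picture how the share of preserved dialogues can vary with the applied policy.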

Keywords

Affective computing; Data and text mining; Dialogues evaluation; Dialogues generation

Quality index

Bibliometric impact. Analysis of the contribution and dissemination channel

The work was published in the journal EXPERT SYSTEMS WITH APPLICATIONS. Given its progression and the strong impact it has achieved in recent years, the journal has become a reference in its field according to the agency WoS (JCR). Indicators for the year of publication, 2024, have not yet been calculated, but in 2023 the journal ranked 7/106, placing it in Q1 (first quartile) in the category Operations Research & Management Science. Notably, the journal is positioned above the 90th percentile.

Regardless of the expected impact determined by the dissemination channel, it is important to highlight the actual observed impact of the contribution itself.

According to the different indexing agencies, the number of citations accumulated by this publication as of 2026-04-27 is:

Google Scholar: 4
WoS: 6
Scopus: 6

Leadership analysis of institutional authors

The institution's authors hold leading positions on this work: First Author (LUNA JIMENEZ, CRISTINA) and Last Author (SAN SEGUNDO HERNANDEZ, RUBEN).

The corresponding author is LUNA JIMENEZ, CRISTINA.

Awards linked to the item

This work was funded by Project ASTOUND (101071191 - HORIZON-EIC-2021-PATHFINDERCHALLENGES-01) of the European Commission. The work was also supported by the Spanish Ministry of Science and Innovation through the projects GOMINOLA (PID2020-118112RB-C21 and PID2020-118112RB-C22), AMIC-PoC (PDC2021-120846-C42) and BeWord (PID2021-126061OB-C43), funded by MCIN/AEI/10.13039/501100011033 and by the European Union "NextGenerationEU/PRTR". We thank MS Azure services (and Irving Kwong) for their sponsorship, which allowed us to use OpenAI and Azure Cognitive Services for processing the dataset.
