September 4, 2023

Automatic Detection of Inconsistencies and Hierarchical Topic Classification for Open-Domain Chatbots

Published in: Applied Sciences-Basel, 13(16): 9055, 2023-08-01. DOI: 10.3390/app13169055

Authors:

Rodriguez-Cantelar, Mario; Estecha-Garitagoitia, Marcos; D'Haro, Luis Fernando; Matia, Fernando; Cordoba, Ricardo

Affiliations

Univ Politecn Madrid, Ctr Automat & Robot CAR UPM CSIC, Intelligent Control Grp ICG, C Jose Gutierrez Abascal 2, Madrid 28006, Spain - Author
Univ Politecn Madrid, Speech Technol & Machine Learning Grp THAU, ETSI Telecomunicac, Av Complutense 30, Madrid 28040, Spain - Author
Universidad Politécnica de Madrid - Author

Abstract

Current State-of-the-Art (SotA) chatbots are able to produce high-quality sentences, handling different conversation topics and larger interaction times. Unfortunately, the generated responses depend greatly on the data on which they have been trained, the specific dialogue history and current turn used for guiding the response, the internal decoding mechanisms, and ranking strategies, among others. Therefore, it may happen that for semantically similar questions asked by users, the chatbot may provide a different answer, which can be considered as a form of hallucination or producing confusion in long-term interactions. In this research paper, we propose a novel methodology consisting of two main phases: (a) hierarchical automatic detection of topics and subtopics in dialogue interactions using a zero-shot learning approach, and (b) detecting inconsistent answers using k-means and the Silhouette coefficient. To evaluate the efficacy of topic and subtopic detection, we use a subset of the DailyDialog dataset and real dialogue interactions gathered during the Alexa Socialbot Grand Challenge 5 (SGC5). The proposed approach enables the detection of up to 18 different topics and 102 subtopics. For the purpose of detecting inconsistencies, we manually generate multiple paraphrased questions and employ several pre-trained SotA chatbot models to generate responses. Our experimental results demonstrate a weighted F-1 value of 0.34 for topic detection, a weighted F-1 value of 0.78 for subtopic detection in DailyDialog, then 81% and 62% accuracy for topic and subtopic classification in SGC5, respectively. Finally, to predict the number of different responses, we obtained a mean squared error (MSE) of 3.4 when testing smaller generative models and 4.9 in recent large language models.
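
Phase (b) of the abstract can be illustrated with a minimal sketch: cluster the vector representations of the responses a chatbot gives to paraphrased questions with k-means for several values of k, and keep the k with the highest Silhouette coefficient as the estimated number of different responses. Everything below is an assumption made for illustration and is not taken from the paper: the toy 2-D points stand in for sentence embeddings, the function names are hypothetical, and a deterministic farthest-point initialisation replaces the random initialisation usually used with k-means.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=50):
    """Plain k-means with a deterministic farthest-point initialisation
    (an assumption of this sketch). Returns one cluster label per point."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean Silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for i, l in enumerate(labels):
        clusters.setdefault(l, []).append(i)
    if len(clusters) < 2:
        return -1.0  # the coefficient is undefined for a single cluster
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, points[j]) for j in idx) / len(idx)
                for l, idx in clusters.items() if l != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def estimate_num_responses(points, k_max=None):
    """Try k = 2..k_max and return (best_k, best_silhouette)."""
    k_max = k_max or len(points) - 1
    best_k, best_score = 1, -1.0
    for k in range(2, k_max + 1):
        score = silhouette(points, kmeans(points, k))
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score

# Toy stand-ins for embeddings of nine responses that fall into three groups.
toy_embeddings = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (10.0, 0.0), (10.1, 0.0), (10.0, 0.1),
]
best_k, best_score = estimate_num_responses(toy_embeddings)
print(best_k, round(best_score, 3))
```

Note that the Silhouette coefficient is only defined for two or more clusters, so a sketch like this cannot by itself detect the fully consistent case in which every response is the same; that case would need a separate check.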

Keywords

Chatbots; Clustering; Inconsistent responses; Zero-shot topic detection

Quality index

Bibliometric impact. Analysis of the contribution and dissemination channel

The work was published in the journal Applied Sciences-Basel. According to WoS (JCR), the journal's progression and the impact it has achieved in recent years have made it a reference in its field. In 2023, the year of publication, it ranked 44/181 in the category Engineering, Multidisciplinary, which places it in Q1 (first quartile).

In relative terms, the Field-Weighted Citation Impact reported by Scopus (Elsevier), a normalized impact indicator calculated from worldwide citations, is 1.61. This indicates that the work is cited above the average for works in the same discipline and the same year of publication. (source consulted: ESI Nov 13, 2025)

According to different indexing agencies, as of 2026-04-07 this work has accumulated the following number of citations:

  • WoS: 5
  • Scopus: 10
  • Google Scholar: 2

Impact and social visibility

From the perspective of influence or social adoption, and based on metrics associated with mentions and interactions provided by agencies specializing in calculating the so-called "Alternative or Social Metrics," we can highlight as of 2026-04-07:

  • Academic use, as evidenced by the Altmetric indicator that counts additions to the personal bibliographic manager Mendeley, gives a total of: 23.
  • The use of this contribution in bookmarks, code forks, additions to favorite lists for recurrent reading, as well as general views, indicates that someone is using the publication as a basis for their current work. This may be a notable indicator of future more formal and academic citations. This claim is supported by the result of the "Capture" indicator, which yields a total of: 23 (PlumX).

With a more dissemination-oriented intent and targeting more general audiences, we can observe other more global scores such as:

  • The Total Score from Altmetric: 2.
  • The number of mentions on the social network X (formerly Twitter): 2 (Altmetric).

It is essential to present evidence supporting full alignment with institutional principles and guidelines on Open Science and the Conservation and Dissemination of Intellectual Heritage. A clear example of this is:

  • The work has been submitted to a journal whose editorial policy allows Open Access publication.
  • Assignment of a Handle/URN as an identifier within the deposit in the Institutional Repository: https://oa.upm.es/76817/

As a result of the publication of the work in the institutional repository, statistical usage data has been obtained that reflects its impact. In terms of dissemination, we can state that, as of

  • Views: 236
  • Downloads: 86

Leadership analysis of institutional authors

There is a significant leadership presence, as authors from the institution appear as first and last signer: First Author (RODRIGUEZ CANTELAR, MARIO) and Last Author (CORDOBA HERRALDE, RICARDO DE).

The corresponding author is D'HARO ENRIQUEZ, LUIS FERNANDO.
