User feedback-based online learning for intent classification

Date

2023-10-09

Source Title

ACM International Conference Proceeding Series

Publisher

Association for Computing Machinery

Pages

613–621

Language

en

Abstract

Intent classification is a key task in natural language processing (NLP) that aims to infer the goal or intention behind a user's query. Most existing intent classification methods rely on supervised deep models trained on large annotated datasets of text-intent pairs. However, obtaining such datasets is often expensive and impractical in real-world settings. Furthermore, supervised models may overfit or face distributional shifts when new intents, utterances, or data distributions emerge over time, requiring frequent retraining. Online learning methods based on user feedback can overcome this limitation, as they do not need access to intents while collecting data and adapting the model continuously. In this paper, we propose a novel multi-armed contextual bandit framework that leverages a text encoder based on a large language model (LLM) to extract the latent features of a given utterance and jointly learn multimodal representations of encoded text features and intents. Our framework consists of two stages: offline pretraining and online fine-tuning. In the offline stage, we train the policy on a small labeled dataset using a contextual bandit approach. In the online stage, we fine-tune the policy parameters using the REINFORCE algorithm with a user feedback-based objective, without relying on the true intents. We further introduce a sliding window strategy for simulating the retrieval of data samples during online training. This novel two-phase approach enables our method to efficiently adapt to dynamic user preferences and data distributions with improved performance. An extensive set of empirical studies indicates that our method significantly outperforms policies that omit either offline pretraining or online fine-tuning, while achieving competitive performance to a supervised benchmark trained on an order of magnitude larger labeled dataset.
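The two-stage scheme in the abstract (offline bandit pretraining on a small labeled set, then online REINFORCE fine-tuning from user feedback alone) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: it replaces the LLM encoder with toy cluster features, uses a plain linear softmax policy over intent "arms", and the `BanditIntentPolicy` class, reward values, and data generator are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_INTENTS, DIM = 4, 16

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class BanditIntentPolicy:
    """Linear softmax policy over intent arms on top of fixed text features."""
    def __init__(self, dim, n_intents, lr=0.1):
        self.W = np.zeros((n_intents, dim))
        self.lr = lr

    def probs(self, x):
        return softmax(self.W @ x)

    def act(self, x):
        p = self.probs(x)
        return rng.choice(len(p), p=p)

    def reinforce_update(self, x, action, reward):
        # REINFORCE ascent step: grad log pi(a|x) = (one_hot(a) - p) * x
        p = self.probs(x)
        g = -p[:, None] * x[None, :]
        g[action] += x
        self.W += self.lr * reward * g

# Toy stand-in for LLM-encoded utterances: one feature cluster per intent.
centers = rng.normal(size=(N_INTENTS, DIM))
def sample(n):
    y = rng.integers(0, N_INTENTS, size=n)
    X = centers[y] + 0.3 * rng.normal(size=(n, DIM))
    return X, y

policy = BanditIntentPolicy(DIM, N_INTENTS)

# Stage 1: offline pretraining on a small labeled set with bandit feedback
# (only the chosen arm's reward is observed, not the full label vector).
X_off, y_off = sample(200)
for x, y in zip(X_off, y_off):
    a = policy.act(x)
    policy.reinforce_update(x, a, reward=1.0 if a == y else 0.0)

# Stage 2: online fine-tuning from binary user feedback only; the true
# intent label is never read by the update, only the simulated thumbs-up/down.
X_on, y_on = sample(1000)
for x, y in zip(X_on, y_on):
    a = policy.act(x)
    feedback = 1.0 if a == y else -0.1
    policy.reinforce_update(x, a, feedback)

X_te, y_te = sample(500)
acc = np.mean([policy.probs(x).argmax() == y for x, y in zip(X_te, y_te)])
print(f"test accuracy: {acc:.2f}")
```

On this easy synthetic task the policy reaches well-above-chance accuracy; the sketch omits the paper's sliding-window retrieval strategy, which would control which past samples the online updates draw from.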
