User feedback-based online learning for intent classification

Date

2023-10-09

Source Title

ACM International Conference Proceeding Series

Publisher

Association for Computing Machinery

Pages

613–621

Language

en

Abstract

Intent classification is a key task in natural language processing (NLP) that aims to infer the goal or intention behind a user's query. Most existing intent classification methods rely on supervised deep models trained on large annotated datasets of text-intent pairs. However, obtaining such datasets is often expensive and impractical in real-world settings. Furthermore, supervised models may overfit or face distributional shifts when new intents, utterances, or data distributions emerge over time, requiring frequent retraining. Online learning methods based on user feedback can overcome this limitation, as they do not need access to intents while collecting data and adapting the model continuously. In this paper, we propose a novel multi-armed contextual bandit framework that leverages a text encoder based on a large language model (LLM) to extract the latent features of a given utterance and jointly learn multimodal representations of encoded text features and intents. Our framework consists of two stages: offline pretraining and online fine-tuning. In the offline stage, we train the policy on a small labeled dataset using a contextual bandit approach. In the online stage, we fine-tune the policy parameters using the REINFORCE algorithm with a user feedback-based objective, without relying on the true intents. We further introduce a sliding window strategy for simulating the retrieval of data samples during online training. This novel two-phase approach enables our method to efficiently adapt to dynamic user preferences and data distributions with improved performance. An extensive set of empirical studies indicates that our method significantly outperforms policies that omit either offline pretraining or online fine-tuning, while achieving competitive performance to a supervised benchmark trained on an order of magnitude larger labeled dataset.
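The two-stage scheme in the abstract (offline bandit pretraining on a small labeled set, then online REINFORCE fine-tuning from user feedback alone) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: it replaces the LLM encoder with toy cluster features, uses a plain linear softmax policy over intent "arms", and the `BanditIntentPolicy` class, reward values, and data generator are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_INTENTS, DIM = 4, 16

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class BanditIntentPolicy:
    """Linear softmax policy over intent arms on top of fixed text features."""
    def __init__(self, dim, n_intents, lr=0.1):
        self.W = np.zeros((n_intents, dim))
        self.lr = lr

    def probs(self, x):
        return softmax(self.W @ x)

    def act(self, x):
        p = self.probs(x)
        return rng.choice(len(p), p=p)

    def reinforce_update(self, x, action, reward):
        # REINFORCE ascent step: grad log pi(a|x) = (one_hot(a) - p) * x
        p = self.probs(x)
        g = -p[:, None] * x[None, :]
        g[action] += x
        self.W += self.lr * reward * g

# Toy stand-in for LLM-encoded utterances: one feature cluster per intent.
centers = rng.normal(size=(N_INTENTS, DIM))
def sample(n):
    y = rng.integers(0, N_INTENTS, size=n)
    X = centers[y] + 0.3 * rng.normal(size=(n, DIM))
    return X, y

policy = BanditIntentPolicy(DIM, N_INTENTS)

# Stage 1: offline pretraining on a small labeled set with bandit feedback
# (only the chosen arm's reward is observed, not the full label vector).
X_off, y_off = sample(200)
for x, y in zip(X_off, y_off):
    a = policy.act(x)
    policy.reinforce_update(x, a, reward=1.0 if a == y else 0.0)

# Stage 2: online fine-tuning from binary user feedback only; the true
# intent label is never read by the update, only the simulated thumbs-up/down.
X_on, y_on = sample(1000)
for x, y in zip(X_on, y_on):
    a = policy.act(x)
    feedback = 1.0 if a == y else -0.1
    policy.reinforce_update(x, a, feedback)

X_te, y_te = sample(500)
acc = np.mean([policy.probs(x).argmax() == y for x, y in zip(X_te, y_te)])
print(f"test accuracy: {acc:.2f}")
```

On this easy synthetic task the policy reaches well-above-chance accuracy; the sketch omits the paper's sliding-window retrieval strategy, which would control which past samples the online updates draw from.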
