Exploring the role of loss functions in multiclass classification

Demirkaya, Ahmet; Chen, J.; Oymak, Samet

Files

Exploring_the_role_of_loss_functions_in_multiclass_classification.pdf (1.43 MB)

Date

2020-05

Authors

Demirkaya, Ahmet

Chen, J.

Oymak, Samet

Abstract

Cross-entropy is the de facto loss function in modern classification tasks that involve distinguishing hundreds or even thousands of classes. To design better loss functions for new machine learning tasks, it is critical to understand what makes a loss function suitable for a problem. For instance, what makes cross-entropy better than alternatives such as quadratic loss? In this work, we discuss the role of loss functions in learning tasks with a large number of classes. We hypothesize that different loss functions can have large variability in the difficulty of optimization and that simplicity of training is a key catalyst for better test-time performance. Our intuition draws from the success of over-parameterization in deep learning: as a model has more parameters, it trains faster and achieves higher test accuracy. We argue that, effectively, cross-entropy loss results in a much more over-parameterized problem compared to the quadratic loss, thanks to its emphasis on the correct class (associated with the label). Such over-parameterization drastically simplifies the training process and ends up boosting the test performance. For separable mixture models, we provide a separation result where cross-entropy loss can always achieve small training loss, whereas quadratic loss has diminishing benefit as the number of classes and class correlations increase. Numerical experiments with CIFAR-100 corroborate our results. We show that the accuracy with quadratic loss disproportionately degrades with a growing number of classes; however, encouraging quadratic loss to focus on the correct class results in drastically improved performance.
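
As a rough illustration of the abstract's point about emphasis on the correct class, the sketch below compares softmax cross-entropy with a one-hot quadratic loss in plain Python. It is not taken from the paper: the function names, the constant model output of 0.1, and the `correct_weight` parameter (one possible way to make quadratic loss focus on the correct class) are assumptions made for this example only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Softmax cross-entropy: log-sum-exp of the logits minus the correct logit."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def quadratic(outputs, label, correct_weight=1.0):
    """Squared error against a one-hot target.

    correct_weight > 1 up-weights the correct-class term. This weighting
    is a hypothetical choice for illustration, not the paper's formulation.
    """
    loss = 0.0
    for k, out in enumerate(outputs):
        target = 1.0 if k == label else 0.0
        weight = correct_weight if k == label else 1.0
        loss += weight * (out - target) ** 2
    return loss

def correct_class_share(outputs, label, correct_weight=1.0):
    """Fraction of the quadratic loss contributed by the correct-class term."""
    total = quadratic(outputs, label, correct_weight)
    correct = correct_weight * (outputs[label] - 1.0) ** 2
    return correct / total if total > 0 else 0.0

# With an uninformative output (0.1 for every class), the correct-class term
# makes up a shrinking share of the plain quadratic loss as classes are added.
for num_classes in (10, 100, 1000):
    outputs = [0.1] * num_classes
    plain = correct_class_share(outputs, label=0)
    weighted = correct_class_share(outputs, label=0, correct_weight=num_classes)
    print(num_classes, round(plain, 3), round(weighted, 3))
```

In this toy setting the plain quadratic loss is increasingly dominated by the incorrect-class terms as the number of classes grows, while the up-weighted variant keeps most of the loss on the correct class. Cross-entropy touches the other logits only through the log-sum-exp term, which is one way to read its "emphasis on the correct class".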

Source Title

54th Annual Conference on Information Sciences and Systems, CISS 2020

Publisher

IEEE

Keywords

Cross entropy, Multiclass classification, Quadratic loss, Over-parameterization, Deep neural networks

Permalink

http://hdl.handle.net/11693/54981

Published Version (Please cite this version)

https://dx.doi.org/10.1109/CISS48834.2020.1570627167

Collections

Scholarly Publications - Computer Engineering

Language

English

Type

Conference Paper
