Exploring the role of loss functions in multiclass classification
buir.contributor.author | Demirkaya, Ahmet | |
buir.contributor.author | Oymak, Samet | |
dc.contributor.author | Demirkaya, Ahmet | en_US |
dc.contributor.author | Chen, J. | en_US |
dc.contributor.author | Oymak, Samet | en_US |
dc.coverage.spatial | Princeton, NJ, USA | en_US |
dc.date.accessioned | 2021-02-03T13:33:22Z | |
dc.date.available | 2021-02-03T13:33:22Z | |
dc.date.issued | 2020-05 | |
dc.department | Department of Computer Engineering | en_US |
dc.description | Date of Conference: 18-20 March 2020 | en_US |
dc.description | Conference name: 54th Annual Conference on Information Sciences and Systems, CISS 2020 | en_US |
dc.description.abstract | Cross-entropy is the de-facto loss function in modern classification tasks that involve distinguishing hundreds or even thousands of classes. To design better loss functions for new machine learning tasks, it is critical to understand what makes a loss function suitable for a problem. For instance, what makes cross-entropy better than other alternatives such as quadratic loss? In this work, we discuss the role of loss functions in learning tasks with a large number of classes. We hypothesize that different loss functions can have large variability in the difficulty of optimization and that simplicity of training is a key catalyst for better test-time performance. Our intuition draws from the success of over-parameterization in deep learning: as a model has more parameters, it trains faster and achieves higher test accuracy. We argue that, effectively, cross-entropy loss results in a much more over-parameterized problem compared to the quadratic loss, thanks to its emphasis on the correct class (associated with the label). Such over-parameterization drastically simplifies the training process and ends up boosting the test performance. For separable mixture models, we provide a separation result where cross-entropy loss can always achieve small training loss, whereas quadratic loss has diminishing benefit as the number of classes and class correlations increase. Numerical experiments with CIFAR-100 corroborate our results. We show that the accuracy with quadratic loss disproportionately degrades with a growing number of classes; however, encouraging quadratic loss to focus on the correct class results in drastically improved performance. | en_US |
dc.description.provenance | Submitted by Evrim Ergin (eergin@bilkent.edu.tr) on 2021-02-03T13:33:22Z No. of bitstreams: 1 Exploring_the_role_of_loss_functions_in_multiclass_classification.pdf: 1504552 bytes, checksum: fe0255f2046fb3c00acb5133a314da8d (MD5) | en |
dc.description.provenance | Made available in DSpace on 2021-02-03T13:33:22Z (GMT). No. of bitstreams: 1 Exploring_the_role_of_loss_functions_in_multiclass_classification.pdf: 1504552 bytes, checksum: fe0255f2046fb3c00acb5133a314da8d (MD5) Previous issue date: 2020-05 | en |
dc.identifier.doi | 10.1109/CISS48834.2020.1570627167 | en_US |
dc.identifier.isbn | 9781728140841 | en_US |
dc.identifier.uri | http://hdl.handle.net/11693/54981 | en_US |
dc.language.iso | English | en_US |
dc.publisher | IEEE | en_US |
dc.relation.isversionof | https://dx.doi.org/10.1109/CISS48834.2020.1570627167 | en_US |
dc.source.title | 54th Annual Conference on Information Sciences and Systems, CISS 2020 | en_US |
dc.subject | Cross entropy | en_US |
dc.subject | Multiclass classification | en_US |
dc.subject | Quadratic loss | en_US |
dc.subject | Over-parameterization | en_US |
dc.subject | Deep neural networks | en_US |
dc.title | Exploring the role of loss functions in multiclass classification | en_US |
dc.type | Conference Paper | en_US |
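As a rough illustration of the losses contrasted in the abstract, the sketch below compares standard cross-entropy with a plain quadratic loss over one-hot targets and with a correct-class-weighted quadratic variant. The weighting scheme and the correct_weight parameter are assumptions chosen for illustration only; they are not necessarily the exact reweighting studied in the paper.

    import numpy as np

    def softmax(z):
        # Numerically stable softmax over the last axis.
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def cross_entropy(logits, labels):
        # Multiclass cross-entropy: average of -log p_y over the batch,
        # where y is the correct class for each sample.
        p = softmax(logits)
        return -np.log(p[np.arange(len(labels)), labels]).mean()

    def quadratic(logits, labels, num_classes):
        # Plain quadratic loss against one-hot targets; every class term
        # contributes equally, so wrong-class terms dominate as the
        # number of classes grows.
        onehot = np.eye(num_classes)[labels]
        return ((logits - onehot) ** 2).sum(axis=-1).mean()

    def weighted_quadratic(logits, labels, num_classes, correct_weight=10.0):
        # Hypothetical variant that up-weights the correct-class term,
        # illustrating "encouraging quadratic loss to focus on the
        # correct class"; the weight value is an assumption.
        onehot = np.eye(num_classes)[labels]
        weights = 1.0 + (correct_weight - 1.0) * onehot
        return (weights * (logits - onehot) ** 2).sum(axis=-1).mean()

    # Toy usage: a batch of 4 samples with 100 classes, matching the
    # CIFAR-100 setting mentioned in the abstract.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(4, 100))
    labels = rng.integers(0, 100, size=4)
    print(cross_entropy(logits, labels))
    print(quadratic(logits, labels, 100))
    print(weighted_quadratic(logits, labels, 100))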
Files

Original bundle
- Name: Exploring_the_role_of_loss_functions_in_multiclass_classification.pdf
- Size: 1.43 MB
- Format: Adobe Portable Document Format

License bundle
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed upon to submission