Exploring the role of loss functions in multiclass classification

buir.contributor.author: Demirkaya, Ahmet
buir.contributor.author: Oymak, Samet
dc.contributor.author: Demirkaya, Ahmet
dc.contributor.author: Chen, J.
dc.contributor.author: Oymak, Samet
dc.coverage.spatial: Princeton, NJ, USA
dc.date.accessioned: 2021-02-03T13:33:22Z
dc.date.available: 2021-02-03T13:33:22Z
dc.date.issued: 2020-05
dc.department: Department of Computer Engineering
dc.description: Date of Conference: 18-20 March 2020
dc.description: Conference name: 54th Annual Conference on Information Sciences and Systems, CISS 2020
dc.description.abstract: Cross-entropy is the de facto loss function in modern classification tasks that involve distinguishing hundreds or even thousands of classes. To design better loss functions for new machine learning tasks, it is critical to understand what makes a loss function suitable for a problem. For instance, what makes cross-entropy better than alternatives such as the quadratic loss? In this work, we discuss the role of loss functions in learning tasks with a large number of classes. We hypothesize that different loss functions can vary widely in optimization difficulty and that ease of training is a key catalyst for better test-time performance. Our intuition draws from the success of over-parameterization in deep learning: as a model gains more parameters, it trains faster and achieves higher test accuracy. We argue that, effectively, cross-entropy loss yields a much more over-parameterized problem than the quadratic loss, thanks to its emphasis on the correct class (the one associated with the label). Such over-parameterization drastically simplifies training and ends up boosting test performance. For separable mixture models, we provide a separation result: cross-entropy loss can always achieve small training loss, whereas the benefit of quadratic loss diminishes as the number of classes and class correlations grow. Numerical experiments with CIFAR-100 corroborate our results. We show that accuracy under quadratic loss disproportionately degrades as the number of classes grows; however, encouraging quadratic loss to focus on the correct class yields drastically improved performance.
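The abstract's central contrast is between standard cross-entropy and a quadratic (MSE) loss on one-hot targets, plus a variant of the quadratic loss that up-weights the error on the correct class. Below is a minimal PyTorch-style sketch of that comparison; the function names and the emphasis factor `alpha` are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def quadratic_loss(logits, labels):
    """Plain quadratic (MSE) loss against one-hot targets."""
    targets = F.one_hot(labels, num_classes=logits.shape[1]).float()
    return ((logits - targets) ** 2).mean()

def weighted_quadratic_loss(logits, labels, alpha=10.0):
    """Quadratic loss that up-weights the error on the correct class.

    `alpha` is a hypothetical emphasis factor; the paper's exact
    reweighting scheme may differ -- this only illustrates the idea of
    'encouraging quadratic loss to focus on the correct class'.
    """
    targets = F.one_hot(labels, num_classes=logits.shape[1]).float()
    sq_err = (logits - targets) ** 2
    # Weight matrix: alpha on the label coordinate, 1 elsewhere.
    weights = torch.ones_like(sq_err)
    weights.scatter_(1, labels.unsqueeze(1), alpha)
    return (weights * sq_err).mean()

# Example usage on random data (100 classes, as in CIFAR-100):
logits = torch.randn(32, 100)
labels = torch.randint(0, 100, (32,))
ce = F.cross_entropy(logits, labels)          # cross-entropy baseline
q = quadratic_loss(logits, labels)            # plain quadratic loss
wq = weighted_quadratic_loss(logits, labels)  # label-emphasized variant
print(ce.item(), q.item(), wq.item())
```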
dc.description.provenance: Submitted by Evrim Ergin (eergin@bilkent.edu.tr) on 2021-02-03T13:33:22Z. No. of bitstreams: 1. Exploring_the_role_of_loss_functions_in_multiclass_classification.pdf: 1504552 bytes, checksum: fe0255f2046fb3c00acb5133a314da8d (MD5)
dc.description.provenance: Made available in DSpace on 2021-02-03T13:33:22Z (GMT). Previous issue date: 2020-05
dc.identifier.doi: 10.1109/CISS48834.2020.1570627167
dc.identifier.isbn: 9781728140841
dc.identifier.uri: http://hdl.handle.net/11693/54981
dc.language.iso: English
dc.publisher: IEEE
dc.relation.isversionof: https://dx.doi.org/10.1109/CISS48834.2020.1570627167
dc.source.title: 54th Annual Conference on Information Sciences and Systems, CISS 2020
dc.subject: Cross entropy
dc.subject: Multiclass classification
dc.subject: Quadratic loss
dc.subject: Over-parameterization
dc.subject: Deep neural networks
dc.title: Exploring the role of loss functions in multiclass classification
dc.type: Conference Paper

Files

Original bundle (1 file)
Name: Exploring_the_role_of_loss_functions_in_multiclass_classification.pdf
Size: 1.43 MB
Format: Adobe Portable Document Format

License bundle (1 file)
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission