Exploring the role of loss functions in multiclass classification
buir.contributor.author | Demirkaya, Ahmet | |
buir.contributor.author | Oymak, Samet | |
dc.contributor.author | Demirkaya, Ahmet | en_US |
dc.contributor.author | Chen, J. | en_US |
dc.contributor.author | Oymak, Samet | en_US |
dc.coverage.spatial | Princeton, NJ, USA | en_US |
dc.date.accessioned | 2021-02-03T13:33:22Z | |
dc.date.available | 2021-02-03T13:33:22Z | |
dc.date.issued | 2020-05 | |
dc.department | Department of Computer Engineering | en_US |
dc.description | Date of Conference: 18-20 March 2020 | en_US |
dc.description | Conference name: 54th Annual Conference on Information Sciences and Systems, CISS 2020 | en_US |
dc.description.abstract | Cross-entropy is the de-facto loss function in modern classification tasks that involve distinguishing hundreds or even thousands of classes. To design better loss functions for new machine learning tasks, it is critical to understand what makes a loss function suitable for a problem. For instance, what makes cross-entropy better than other alternatives such as quadratic loss? In this work, we discuss the role of loss functions in learning tasks with a large number of classes. We hypothesize that different loss functions can have large variability in the difficulty of optimization and that simplicity of training is a key catalyst for better test-time performance. Our intuition draws from the success of over-parameterization in deep learning: as a model has more parameters, it trains faster and achieves higher test accuracy. We argue that, effectively, cross-entropy loss results in a much more over-parameterized problem compared to the quadratic loss, thanks to its emphasis on the correct class (associated with the label). Such over-parameterization drastically simplifies the training process and ends up boosting the test performance. For separable mixture models, we provide a separation result where cross-entropy loss can always achieve small training loss, whereas quadratic loss has diminishing benefit as the number of classes and class correlations increase. Numerical experiments with CIFAR-100 corroborate our results. We show that the accuracy with quadratic loss disproportionately degrades with a growing number of classes; however, encouraging quadratic loss to focus on the correct class results in drastically improved performance. | en_US |
dc.description.provenance | Submitted by Evrim Ergin (eergin@bilkent.edu.tr) on 2021-02-03T13:33:22Z No. of bitstreams: 1 Exploring_the_role_of_loss_functions_in_multiclass_classification.pdf: 1504552 bytes, checksum: fe0255f2046fb3c00acb5133a314da8d (MD5) | en |
dc.description.provenance | Made available in DSpace on 2021-02-03T13:33:22Z (GMT). No. of bitstreams: 1 Exploring_the_role_of_loss_functions_in_multiclass_classification.pdf: 1504552 bytes, checksum: fe0255f2046fb3c00acb5133a314da8d (MD5) Previous issue date: 2020-05 | en |
dc.identifier.doi | 10.1109/CISS48834.2020.1570627167 | en_US |
dc.identifier.isbn | 9781728140841 | en_US |
dc.identifier.uri | http://hdl.handle.net/11693/54981 | en_US |
dc.language.iso | English | en_US |
dc.publisher | IEEE | en_US |
dc.relation.isversionof | https://dx.doi.org/10.1109/CISS48834.2020.1570627167 | en_US |
dc.source.title | 54th Annual Conference on Information Sciences and Systems, CISS 2020 | en_US |
dc.subject | Cross entropy | en_US |
dc.subject | Multiclass classification | en_US |
dc.subject | Quadratic loss | en_US |
dc.subject | Over-parameterization | en_US |
dc.subject | Deep neural networks | en_US |
dc.title | Exploring the role of loss functions in multiclass classification | en_US |
dc.type | Conference Paper | en_US |
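As a rough illustration of the losses contrasted in the abstract, the sketch below compares standard cross-entropy with a plain quadratic loss over one-hot targets and with a correct-class-weighted quadratic variant. The weighting scheme and the correct_weight parameter are assumptions chosen for illustration only; they are not necessarily the exact reweighting studied in the paper.

    import numpy as np

    def softmax(z):
        # Numerically stable softmax over the last axis.
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def cross_entropy(logits, labels):
        # Multiclass cross-entropy: average of -log p_y over the batch,
        # where y is the correct class for each sample.
        p = softmax(logits)
        return -np.log(p[np.arange(len(labels)), labels]).mean()

    def quadratic(logits, labels, num_classes):
        # Plain quadratic loss against one-hot targets; every class term
        # contributes equally, so wrong-class terms dominate as the
        # number of classes grows.
        onehot = np.eye(num_classes)[labels]
        return ((logits - onehot) ** 2).sum(axis=-1).mean()

    def weighted_quadratic(logits, labels, num_classes, correct_weight=10.0):
        # Hypothetical variant that up-weights the correct-class term,
        # illustrating "encouraging quadratic loss to focus on the
        # correct class"; the weight value is an assumption.
        onehot = np.eye(num_classes)[labels]
        weights = 1.0 + (correct_weight - 1.0) * onehot
        return (weights * (logits - onehot) ** 2).sum(axis=-1).mean()

    # Toy usage: a batch of 4 samples with 100 classes, matching the
    # CIFAR-100 setting mentioned in the abstract.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(4, 100))
    labels = rng.integers(0, 100, size=4)
    print(cross_entropy(logits, labels))
    print(quadratic(logits, labels, 100))
    print(weighted_quadratic(logits, labels, 100))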
Files

Original bundle
- Name: Exploring_the_role_of_loss_functions_in_multiclass_classification.pdf
- Size: 1.43 MB
- Format: Adobe Portable Document Format

License bundle
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed upon to submission