Space-filling curves for modeling spatial context in transformer-based whole slide image classification

buir.contributor.author: Erkan, Cihan
buir.contributor.author: Aksoy, Selim
dc.citation.epage: 124711L-8
dc.citation.spage: 124711L-1
dc.contributor.author: Erkan, Cihan
dc.contributor.author: Aksoy, Selim
dc.coverage.spatial: San Diego, California, United States
dc.date.accessioned: 2024-03-05T11:31:32Z
dc.date.available: 2024-03-05T11:31:32Z
dc.date.issued: 2023-04-06
dc.department: Department of Computer Engineering
dc.description: Conference Name: Medical Imaging 2023: Digital and Computational Pathology; 124711L (2023)
dc.description: Date of Conference: February 19–23, 2023
dc.description.abstract: The common method for histopathology image classification is to sample small patches from large whole slide images and make predictions based on aggregations of patch representations. Transformer models provide a promising alternative with their ability to capture long-range dependencies of patches and their potential to detect representative regions, thanks to their self-attention strategy. However, as a sequence-based architecture, transformers are unable to directly capture the two-dimensional nature of images. While it is possible to get around this problem by converting an image into a sequence of patches in raster scan order, the basic transformer architecture is still insensitive to the locations of the patches in the image. The aim of this work is to make the model aware of the spatial context of the patches, as neighboring patches are likely to be part of the same diagnostically relevant structure. We propose a transformer-based whole slide image classification framework that uses space-filling curves to generate patch sequences that are adaptive to the variations in the shapes of the tissue structures. The goal is to preserve the locality of the patches so that neighboring patches in the one-dimensional sequence are closer to each other in the two-dimensional slide. We use positional encodings to capture the spatial arrangements of the patches in these sequences. Experiments using a lung cancer dataset obtained from The Cancer Genome Atlas show that the proposed sequence generation approach that best preserves the locality of the patches achieves 87.6% accuracy, which is higher than baseline models that use raster scan ordering (86.7% accuracy), no ordering (86.3% accuracy), and a model that uses convolutions to relate the neighboring patches (81.7% accuracy).
dc.identifier.doi: 10.1117/12.2654191
dc.identifier.issn: 1605-7422
dc.identifier.uri: https://hdl.handle.net/11693/114349
dc.language.iso: en
dc.publisher: SPIE
dc.relation.isversionof: https://doi.org/10.1117/12.2654191
dc.source.title: Progress in Biomedical Optics and Imaging - Proceedings of SPIE
dc.subject: Digital pathology
dc.subject: Space-filling curves
dc.subject: Vision transformer
dc.subject: Whole slide image classification
dc.title: Space-filling curves for modeling spatial context in transformer-based whole slide image classification
dc.type: Conference Paper
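The patch-ordering idea described in the abstract can be illustrated with a Hilbert curve, a classic locality-preserving space-filling curve. The sketch below is a minimal standalone example, not the authors' implementation: the paper evaluates several sequence-generation schemes, while this assumes a square power-of-two patch grid and uses the standard iterative index conversion.

```python
def hilbert_index(n, x, y):
    """Map a 2-D grid cell (x, y) to its 1-D position along a Hilbert
    curve over an n x n grid, where n is a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d

def order_patches(coords, n):
    """Sort patch grid coordinates into Hilbert-curve order, so patches
    adjacent in the 1-D sequence tend to be adjacent in the 2-D slide."""
    return sorted(coords, key=lambda c: hilbert_index(n, c[0], c[1]))

# Order the 16 patch positions of a 4x4 grid: unlike raster-scan order,
# every pair of consecutive patches in the sequence are 2-D neighbors.
coords = [(x, y) for x in range(4) for y in range(4)]
sequence = order_patches(coords, 4)
```

Per the abstract, such an ordered sequence would then be fed to the transformer together with positional encodings that capture the patches' spatial arrangement.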

Files

Original bundle
Name: Space-filling_curves_for_modeling_spatial_context_in_transformer-based_whole_slide_image_classification_poster.pdf
Size: 13.98 MB
Format: Adobe Portable Document Format
Name: Space-filling_curves_for_modeling_spatial_context_in_transformer-based_whole_slide_image_classification.pdf
Size: 15.08 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.01 KB
Description: Item-specific license agreed upon at submission