Space-filling curves for modeling spatial context in transformer-based whole slide image classification

buir.contributor.author: Erkan, Cihan
buir.contributor.author: Aksoy, Selim
dc.citation.epage: 124711L-8
dc.citation.spage: 124711L-1
dc.contributor.author: Erkan, Cihan
dc.contributor.author: Aksoy, Selim
dc.coverage.spatial: San Diego, California, United States
dc.date.accessioned: 2024-03-05T11:31:32Z
dc.date.available: 2024-03-05T11:31:32Z
dc.date.issued: 2023-04-06
dc.department: Department of Computer Engineering
dc.description: Conference Name: Medical Imaging 2023: Digital and Computational Pathology; 124711L (2023)
dc.description: Date of Conference: February 19–23, 2023
dc.description.abstract: The common method for histopathology image classification is to sample small patches from large whole slide images and make predictions based on aggregations of patch representations. Transformer models provide a promising alternative with their ability to capture long-range dependencies of patches and their potential to detect representative regions, thanks to their self-attention strategy. However, as a sequence-based architecture, transformers are unable to directly capture the two-dimensional nature of images. While it is possible to get around this problem by converting an image into a sequence of patches in raster scan order, the basic transformer architecture is still insensitive to the locations of the patches in the image. The aim of this work is to make the model aware of the spatial context of the patches, as neighboring patches are likely to be part of the same diagnostically relevant structure. We propose a transformer-based whole slide image classification framework that uses space-filling curves to generate patch sequences that are adaptive to the variations in the shapes of the tissue structures. The goal is to preserve the locality of the patches so that neighboring patches in the one-dimensional sequence are closer to each other in the two-dimensional slide. We use positional encodings to capture the spatial arrangements of the patches in these sequences. Experiments using a lung cancer dataset obtained from The Cancer Genome Atlas show that the proposed sequence generation approach that best preserves the locality of the patches achieves 87.6% accuracy, which is higher than baseline models that use raster scan ordering (86.7% accuracy), no ordering (86.3% accuracy), and a model that uses convolutions to relate the neighboring patches (81.7% accuracy).
dc.identifier.doi: 10.1117/12.2654191
dc.identifier.issn: 1605-7422
dc.identifier.uri: https://hdl.handle.net/11693/114349
dc.language.iso: en
dc.publisher: SPIE
dc.relation.isversionof: https://doi.org/10.1117/12.2654191
dc.source.title: Progress in Biomedical Optics and Imaging - Proceedings of SPIE
dc.subject: Digital pathology
dc.subject: Space-filling curves
dc.subject: Vision transformer
dc.subject: Whole slide image classification
dc.title: Space-filling curves for modeling spatial context in transformer-based whole slide image classification
dc.type: Conference Paper
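The patch-ordering idea described in the abstract can be illustrated with a Hilbert curve, a classic locality-preserving space-filling curve. The sketch below is a minimal standalone example, not the authors' implementation: the paper evaluates several sequence-generation schemes, while this assumes a square power-of-two patch grid and uses the standard iterative index conversion.

```python
def hilbert_index(n, x, y):
    """Map a 2-D grid cell (x, y) to its 1-D position along a Hilbert
    curve over an n x n grid, where n is a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d

def order_patches(coords, n):
    """Sort patch grid coordinates into Hilbert-curve order, so patches
    adjacent in the 1-D sequence tend to be adjacent in the 2-D slide."""
    return sorted(coords, key=lambda c: hilbert_index(n, c[0], c[1]))

# Order the 16 patch positions of a 4x4 grid: unlike raster-scan order,
# every pair of consecutive patches in the sequence are 2-D neighbors.
coords = [(x, y) for x in range(4) for y in range(4)]
sequence = order_patches(coords, 4)
```

Per the abstract, such an ordered sequence would then be fed to the transformer together with positional encodings that capture the patches' spatial arrangement.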

Files

Original bundle
Name: Space-filling_curves_for_modeling_spatial_context_in_transformer-based_whole_slide_image_classification_poster.pdf
Size: 13.98 MB
Format: Adobe Portable Document Format
Name: Space-filling_curves_for_modeling_spatial_context_in_transformer-based_whole_slide_image_classification.pdf
Size: 15.08 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.01 KB
Description: Item-specific license agreed upon at submission