ResViT: residual vision transformers for multimodal medical image synthesis

buir.contributor.author: Dalmaz, Onat
buir.contributor.author: Yurt, Mahmut
buir.contributor.author: Çukur, Tolga
buir.contributor.orcid: Dalmaz, Onat|0000-0001-7978-5311
buir.contributor.orcid: Çukur, Tolga|0000-0002-2296-851X
dc.citation.epage: 2614 (en_US)
dc.citation.issueNumber: 10 (en_US)
dc.citation.spage: 2598 (en_US)
dc.citation.volumeNumber: 41 (en_US)
dc.contributor.author: Dalmaz, Onat
dc.contributor.author: Yurt, Mahmut
dc.contributor.author: Çukur, Tolga
dc.date.accessioned: 2023-02-15T12:34:32Z
dc.date.available: 2023-02-15T12:34:32Z
dc.date.issued: 2022-04-18
dc.department: Department of Electrical and Electronics Engineering (en_US)
dc.department: National Magnetic Resonance Research Center (UMRAM) (en_US)
dc.description.abstract: Generative adversarial models with convolutional neural network (CNN) backbones have recently been established as state-of-the-art in numerous medical image synthesis tasks. However, CNNs are designed to perform local processing with compact filters, and this inductive bias compromises learning of contextual features. Here, we propose a novel generative adversarial approach for medical image synthesis, ResViT, that leverages the contextual sensitivity of vision transformers along with the precision of convolution operators and realism of adversarial learning. ResViT’s generator employs a central bottleneck comprising novel aggregated residual transformer (ART) blocks that synergistically combine residual convolutional and transformer modules. Residual connections in ART blocks promote diversity in captured representations, while a channel compression module distills task-relevant information. A weight sharing strategy is introduced among ART blocks to mitigate computational burden. A unified implementation is introduced to avoid the need to rebuild separate synthesis models for varying source-target modality configurations. Comprehensive demonstrations are performed for synthesizing missing sequences in multi-contrast MRI, and CT images from MRI. Our results indicate superiority of ResViT against competing CNN- and transformer-based methods in terms of qualitative observations and quantitative metrics. (en_US)
dc.identifier.doi: 10.1109/TMI.2022.3167808 (en_US)
dc.identifier.eissn: 1558-254X
dc.identifier.issn: 02780062
dc.identifier.uri: http://hdl.handle.net/11693/111357
dc.language.iso: English (en_US)
dc.publisher: Institute of Electrical and Electronics Engineers Inc. (en_US)
dc.relation.isversionof: https://www.doi.org/10.1109/TMI.2022.3167808 (en_US)
dc.source.title: IEEE Transactions on Medical Imaging (en_US)
dc.subject: Medical image synthesis (en_US)
dc.subject: Transformer (en_US)
dc.subject: Residual (en_US)
dc.subject: Vision (en_US)
dc.subject: Adversarial (en_US)
dc.subject: Generative (en_US)
dc.subject: Unified (en_US)
dc.title: ResViT: residual vision transformers for multimodal medical image synthesis (en_US)
dc.type: Article (en_US)

Files

Original bundle
Name: ResViT_residual_vision_transformers_for_multimodal_medical_ımage_synthesis.pdf
Size: 14.24 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.69 KB
Format: Item-specific license agreed upon to submission