Browsing by Subject "Transformer"
Now showing 1 - 13 of 13
Item Open Access
A transformer-based real-time focus detection technique for wide-field interferometric microscopy (IEEE - Institute of Electrical and Electronics Engineers, 2023-08-28) Polat, Can; Güngör, A.; Yorulmaz, M.; Kızılelma, B.; Çukur, Tolga
Wide-field interferometric microscopy (WIM) has been utilized for visualization of individual biological nanoparticles with high sensitivity. However, image quality is strongly affected by the focusing of the image. Hence, focus detection has been an active research field within the scope of imaging and microscopy. To tackle this issue, we propose a novel convolution- and transformer-based deep learning technique to detect focus in WIM. The method is compared to other focus detection techniques and obtains higher precision with fewer parameters. Furthermore, the model achieves real-time focus detection thanks to its low inference time.

Item Open Access
Analysis of gender bias in legal texts using natural language processing methods (2023-07) Sevim, Nurullah
Word embeddings have become important building blocks that are used profoundly in natural language processing (NLP). Despite their several advantages, word embeddings can unintentionally accommodate gender- and ethnicity-based biases that are present within the corpora they are trained on. Therefore, ethical concerns have been raised, since word embeddings are extensively used in several high-level algorithms. Furthermore, transformer-based contextualized language models constitute the state of the art in several NLP tasks and applications. Despite their utility, contextualized models can contain human-like social biases, as their training corpora generally consist of human-generated text. Evaluating and removing social biases in NLP models have been an ongoing and prominent research endeavor. In parallel, NLP approaches in the legal area, namely legal NLP or computational law, have also been increasing recently.
Eliminating unwanted bias in the legal domain is doubly crucial, since the law has the utmost importance and effect on people. We approach the gender bias problem from the scope of the legal text processing domain. In the first stage of our study, we focus on gender bias in traditional word embeddings, such as Word2Vec and GloVe. Word embedding models trained on corpora composed of legal documents and legislation from different countries have been utilized to measure and eliminate gender bias in legal documents. Several methods have been employed to reveal the degree of gender bias and observe its variation across countries. Moreover, a debiasing method has been used to neutralize unwanted bias. The preservation of the semantic coherence of the debiased vector space has also been demonstrated using high-level tasks. In the second stage, we study the gender bias encoded in BERT-based models. We propose a new template-based bias measurement method with a bias evaluation corpus using crime words from the FBI database. This method quantifies the gender bias present in BERT-based models for legal applications. Furthermore, we propose a fine-tuning-based debiasing method using the European Court of Human Rights (ECtHR) corpus to debias legal pre-trained models. We test the debiased models on the LexGLUE benchmark to confirm that the underlying semantic vector space is not perturbed during the debiasing process. Finally, overall results and their implications are discussed in the scope of NLP in the legal domain.

Item Open Access
BolT: Fused window transformers for fMRI time series analysis (Elsevier B.V., 2023-05-18) Bedel, Hasan Atakan; Şıvgın, Irmak; Dalmaz, Onat; Ul Hassan Dar, Salman; Çukur, Tolga
Deep-learning models have enabled performance leaps in the analysis of high-dimensional functional MRI (fMRI) data. Yet, many previous methods are suboptimally sensitive to contextual representations across diverse time scales.
Here, we present BolT, a blood-oxygen-level-dependent transformer model for analyzing multi-variate fMRI time series. BolT leverages a cascade of transformer encoders equipped with a novel fused window attention mechanism. Encoding is performed on temporally overlapped windows within the time series to capture local representations. To integrate information temporally, cross-window attention is computed between base tokens in each window and fringe tokens from neighboring windows. To gradually transition from local to global representations, the extent of window overlap, and thereby the number of fringe tokens, is progressively increased across the cascade. Finally, a novel cross-window regularization is employed to align high-level classification features across the time series. Comprehensive experiments on large-scale public datasets demonstrate the superior performance of BolT against state-of-the-art methods. Furthermore, explanatory analyses to identify landmark time points and regions that contribute most significantly to model decisions corroborate prominent neuroscientific findings in the literature.

Item Open Access
COVID-19 detection from respiratory sounds with hierarchical spectrogram transformers (Institute of Electrical and Electronics Engineers, 2023-12-05) Aytekin, Ayçe İdil; Dalmaz, Onat; Gönç, Kaan; Ankishan, H.; Sarıtaş, Emine Ülkü; Bağcı, U.; Çelik, H.; Çukur, Tolga
Monitoring of prevalent airborne diseases such as COVID-19 characteristically involves respiratory assessments. While auscultation is a mainstream method for preliminary screening of disease symptoms, its utility is hampered by the need for dedicated hospital visits. Remote monitoring based on recordings of respiratory sounds on portable devices is a promising alternative, which can assist in early assessment of COVID-19, a disease that primarily affects the lower respiratory tract.
In this study, we introduce a novel deep learning approach to distinguish patients with COVID-19 from healthy controls given audio recordings of cough or breathing sounds. The proposed approach leverages a novel hierarchical spectrogram transformer (HST) on spectrogram representations of respiratory sounds. HST embodies self-attention mechanisms over local windows in spectrograms, and window size is progressively grown over model stages to capture local to global context. HST is compared against state-of-the-art conventional and deep-learning baselines. Demonstrations on crowd-sourced multi-national datasets indicate that HST outperforms competing methods, achieving over 90% area under the receiver operating characteristic curve (AUC) in detecting COVID-19 cases.

Item Open Access
Deep MRI reconstruction with generative vision transformer (Springer, 2021) Korkmaz, Yılmaz; Yurt, Mahmut; Dar, Salman Ul Hassan; Özbey, Muzaffer; Çukur, Tolga
Supervised training of deep network models for MRI reconstruction requires access to large databases of fully-sampled MRI acquisitions. To alleviate dependency on costly databases, unsupervised learning strategies have received interest. A powerful framework that eliminates the need for training data altogether is the deep image prior (DIP). To this end, DIP inverts randomly-initialized models to infer the network parameters most consistent with the undersampled test data. However, existing DIP methods leverage convolutional backbones, suffering from limited sensitivity to long-range spatial dependencies and thereby poor model invertibility. To address these limitations, here we propose an unsupervised MRI reconstruction method based on a novel generative vision transformer (GVTrans). GVTrans progressively maps low-dimensional noise and latent variables onto MR images via cascaded blocks of cross-attention vision transformers.
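The cross-attention operation that such transformer blocks rely on can be illustrated with a minimal numpy sketch. This is not the published GVTrans architecture; the random projections, dimensions, and function names below are purely illustrative of the general mechanism, in which latent tokens query image-feature tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latents, features, d_k=16, seed=0):
    """Attend from latent tokens (queries) to feature tokens (keys/values).

    latents:  (n_latent, d) query-side tokens
    features: (n_feat, d)   key/value-side tokens
    """
    rng = np.random.default_rng(seed)
    d = latents.shape[1]
    # Illustrative random projections; a trained model would learn these.
    W_q = rng.standard_normal((d, d_k)) / np.sqrt(d)
    W_k = rng.standard_normal((d, d_k)) / np.sqrt(d)
    W_v = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Q, K, V = latents @ W_q, features @ W_k, features @ W_v
    # Each latent token forms a weighted mixture of feature-token values.
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # (n_latent, n_feat)
    return weights @ V                         # (n_latent, d_k)

out = cross_attention(np.ones((4, 32)), np.ones((64, 32)))
print(out.shape)  # (4, 16)
```

Because the query and key/value sequences differ, the cost scales with the product of their lengths rather than the square of the image size, which is why such blocks are attractive for image-sized token sets.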
The cross-attention mechanism between latents and image features serves to enhance representational learning of local and global context. Meanwhile, latent and noise injections at each network layer permit fine control of generated image features, improving model invertibility. Demonstrations are performed for scan-specific reconstruction of brain MRI data at multiple contrasts and acceleration factors. GVTrans yields superior performance to state-of-the-art generative models based on convolutional neural networks (CNNs).

Item Open Access
Detecting COVID-19 from respiratory sound recordings with transformers (S P I E - International Society for Optical Engineering, 2022-04-04) Aytekin, İdil; Dalmaz, Onat; Ankishan, Haydar; Sarıtaş, Emine Ü.; Bağcı, Ulaş; Çukur, Tolga; Çelik, Haydar
Auscultation is an established technique in the clinical assessment of symptoms of respiratory disorders. Auscultation is safe and inexpensive, but requires expertise to diagnose a disease using a stethoscope during hospital or office visits. However, some clinical scenarios require continuous monitoring and automated analysis of respiratory sounds to pre-screen and monitor diseases, such as the rapidly spreading COVID-19. Recent studies suggest that audio recordings of bodily sounds captured by mobile devices might carry features helpful to distinguish patients with COVID-19 from healthy controls. Here, we propose a novel deep learning technique to automatically detect COVID-19 patients based on brief audio recordings of their cough and breathing sounds. The proposed technique first extracts spectrogram features of respiratory recordings, and then classifies disease state via a hierarchical vision transformer architecture. Demonstrations are provided on a crowdsourced database of respiratory sounds from COVID-19 patients and healthy controls.
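The first stage of the pipeline just described, converting an audio recording into spectrogram features, can be sketched as follows. The signal, sampling rate, and window parameters are illustrative stand-ins, not the values used in the paper.

```python
import numpy as np
from scipy.signal import spectrogram

# Synthetic 1-second "recording": a 440 Hz tone plus noise stands in for
# a cough/breathing clip (purely illustrative).
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
audio = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(fs)

# Short-time spectrogram: a frequency x time matrix that a vision
# transformer can treat like a single-channel image.
freqs, times, sxx = spectrogram(audio, fs=fs, nperseg=256, noverlap=128)
log_sxx = np.log1p(sxx)  # log compression, common for audio features
print(log_sxx.shape)     # (129, 61): 129 frequency bins, 61 time frames
```

The resulting matrix is then patchified and fed to the classifier, exactly as image patches would be in a standard vision transformer.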
The proposed transformer model is compared against alternative methods based on state-of-the-art convolutional and transformer architectures, as well as traditional machine-learning classifiers. Our results indicate that the proposed model achieves on-par or superior performance relative to competing methods. In particular, the proposed technique can distinguish COVID-19 patients from healthy subjects with over 94% AUC.

Item Open Access
Employing transformer encoders for enhanced functional connectivity mapping (IEEE - Institute of Electrical and Electronics Engineers, 2023-08-28) Bedel, Hasan Atakan; Çukur, Tolga
Functional magnetic resonance imaging (fMRI) provides a way to spatially and temporally map brain activity, making it a crucial tool in many advanced psychology and neuroscience studies. A variety of techniques have been suggested to analyze the four-dimensional data produced by fMRI scans. For classification tasks, the most prevalent method involves examining functional connectivity: the brain volume is divided into separate regions, and the correlation between the time series of these regions is determined. While deep graph models and deep convolutional models are frequently employed to process functional connectivity, these methods can sometimes overcomplicate the procedure. In contrast, we present a straightforward approach that utilizes transformer encoders to map functional connectivity to labels. Our method demonstrates superior performance in gender classification tasks when compared to existing deep graph and convolution models. We validate this on two publicly accessible datasets.

Item Open Access
Focal modulation network for lung segmentation in chest X-ray images (2023-08-09) Öztürk, Şaban; Çukur, Tolga
Segmentation of lung regions is of key importance for the automatic analysis of chest X-ray (CXR) images, which have a vital role in the detection of various pulmonary diseases.
Precise identification of lung regions is the basic prerequisite for disease diagnosis and treatment planning. However, achieving precise lung segmentation poses significant challenges due to factors such as variations in anatomical shape and size, the presence of strong edges at the rib cage and clavicle, and overlapping anatomical structures resulting from diverse diseases. Although commonly considered the de facto standard in medical image segmentation, the convolutional UNet architecture and its variants fall short in addressing these challenges, primarily due to their limited ability to model long-range dependencies between image features. While vision transformers equipped with self-attention mechanisms excel at capturing long-range relationships, either coarse-grained global self-attention or fine-grained local self-attention is typically adopted for segmentation tasks on high-resolution images to alleviate the quadratic computational cost, at the expense of performance loss. This paper introduces a focal modulation UNet model (FMN-UNet) to enhance segmentation performance by effectively aggregating fine-grained local and coarse-grained global relations at a reasonable computational cost. FMN-UNet first encodes CXR images via a convolutional encoder to suppress background regions and extract latent feature maps at a relatively modest resolution. It then leverages global and local attention mechanisms to model contextual relationships across the images. These contextual feature maps are convolutionally decoded to produce segmentation masks. The segmentation performance of FMN-UNet is compared against state-of-the-art methods on three public CXR datasets (JSRT, Montgomery, and Shenzhen).
Experiments on each dataset demonstrate the superior performance of FMN-UNet against baselines.

Item Open Access
Fractional Fourier transform meets transformer encoder (Institute of Electrical and Electronics Engineers, 2022-10-28) Şahinuç, Furkan; Koç, Aykut
Utilizing signal processing tools in deep learning models has been drawing increasing attention. The Fourier transform (FT), one of the most popular signal processing tools, is employed in many deep learning models. Transformer-based sequential input processing models have also started to make use of the FT. The existing FNet model shows that replacing the computationally expensive attention layer with the FT accelerates model training without significantly sacrificing task performance. We further improve this idea by introducing the fractional Fourier transform (FrFT) into the transformer architecture. As a parameterized transform with a fraction order, the FrFT provides an opportunity to access any intermediate domain between time and frequency and find better-performing transformation domains. According to the needs of downstream tasks, a suitable fractional order can be used in our proposed model, FrFNet. Our experiments on downstream tasks show that FrFNet leads to performance improvements over the ordinary FNet.

Item Open Access
MRI reconstruction with conditional adversarial transformers (Springer Cham, 2022-09-22) Korkmaz, Yılmaz; Özbey, Muzaffer; Çukur, Tolga; Haq, Nandinee; Johnson, Patricia; Maier, Andreas; Qin, Chen; Würfl, Tobias; Yoo, Jaejun
Deep learning has been successfully adopted for accelerated MRI reconstruction given its exceptional performance in inverse problems. Deep reconstruction models are commonly based on convolutional neural network (CNN) architectures that use compact input-invariant filters to capture static local features in data.
While this inductive bias allows efficient model training on relatively small datasets, it also limits sensitivity to long-range context and compromises generalization performance. Transformers are a promising alternative that use broad-scale and input-adaptive filtering to improve contextual sensitivity and generalization. Yet, existing transformer architectures induce quadratic complexity and often neglect the physical signal model. Here, we introduce a model-based transformer architecture (MoTran) for high-performance MRI reconstruction. MoTran is an adversarial architecture that unrolls transformer and data-consistency blocks in its generator. Cross-attention transformers are leveraged to maintain linear complexity in terms of the feature map size. Comprehensive experiments on MRI reconstruction tasks show that the proposed model improves image quality over state-of-the-art CNN models.

Item Open Access
Novel deep learning algorithms for multi-modal medical image synthesis (2023-08) Dalmaz, Onat
Multi-modal medical imaging is a powerful tool for the diagnosis and treatment of various diseases, as it provides complementary information about tissue morphology and function. However, acquiring multiple images from different modalities or contrasts is often impractical or impossible due to factors such as scan time, cost, and patient comfort. Medical image translation has emerged as a promising solution to synthesize target-modality images given source-modality images. The ability to synthesize unavailable images enhances the ubiquity and utility of multi-modal protocols while decreasing examination costs and toxicity exposure, such as ionizing radiation and contrast agents. Existing medical image translation methods prominently rely on generative adversarial networks (GANs) with convolutional neural network (CNN) backbones. CNNs are designed to perform local processing with compact filters, and this inductive bias is prone to limited contextual sensitivity.
Meanwhile, GANs suffer from limited sample fidelity and diversity due to one-shot sampling and implicit characterization of the image distribution. To overcome the challenges with CNN-based GAN models, this thesis first introduces ResViT, which leverages novel aggregated residual transformer (ART) blocks that synergistically fuse representations from convolutional and transformer modules. It then introduces SynDiff, a conditional diffusion model that progressively maps noise and source images onto the target image via large diffusion steps and adversarial projections, capturing a direct correlate of the image distribution and improving sample quality and speed. ResViT provides a unified implementation that avoids the need to rebuild separate synthesis models for varying source-target modality configurations, whereas SynDiff enables unsupervised training on unpaired datasets via a cycle-consistent architecture. ResViT and SynDiff were demonstrated on synthesizing missing sequences in multi-contrast MRI and on synthesizing CT images from MRI, and their state-of-the-art performance in medical image translation was shown.

Item Open Access
ResViT: residual vision transformers for multimodal medical image synthesis (Institute of Electrical and Electronics Engineers Inc., 2022-04-18) Dalmaz, Onat; Yurt, Mahmut; Çukur, Tolga
Generative adversarial models with convolutional neural network (CNN) backbones have recently been established as state-of-the-art in numerous medical image synthesis tasks. However, CNNs are designed to perform local processing with compact filters, and this inductive bias compromises learning of contextual features. Here, we propose a novel generative adversarial approach for medical image synthesis, ResViT, that leverages the contextual sensitivity of vision transformers along with the precision of convolution operators and the realism of adversarial learning.
ResViT's generator employs a central bottleneck comprising novel aggregated residual transformer (ART) blocks that synergistically combine residual convolutional and transformer modules. Residual connections in ART blocks promote diversity in captured representations, while a channel compression module distills task-relevant information. A weight-sharing strategy is introduced among ART blocks to mitigate computational burden. A unified implementation is introduced to avoid the need to rebuild separate synthesis models for varying source-target modality configurations. Comprehensive demonstrations are performed for synthesizing missing sequences in multi-contrast MRI, and CT images from MRI. Our results indicate the superiority of ResViT against competing CNN- and transformer-based methods in terms of qualitative observations and quantitative metrics.

Item Open Access
TranSMS: transformers for super-resolution calibration in magnetic particle imaging (Institute of Electrical and Electronics Engineers Inc., 2022-07-11) Gungor, Alper; Askin, Baris; Soydan, D.A.; Saritas, Emine Ulku; Top, C. B.; Çukur, Tolga
Magnetic particle imaging (MPI) offers exceptional contrast for magnetic nanoparticles (MNPs) at high spatio-temporal resolution. A common procedure in MPI starts with a calibration scan to measure the system matrix (SM), which is then used to set up an inverse problem to reconstruct images of the MNP distribution during subsequent scans. This calibration enables the reconstruction to sensitively account for various system imperfections. Yet, time-consuming SM measurements have to be repeated under notable changes in system properties. Here, we introduce a novel deep learning approach for accelerated MPI calibration based on transformers for SM super-resolution (TranSMS). Low-resolution SM measurements are performed using large MNP samples for improved signal-to-noise ratio efficiency, and the high-resolution SM is super-resolved via model-based deep learning.
TranSMS leverages a vision transformer module to capture contextual relationships in low-resolution input images, a dense convolutional module for localizing high-resolution image features, and a data-consistency module to ensure measurement fidelity. Demonstrations on simulated and experimental data indicate that TranSMS significantly improves SM recovery and MPI reconstruction for up to 64-fold acceleration in two-dimensional imaging.
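Several of the entries above (HST, BolT, FMN-UNet, TranSMS) compute attention within local windows rather than globally, precisely to avoid the quadratic cost of full self-attention on image-sized or series-sized token sets. The window-partitioning step common to such models can be sketched as follows; the shapes and function name are illustrative, not taken from any of the listed papers.

```python
import numpy as np

def window_partition(feat, win):
    """Split an (H, W, C) feature map into non-overlapping (win, win) windows.

    Returns an array of shape (num_windows, win*win, C): each window becomes
    a short token sequence, so attention is computed over win*win tokens per
    window instead of H*W tokens for the full map.
    """
    H, W, C = feat.shape
    assert H % win == 0 and W % win == 0, "window must tile the feature map"
    x = feat.reshape(H // win, win, W // win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)  # group each window's rows and cols together
    return x.reshape(-1, win * win, C)

tokens = window_partition(np.zeros((8, 8, 3)), win=4)
print(tokens.shape)  # (4, 16, 3): four 4x4 windows of 16 tokens each
```

Models like those above then vary this basic scheme, e.g. by overlapping windows (BolT's fringe tokens) or growing the window size across stages (HST) to move from local toward global context.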