Scholarly Publications - Computer Engineering

Permanent URI for this collection: https://hdl.handle.net/11693/115582

Now showing 1 - 20 of 1669

Open Access
Image segmentation algorithms for land categorization
(Taylor & Francis, 2024-01-01) Tilton, James C.; Aksoy, Selim; Tarabalka, Yuliya
The focus of this chapter is on image segmentation algorithms for land categorization. Our image analysis goal will generally be to appropriately partition an image obtained from a remote sensing instrument on board a high-flying aircraft or a satellite orbiting the Earth or another planet. An example of an Earth remote sensing application might be to produce a labeled map that divides the image into areas covered by distinct surface covers such as water, snow, types of natural vegetation, types of rock formations, types of agricultural crops, and types of other human-made development. Alternatively, one can segment the land based on climate (e.g., temperature, precipitation) and elevation zones. However, most image segmentation approaches do not directly provide such meaningful labels for image partitions. Instead, most approaches produce image partitions with generic labels such as region 1, region 2, and so on, which need to be converted into meaningful labels by a post-segmentation analysis.
Open Access
Quantum buffer design using Petri nets
(Taylor & Francis, 2024-12-10) Shah, Syed Asad; Oruc, A. Yavuz
This paper introduces a simplified quantum Petri net (QPN) model and uses this model to generalize classical SISO, SIMO, MISO, MIMO, and Priority buffers to their quantum counterparts. It provides a primitive storage element, namely a quantum S-R flip-flop, and describes two different designs of this flip-flop using quantum NOT, CNOT, CCNOT, and SWAP gates. Each of the quantum S-R flip-flops can be replicated to obtain a quantum register for any given number of qubits. The aforementioned quantum buffers are then obtained using the simplified QPN model and quantum registers. The quantum S-R flip-flop and quantum buffer designs have been tested using OpenQASM 2.0 and Qiskit programs on IBM quantum computers and simulators, and the results validate their expected operations.
Open Access
Point cloud registration with quantile assignment
(Springer, 2024-03-19) Oğuz, Ecenur; Doğan, Yalım; Güdükbay, Uğur; Karaşan, Oya; Pınar, Mustafa
Point cloud registration is a fundamental problem in computer vision. The problem encompasses critical tasks such as feature estimation, correspondence matching, and transformation estimation. The point cloud registration problem can be cast as a quantile matching problem. We refined the quantile assignment algorithm by integrating prevalent feature descriptors and transformation estimation methods to enhance the correspondence between the source and target point clouds. We evaluated the performances of these descriptors and methods with our approach through controlled experiments on a dataset we constructed using well-known 3D models. This systematic investigation led us to identify the most suitable methods for complementing our approach. Subsequently, we devised a new end-to-end, coarse-to-fine pairwise point cloud registration framework. Finally, we tested our framework on indoor and outdoor benchmark datasets and compared our results with state-of-the-art point cloud registration methods.
Open Access
Universal lower bounds and optimal rates: achieving minimax clustering error in sub-exponential mixture models
(ML Research Press, 2024-07-03) Dreveton, Maximilien; Gözeten, Alperen; Grossglauser, Matthias; Thiran, Patrick; Agrawan S., Roth A.
Clustering is a pivotal challenge in unsupervised machine learning and is often investigated through the lens of mixture models. The optimal error rate for recovering cluster labels in Gaussian and sub-Gaussian mixture models involves ad hoc signal-to-noise ratios. Simple iterative algorithms, such as Lloyd's algorithm, attain this optimal error rate. In this paper, we first establish a universal lower bound for the error rate in clustering any mixture model, expressed through Chernoff information, a more versatile measure of model information than signal-to-noise ratios. We then demonstrate that iterative algorithms attain this lower bound in mixture models with sub-exponential tails, notably emphasizing location-scale mixtures featuring Laplace-distributed errors. Additionally, for datasets better modelled by Poisson or Negative Binomial mixtures, we study mixture models whose distributions belong to an exponential family. In such mixtures, we establish that Bregman hard clustering, a variant of Lloyd's algorithm employing a Bregman divergence, is rate optimal.
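Illustrative note: the Bregman hard clustering mentioned in the abstract above can be sketched as Lloyd-style iterations in which the squared Euclidean distance is replaced by a Bregman divergence. The sketch below is a minimal example for one-dimensional count data, assuming the Poisson case (generalized I-divergence). It is not the authors' code; the function names, the explicit initial centroids, and the fixed iteration count are assumptions made for illustration. It relies on the known property that the arithmetic mean is the optimal cluster representative for any Bregman divergence.

```python
import math


def poisson_divergence(x, mu):
    """Generalized I-divergence d(x, mu) = x*log(x/mu) - x + mu (Poisson case)."""
    mu = max(mu, 1e-12)  # guard against a zero centroid (illustrative choice)
    if x == 0:
        return mu  # limit of x*log(x/mu) as x -> 0 is 0
    return x * math.log(x / mu) - x + mu


def bregman_hard_clustering(counts, initial_centroids, iterations=20):
    """Lloyd-style alternation: assign by divergence, then update by the mean."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to the centroid with the smallest divergence.
        labels = [
            min(range(len(centroids)),
                key=lambda j: poisson_divergence(x, centroids[j]))
            for x in counts
        ]
        # Update step: the arithmetic mean minimizes the total Bregman divergence.
        for j in range(len(centroids)):
            members = [x for x, lab in zip(counts, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = sum(members) / len(members)
    return labels, centroids


if __name__ == "__main__":
    data = [1, 2, 0, 3, 20, 22, 19, 25]
    print(bregman_hard_clustering(data, [1.0, 10.0]))
```

Swapping `poisson_divergence` for the squared difference would recover ordinary one-dimensional Lloyd iterations, which is the sense in which the method is a variant of Lloyd's algorithm.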
Open Access
Warping the residuals for image editing with StyleGAN
(Springer New York LLC, 2024-11-18) Yıldırım, Ahmet Burak; Pehlivan, Hamza; Dündar, Ayşegül
StyleGAN models show editing capabilities via their semantically interpretable latent organizations, which require successful GAN inversion methods to edit real images. Many works have been proposed for inverting images into StyleGAN's latent space. However, their results suffer from either low fidelity to the input image or poor editing quality, especially for edits that require large transformations. This is because low-bit-rate latent spaces lose many image details to the information bottleneck, even though they provide an editable space. On the other hand, higher-bit-rate latent spaces can pass all the image details to StyleGAN for perfect reconstruction of images but suffer from low editing quality. In this work, we present a novel image inversion architecture that extracts high-rate latent features and includes a flow estimation module to warp these features to adapt them to edits. The warping is needed because edits often involve spatial changes in the image, such as adjustments to pose or smile. Thus, high-rate latent features must be accurately repositioned to match their new locations in the edited image space. We achieve this by employing flow estimation to determine the necessary spatial adjustments, followed by warping the features to align them correctly in the edited image. Specifically, we estimate the flows from StyleGAN features of edited and unedited latent codes. By estimating the high-rate features and warping them for edits, we achieve both high fidelity to the input image and high-quality edits. We run extensive experiments and compare our method with state-of-the-art inversion methods. Qualitative metrics and visual comparisons show significant improvements.
Embargo
Balancing efficiency vs. effectiveness and providing missing label robustness in multi-label stream classification
(Elsevier BV, 2024-04-08) Bakhshi, Sepehr; Can, Fazlı
Available works addressing multi-label classification in a data stream environment focus on proposing accurate prediction models; however, they struggle to balance effectiveness and efficiency. In this work, we present a neural network-based approach that tackles this issue and is suitable for high-dimensional multi-label classification. The proposed model uses a selective concept drift adaptation mechanism that makes it well-suited for a non-stationary environment. We adapt the model to an environment with missing labels using a simple imputation strategy and demonstrate that it outperforms a vast majority of the supervised models. To achieve these, a weighted binary relevance-based approach named ML-BELS is introduced. To capture label dependencies, instead of a chain of stacked classifiers, the proposed model employs independent weighted ensembles as binary classifiers, with the weights generated by the predictions of a BELS classifier. We present an extensive assessment of the proposed model using 11 prominent baselines, five synthetic, and 13 real-world datasets, all with different characteristics. The results demonstrate that the proposed approach ML-BELS is successful in balancing effectiveness and efficiency, and is robust to missing labels and concept drift.
Open Access
Visualization of large non-trivially partitioned unstructured data with native distribution on high-performance computing systems
(IEEE, 2024-01-15) Sahistan, Alper; Demirci, Serkan; Wald, Ingo; Zellmann, Stefan; Barbosa, João; Morrical, Nate; Güdükbay, Uğur
Interactively visualizing large finite element simulation data on High-Performance Computing (HPC) systems poses several difficulties. Some of these relate to unstructured data, which, even on a single node, is much more expensive to render compared to structured volume data. Worse yet, in the data parallel rendering context, such data with highly non-convex spatial domain boundaries will cause rays along its silhouette to enter and leave a given rank's domains at different distances. This straddling, in turn, poses challenges for both ray marching, which usually assumes successive elements to share a face, and compositing, which usually assumes a single fragment per pixel per rank. We holistically address these issues using a combination of three inter-operating techniques: first, we use a highly optimized GPU ray marching technique that, given an entry point, can march a ray to its exit point with high performance by exploiting an exclusive-or (XOR) based compaction scheme. Second, we use hardware-accelerated ray tracing to efficiently find the proper entry points for these marching operations. Third, we use a “deep” compositing scheme to properly handle cases where different ranks' ray segments interleave in depth. We use GPU-to-GPU remote direct memory access (RDMA) to achieve interactive frame rates of 10-15 frames per second and higher for our motivating use case, the Fun3D NASA Mars Lander.
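Illustrative note: the idea behind compositing several fragments per pixel per rank can be sketched for a single pixel as follows. This is a generic front-to-back "over" compositing example, not the paper's GPU implementation; the fragment layout `(depth, (r, g, b), alpha)` and the function name are assumptions made for illustration.

```python
def composite_deep_fragments(per_rank_fragments):
    """Composite one pixel from several ranks, each contributing many fragments.

    Each fragment is (depth, (r, g, b), alpha) with non-premultiplied color.
    Fragments from all ranks are merged and sorted by depth, so segments that
    interleave in depth across ranks are blended in the correct order.
    """
    fragments = sorted(
        (frag for rank in per_rank_fragments for frag in rank),
        key=lambda frag: frag[0],
    )
    color = [0.0, 0.0, 0.0]
    alpha = 0.0
    for _, rgb, frag_alpha in fragments:
        # Front-to-back "over": nearer fragments occlude what lies behind them.
        weight = (1.0 - alpha) * frag_alpha
        for channel in range(3):
            color[channel] += weight * rgb[channel]
        alpha += weight
    return tuple(color), alpha


if __name__ == "__main__":
    rank0 = [(1.0, (1.0, 0.0, 0.0), 0.5), (3.0, (0.0, 0.0, 1.0), 1.0)]
    rank1 = [(2.0, (0.0, 1.0, 0.0), 0.5)]  # lies between rank0's two fragments
    print(composite_deep_fragments([rank0, rank1]))
```

With only one fragment per rank, rank 0's two segments would have to be pre-blended before the green fragment between them is known, which gives the wrong result whenever segments interleave.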
Embargo
Do code reviews lead to fewer code smells?
(Elsevier Inc., 2024-09) Tuna, Erdem; Seaman, Carolyn; Tüzün, Eray
**Context:** The code review process is conducted by software teams with various motivations. Among other goals, code reviews act as a gatekeeper for software quality. **Objective:** In this study, we explore whether code reviews have an impact on one specific aspect of software quality, software maintainability. We further extend our investigation by analyzing whether code review process quality (as evidenced by the presence of code review process smells) influences software maintainability (as evidenced by the presence of code smells). **Method:** We investigate whether smells in the code review process are related to smells in the code that was reviewed by using correlation analysis. We augment our quantitative analysis with a focus group study to learn practitioners’ opinions. **Results:** Our investigations revealed that the level of code smells neither increases nor decreases in 8 out of 10 code reviews, regardless of the quality of the code review. Contrary to our own intuition and that of the practitioners in our focus groups, we found that code review process smells have little to no correlation with the level of code smells. We identified multiple potential reasons behind the counter-intuitive results based on our focus group data. Furthermore, practitioners still believe that code reviews are helpful in improving software maintainability. **Conclusion:** Our results imply that the community should update our goals for code review practices and reevaluate those practices to align them with more relevant and modern realities.
Open Access
Progressive learning of 3D reconstruction network from 2D GAN data
(IEEE, 2024-02) Dündar, Ayşegül; Gao, Jun; Tao, Andrew; Catanzaro, Bryan
This paper presents a method to reconstruct high-quality textured 3D models from single images. Current methods rely on datasets with expensive annotations: multi-view images and their camera parameters. Our method relies on GAN-generated multi-view image datasets, which have a negligible annotation cost. However, they are not strictly multi-view consistent, and GANs sometimes output distorted images. This results in degraded reconstruction quality. In this work, to overcome these limitations of generated datasets, we make two main contributions that lead us to achieve state-of-the-art results on challenging objects: 1) a robust multi-stage learning scheme that gradually relies more on the model's own predictions when calculating losses, and 2) a novel adversarial learning pipeline with online pseudo-ground-truth generation to achieve fine details. Our work provides a bridge from 2D supervision of GAN models to 3D reconstruction models and removes the expensive annotation effort. We show significant improvements over previous methods, whether they were trained on GAN-generated multi-view images or on real images with expensive annotations.
Open Access
Refining 3D human texture estimation from a single image
(IEEE, 2024-12) Altındiş, Said Fahri; Meric, Adil; Dalva, Yusuf; Güdükbay, Uğur; Dündar, Ayşegül
Estimating 3D human texture from a single image is essential in graphics and vision. It requires learning a mapping function from input images of humans with diverse poses into the parametric (uv) space and reasonably hallucinating invisible parts. To achieve a high-quality 3D human texture estimation, we propose a framework that adaptively samples the input by a deformable convolution where offsets are learned via a deep neural network. Additionally, we describe a novel cycle consistency loss that improves view generalization. We further propose to train our framework with an uncertainty-based pixel-level image reconstruction loss, which enhances color fidelity. We compare our method against the state-of-the-art approaches and show significant qualitative and quantitative improvements.
Open Access
Taxonomy of inline code comment smells
(Springer, 2024-04-03) Jabrayilzade, Elgun; Yurtoğlu, Ayda; Tüzün, Eray
Code comments play a vital role in source code comprehension and software maintainability. It is common for developers to write comments to explain a code snippet, and commenting code is generally considered a good practice in software engineering. However, low-quality comments can have a detrimental effect on software quality or be ineffective for code understanding. This study aims to create a taxonomy of inline code comment smells and determine how frequently each smell type occurs in software projects. We conducted a multivocal literature review to define the initial taxonomy of inline comment smells. Afterward, we manually labeled 2447 inline comments from eight open-source projects where half of them were Java, and another half were Python projects. We created a taxonomy of 11 inline code comment smell types and found out that the smells exist in both Java and Python projects with varying degrees. Moreover, we conducted an online survey with 41 software practitioners to learn their opinions on these smells and their impact on code comprehension and software maintainability. The survey respondents generally agreed with the taxonomy; however, they reported that some smell types might have a positive effect on code comprehension in certain scenarios. We also opened pull requests and issues fixing the comment smells in the sampled projects, where we got a 27% acceptance rate. We share our manually labeled dataset online and provide implications for software engineering practitioners, researchers, and educators.
Open Access
Application scheduling with multiplexed sensing of monitoring points in multi-purpose IoT wireless sensor networks
(IEEE, 2024-02) Çavdar, Mustafa Can; Körpeoğlu, İbrahim; Ulusoy, Özgür
Wireless sensor networks (WSNs) play a crucial role in Internet-of-Things (IoT) systems serving a variety of applications. They gather data from specific sensor nodes and transmit it to remote units for processing. When multiple applications share a WSN infrastructure, efficient scheduling becomes vital. In our research, we address the problem of application scheduling in WSNs. Specifically, we focus on scenarios where applications request data from monitoring points within the coverage area of a WSN. We propose a shared-data approach that reduces the network’s sensing and communication load by allowing multiple applications to use the same sensing data. To tackle the scheduling challenge, we introduce a genetic algorithm named GABAS and three greedy algorithms: LMPF, LMSF, and LTSF. These algorithms determine the order in which applications are admitted to the WSN infrastructure, considering various criteria. To assess the performance of our algorithms, we conducted extensive simulation experiments and compared them with standard scheduling methods. We also evaluated the performance of GABAS as compared to another genetic scheduling algorithm that has recently appeared in the literature. The overall experimental results show that the methods we propose outperform the compared approaches across various metrics, namely makespan, turnaround time, waiting time, and successful execution rate. In particular, our genetic algorithm proves to be highly effective in scheduling applications and optimizing the mentioned metrics.
Open Access
GateKeeper-GPU: fast and accurate pre-alignment filtering in short read mapping
(IEEE, 2024-05) Bingol, Zülal; Alser, Mohammed; Mutlu, Onur; Öztürk, Özcan; Alkan, Can
At the last step of short read mapping, the candidate locations of the reads on the reference genome are verified to compute their differences from the corresponding reference segments using sequence alignment algorithms. Calculating the similarities and differences between two sequences is still computationally expensive since approximate string matching techniques traditionally inherit dynamic programming algorithms with quadratic time and space complexity. We introduce GateKeeper-GPU, a fast and accurate pre-alignment filter that efficiently reduces the need for expensive sequence alignment. GateKeeper-GPU provides two main contributions: first, improving the filtering accuracy of GateKeeper (a lightweight pre-alignment filter), and second, exploiting the massive parallelism provided by the large number of GPU threads of modern GPUs to examine numerous sequence pairs rapidly and concurrently. By reducing the work, GateKeeper-GPU provides an acceleration of 2.9× to sequence alignment and up to 1.4× speedup to the end-to-end execution time of a comprehensive read mapper (mrFAST). GateKeeper-GPU is available at https://github.com/BilkentCompGen/GateKeeper-GPU
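Illustrative note: the general idea of a pre-alignment filter, cheaply rejecting read/reference pairs that cannot be within the edit-distance threshold before running dynamic programming, can be sketched as below. This is a deliberately simplified shifted-comparison filter written for illustration. It is not the GateKeeper-GPU algorithm or its bit-vector implementation, and the function name and parameters are assumptions.

```python
def passes_prealignment_filter(read, reference, max_edits):
    """Return False only when the pair certainly exceeds max_edits edits.

    A read base counts as "explained" if it equals the reference base at the
    same position or at any position shifted by up to max_edits in either
    direction. In any alignment with at most max_edits edits, every matched
    read base lies within such a shift, so at most max_edits read bases can
    remain unexplained. The filter may accept some pairs that alignment later
    rejects, but it never rejects a pair that is within the threshold.
    """
    unexplained = 0
    for i, base in enumerate(read):
        explained = False
        for shift in range(-max_edits, max_edits + 1):
            j = i + shift
            if 0 <= j < len(reference) and reference[j] == base:
                explained = True
                break
        if not explained:
            unexplained += 1
            if unexplained > max_edits:
                return False  # skip the expensive alignment for this pair
    return True


if __name__ == "__main__":
    print(passes_prealignment_filter("ACGTACGT", "ACGTACGT", 0))
    print(passes_prealignment_filter("AAAAAAAA", "CCCCCCCC", 2))
```

Only pairs that pass such a filter would be handed to a quadratic-time aligner, which is where the savings described in the abstract come from.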
Open Access
Cognitive activity detection and tracing system
(KARE Publishing, Kare Yayıncılık, 2023-08-01) Yıldırım, Onur; Kandemir, Çağla; Kardaşlar, Emre; Sümer, Emre
Cognitive disorders such as dementia and Alzheimer’s disease are usually challenging to diagnose but can be noticed through signs of their symptoms. The most common symptoms are confusion, trouble finding the right word, memory loss, and difficulty concentrating. This study aims to design a cognitive activity detection and tracing system that contains games, analyzes users’ performance, and displays detailed statistics to the users. The proposed Cognitive Activity Detection and Tracing System (CADTS) is software that contains games from different categories; it aims to measure cognitive activity using formulations defined in the context of each game and gives users feedback based on the resulting performance analyses. The purpose of these analyses is to catch the signs of symptoms. An insight into a possible scoring system is provided, and several descriptive statistics from the conducted tests are shared as results.
Open Access
MiDe22: an annotated multi-event tweet dataset for misinformation detection
(European Language Resources Association (ELRA), 2024) Toraman, Çağrı; Özçelik, Oğuzhan; Şahinuç, Furkan; Can, Fazlı
The rapid dissemination of misinformation through online social networks poses a pressing issue with harmful consequences jeopardizing human health, public safety, democracy, and the economy; therefore, urgent action is required to address this problem. In this study, we construct a new human-annotated dataset, called MiDe22, having 5,284 English and 5,064 Turkish tweets with their misinformation labels for several recent events between 2020 and 2022, including the Russia-Ukraine war, COVID-19 pandemic, and Refugees. The dataset includes user engagements with the tweets in terms of likes, replies, retweets, and quotes. We also provide a detailed data analysis with descriptive statistics and the experimental results of a benchmark evaluation for misinformation detection. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.
Open Access
An empirical analysis of issue templates usage in large-scale projects on GitHub
(Association for Computing Machinery (ACM), 2024) Sülün, Emre; Saçakcı, Metehan; Tüzün, Eray
GitHub Issues is a widely used issue tracking tool in open-source software projects. Originally designed with broad flexibility, its lack of standardization led to incomplete issue reports, impeding software development and maintenance efficiency. To counteract this, GitHub introduced issue templates in 2016, which rapidly became popular. Our study assesses the current use and evolution of these templates in large-scale open-source projects and their impact on issue tracking metrics, including resolution time, number of reopens, and number of issue comments. Employing a comprehensive analysis of 350 templates from 100 projects, we also evaluated over 1.9 million issues for template conformity and impact. Additionally, we solicited insights from open-source software maintainers through a survey. Our findings highlight issue templates’ extensive usage in 99 of the 100 surveyed projects, with a growing preference for YAML-based templates, a more structured template variant. Projects with a template exhibited markedly reduced resolution time (381.02 days to 103.18 days) and reduced issue comment count (4.95 to 4.32) compared to those without. The use of YAML-based templates further significantly decreased resolution time, the number of reopenings, and the discussion extent. Thus, our research underscores issue templates’ positive impact on large-scale open-source projects, offering recommendations for improved effectiveness.
Open Access
A novel neural ensemble architecture for on-the-fly classification of evolving text streams
(Association for Computing Machinery (ACM), 2024) Ghahramanian, Pouya; Bakhshi, Sepehr; Bonab, Hamed; Can, Fazlı
We study on-the-fly classification of evolving text streams in which the relation between the input data and target labels changes over time, i.e., "concept drift." These variations decrease the model's performance, as predictions become less accurate over time, and they necessitate a more adaptable system. While most studies focus on concept drift detection and handling with ensemble approaches, the application of neural models in this area is relatively less studied. We introduce Adaptive Neural Ensemble Network (AdaNEN), a novel ensemble-based neural approach capable of handling concept drift in data streams. With our novel architecture, we address some of the problems neural models face when exploited for online adaptive learning environments. Most current studies address concept drift detection and handling in numerical streams, and evolving text stream classification remains relatively unexplored. We hypothesize that the lack of public and large-scale experimental data could be one reason. To this end, we propose a method based on an existing approach for generating evolving text streams by introducing various types of concept drifts to real-world text datasets. We provide an extensive evaluation of our proposed approach using 12 state-of-the-art baselines and 13 datasets. We first evaluate the concept drift handling capability of AdaNEN and the baseline models on evolving numerical streams; this aims to demonstrate the concept drift handling capabilities of our method on a general spectrum and motivate its use in evolving text streams. The models are then evaluated in evolving text stream classification. Our experimental results show that AdaNEN consistently outperforms the existing approaches in terms of predictive performance with conservative efficiency.
Open Access
ICMI 2024 chairs welcome
(Association for Computing Machinery, 2024) Hung, Hayley; Oertel, Catharine; Soleymani, Mohammad; Chaspari, Theodora; Dibeklioğlu, Hamdi; Shukla, Jainendra; Truong, Khiet
Embargo
Large-margin multiple kernel ℓp-SVDD using Frank–Wolfe algorithm for novelty detection
(Elsevier BV, 2023-12-09) Rahimzadeh Arashloo, Shervin
Using a variable ℓp-norm penalty (p ≥ 1) on the slacks, the recently introduced ℓp-norm Support Vector Data Description (ℓp-SVDD) method has improved the performance in novelty detection over the baseline approach, sometimes remarkably. This work extends this modelling formalism in multiple aspects. First, a large-margin extension of the ℓp-SVDD method is formulated to enhance generalisation capability by maximising the margin between the positive and negative samples. Second, based on the Frank–Wolfe algorithm, an efficient yet effective method with predictable accuracy is presented to optimise the convex objective function in the proposed method. Finally, it is illustrated that the proposed approach can effectively benefit from a multiple kernel learning scheme to achieve state-of-the-art performance. The proposed method is theoretically analysed using Rademacher complexities to link its classification error probability to the margin and experimentally evaluated on several datasets to demonstrate its merits against existing methods.
Open Access
Robust one-class classification using deep kernel spectral regression
(Elsevier, 2024-03-07) Mohammad, Salman; Arashloo, Shervin Rahimzadeh
The existing one-class classification (OCC) methods typically presume the existence of a pure target training set and generally face difficulties when the training set is contaminated with non-target objects. This work addresses this aspect of the OCC problem and formulates an effective method that leverages the advantages of kernel-based methods to achieve robustness against training label noise while enabling direct deep learning of features from the data to optimise a Fisher-based loss function in the Hilbert space. As such, the proposed OCC approach can be trained in an end-to-end fashion while, by virtue of a Tikhonov regularisation in the Hilbert space, it provides high robustness against the training set contamination. Extensive experiments conducted on multiple datasets in different application scenarios demonstrate that the proposed methodology is robust and performs better than the state-of-the-art algorithms for OCC when the training set is corrupted by contamination.