Scholarly Publications - Computer Engineering
Permanent URI for this collection: https://hdl.handle.net/11693/115582
Recent Submissions
Item Open Access Predicting the risk of death of cryptocurrencies (IEEE - Institute of Electrical and Electronics Engineers, 2023-07-27) Sakinoğlu, Bedirhan; Güvenir, Altay
In recent years, the attention drawn by cryptocurrencies has increased as their popularity grows rapidly. This situation attracts investors, entrepreneurs, regulators, and the general public. However, these coins may die and become dead coins. A coin is declared dead if no activity is recorded for more than one year. Numerous coins die without completing their one-year timeframe and this issue causes investors to lose a significant amount of money. In this study, we develop a deep neural network architecture based on long short-term memory (LSTM) to predict the death risk of a coin in a specified timeframe. In order to do this, time-series data consisting of the closing price and volume values of 4733 dead coins are utilized. The goal of our model is to inform investors about the death risk of the coin and improve their overall portfolio performance.

Item Open Access Editorial: Machine learning, software process, and global software engineering (John Wiley and Sons Ltd, 2023-01-30) Steinmacher, I.; Clarke, P.; Tüzün, Eray; Britto, R.
On June 26–28, 2020, the International Conference on Software and Systems Processes (ICSSP 2020) and the International Conference on Global Software Engineering (ICGSE 2020) were held in virtual settings during the first year of the COVID pandemic. Several submissions to the joint event have been selected for inclusion in this special issue, focusing on impactful and timely contributions to machine learning (ML). At present, many in our field are enthusiastic about the potential of ML, yet some risks should not be casually overlooked or summarily dismissed. Each ML implementation is subtly different from any other implementation, and the risk profile varies greatly based on the approach adopted and the implementation context.
The ICSSP/ICGSE 2020 Program Committees have encouraged submissions that explore the risks and benefits associated with ML so that the important discussion regarding ML efficacy and advocacy can be further elaborated. Four contributions have been included in this special issue. © 2023 John Wiley & Sons, Ltd.

Item Open Access Editorial: Best papers of the 14th international conference on software and system processes (ICSSP 2020) and 15th international conference on global software engineering (ICGSE 2020) (John Wiley and Sons Ltd, 2023-01-30) Steinmacher, I.; Clarke, P.; Tüzün, Eray; Britto, R.
Today's software industry is global, virtual, and depending more than ever on strong and reliable processes. Stakeholders and infrastructure are distributed across the globe, posing challenges that go beyond those with co-located teams and servers. Software Engineering continues to be a complex undertaking, with projects challenged to meet expectations, especially regarding costs. We know that Software Engineering is an ever-changing discipline, with the result that firms and their employees must regularly embrace new methods, tools, technologies, and processes. In 2020, the International Conference on Global Software Engineering (ICGSE) and the International Conference on Systems and Software Processes (ICSSP) joined forces aiming to create a holistic understanding of the software landscape both from the perspective of human and infrastructure distribution and also the processes to support software development. Unfortunately, these challenges have become even more personal to many more in 2020 due to the disruption introduced by the COVID-19 pandemic, which forced both conferences to be held virtually. As an outcome of the joint event, we selected a set of the best papers from the two conferences, which were invited to submit extended versions to this Special Issue in the Journal of Software: Maintenance and Evolution.
Dedicated committees were established to identify the best papers. Eight papers were invited and ultimately, seven of these invited papers have made it into this Special Issue. © 2023 John Wiley & Sons, Ltd.

Item Open Access Automatic selection of compiler optimizations by machine learning (IEEE - Institute of Electrical and Electronics Engineers, 2023-08-28) Peker, Melih; Öztürk, Özcan; Yıldırım, S.; Uluyağmur Öztürk, M.
Many widely used telecommunications applications have extremely long run times. Therefore, faster and more efficient execution of these codes on the same hardware is important in critical telecommunication applications such as base stations. Compilers greatly affect the properties of the executable program to be created. It is possible to change properties such as compilation speed, execution time, power consumption and code size using compiler flags. This study aims to find the set of flags that will provide the shortest run time among hundreds of compiler flag combinations in GCC using code flow analysis, loop analysis and machine learning methods without running the program.

Item Open Access BLEND: A fast, memory-efficient and accurate mechanism to find fuzzy seed matches in genome analysis (Oxford University Press, 2023-01-10) Firtina, C.; Park, J.; Alser, M.; Kim, J. S.; Cali, D. S.; Shahroodi, T.; Ghiasi, N. M.; Singh, G.; Kanellopoulos, K.; Alkan, Can; Mutlu, O.
Generating the hash values of short subsequences, called seeds, enables quickly identifying similarities between genomic sequences by matching seeds with a single lookup of their hash values. However, these hash values can be used only for finding exact-matching seeds as the conventional hashing methods assign distinct hash values for different seeds, including highly similar seeds. Finding only exact-matching seeds causes either (i) increasing the use of the costly sequence alignment or (ii) limited sensitivity.
We introduce BLEND, the first efficient and accurate mechanism that can identify both exact-matching and highly similar seeds with a single lookup of their hash values, called fuzzy seed matches. BLEND (i) utilizes a technique called SimHash, that can generate the same hash value for similar sets, and (ii) provides the proper mechanisms for using seeds as sets with the SimHash technique to find fuzzy seed matches efficiently. We show the benefits of BLEND when used in read overlapping and read mapping. For read overlapping, BLEND is faster by 2.4×-83.9× (on average 19.3×), has a lower memory footprint by 0.9×-14.1× (on average 3.8×), and finds higher quality overlaps leading to accurate de novo assemblies than the state-of-the-art tool, minimap2. For read mapping, BLEND is faster by 0.8×-4.1× (on average 1.7×) than minimap2. Source code is available at https://github.com/CMU-SAFARI/BLEND. © 2023 The Author(s). Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.

Item Open Access BFSig: leveraging file significance in bus factor estimation (Association for Computing Machinery, Inc., 2023) Haratian, Vahid; Evtikhiev, M.; Derakhshanfar, P.; Tüzün, Eray; Kovalenko, V.
Software projects experience the departure of developers due to various reasons. As developers are one of the main sources of knowledge in software projects, their absence will inevitably result in a certain degree of knowledge depletion. Bus Factor (BF) is a metric to evaluate how this knowledge loss can affect the project’s continuity

Item Open Access AyatDroid: a lightweight code cloning technique using different static features (IEEE - Institute of Electrical and Electronics Engineers, 2023-08-23) Glani, Y.; Ping, L.; Lin, K.; Shah, Syed Asad
In recent decades, malicious code reuse has surged in numbers and sophistication, it is a common practice among adversaries to reuse malicious code, which significantly threatens user privacy and security.
Several signature-based code clone detection techniques have been proposed to detect malicious clones in Android applications that use the MD5 hash function to generate signatures. Meanwhile, these techniques only retrieve signatures from Java files. Due to the 128-bit signature size of the MD5 hash function, these techniques take longer to generate signatures. In this article, we propose the AyatDroid technique, which efficiently identifies malicious chunks by retrieving signatures from Java and manifest files. AyatDroid technique is tested on reliable CiCMalDroid 2020 dataset. We have evaluated the AyatDroid technique with other cutting-edge code clone detection techniques. Our experimental results demonstrated that AyatDroid outperformed regarding detection time and accuracy. AyatDroid is not only lightweight but also efficient, allowing it to be implemented on the large scale.

Item Open Access DynED: dynamic ensemble diversification in data stream classification (Association for Computing Machinery, 2023-10-23) Abadifard, Soheil; Gheibuni, Sanaz; Bakhshi, Sepehr; Can, Fazlı
Ensemble methods are commonly used in classification due to their remarkable performance. Achieving high accuracy in a data stream environment is a challenging task considering disruptive changes in the data distribution, also known as concept drift. A greater diversity of ensemble components is known to enhance prediction accuracy in such settings. Despite the diversity of components within an ensemble, not all contribute as expected to its overall performance. This necessitates a method for selecting components that exhibit high performance and diversity. We present a novel ensemble construction and maintenance approach based on MMR (Maximal Marginal Relevance) that dynamically combines the diversity and prediction accuracy of components during the process of structuring an ensemble.
The experimental results on both four real and 11 synthetic datasets demonstrate that the proposed approach (DynED) provides a higher average mean accuracy compared to the five state-of-the-art baselines.

Item Open Access Offloading deep learning powered vision tasks from UAV to 5G edge server with denoising (Institute of Electrical and Electronics Engineers, 2023-06-20) Özer, S.; İlhan, H. E.; Özkanoğlu, Mehmet Akif; Çırpan, H. A.
Offloading computationally heavy tasks from an unmanned aerial vehicle (UAV) to a remote server helps improve battery life and can help reduce resource requirements. Deep learning based state-of-the-art computer vision tasks, such as object segmentation and detection, are computationally heavy algorithms, requiring large memory and computing power. Many UAVs are using (pretrained) off-the-shelf versions of such algorithms. Offloading such power-hungry algorithms to a remote server could help UAVs save power significantly. However, deep learning based algorithms are susceptible to noise, and a wireless communication system, by its nature, introduces noise to the original signal. When the signal represents an image, noise affects the image. There has not been much work studying the effect of the noise introduced by the communication system on pretrained deep networks. In this work, we first analyze how reliable it is to offload deep learning based computer vision tasks (including both object segmentation and detection) by focusing on the effect of various parameters of a 5G wireless communication system on the transmitted image and demonstrate how the introduced noise of the used 5G system reduces the performance of the offloaded deep learning task. Then solutions are introduced to eliminate (or reduce) the negative effect of the noise. Proposed framework starts with introducing many classical techniques as alternative solutions, and then introduces a novel deep learning based solution to denoise the given noisy input image.
The performance of various denoising algorithms on offloading both object segmentation and object detection tasks are compared. Our proposed deep transformer-based denoiser algorithm (NR-Net) yields state-of-the-art results in our experiments.

Item Open Access Fine detailed texture learning for 3D meshes with generative models (Institute of Electrical and Electronics Engineers, 2023-11-03) Dündar, Ayşegül; Gao, J.; Tao, A.; Catanzaro, B.
This paper presents a method to achieve fine detailed texture learning for 3D models that are reconstructed from both multi-view and single-view images. The framework is posed as an adaptation problem and is done progressively where in the first stage, we focus on learning accurate geometry, whereas in the second stage, we focus on learning the texture with a generative adversarial network. The contributions of the paper are in the generative learning pipeline where we propose two improvements. First, since the learned textures should be spatially aligned, we propose an attention mechanism that relies on the learnable positions of pixels. Second, since discriminator receives aligned texture maps, we augment its input with a learnable embedding which improves the feedback to the generator. We achieve significant improvements on multi-view sequences from Tripod dataset as well as on single-view image datasets, Pascal 3D+ and CUB. We demonstrate that our method achieves superior 3D textured models compared to the previous works.

Item Open Access Image-to-image translation with disentangled latent vectors for face editing (Institute of Electrical and Electronics Engineers, 2023-08-24) Dalva, Y.; Pehlivan, H.; Hatipoglu, O. I.; Moran, C.; Dündar, Ayşegül
We propose an image-to-image translation framework for facial attribute editing with disentangled interpretable latent directions.
Facial attribute editing task faces the challenges of targeted attribute editing with controllable strength and disentanglement in the representations of attributes to preserve the other attributes during edits. For this goal, inspired by the latent space factorization works of fixed pretrained GANs, we design the attribute editing by latent space factorization, and for each attribute, we learn a linear direction that is orthogonal to the others. We train these directions with orthogonality constraints and disentanglement losses. To project images to semantically organized latent spaces, we set an encoder-decoder architecture with attention-based skip connections. We extensively compare with previous image translation algorithms and editing with pretrained GAN works. Our extensive experiments show that our method significantly improves over the state-of-the-arts.

Item Open Access Benchmarking the robustness of instance segmentation models (Institute of Electrical and Electronics Engineers, 2023-08-29) Dalva, Y.; Pehlivan, H.; Altındiş, Said Fahri; Dündar, Ayşegül
This article presents a comprehensive evaluation of instance segmentation models with respect to real-world image corruptions as well as out-of-domain image collections, e.g., images captured by a different set-up than the training dataset. The out-of-domain image evaluation shows the generalization capability of models, an essential aspect of real-world applications, and an extensively studied topic of domain adaptation. These presented robustness and generalization evaluations are important when designing instance segmentation models for real-world applications and picking an off-the-shelf pretrained model to directly use for the task at hand. Specifically, this benchmark study includes state-of-the-art network architectures, network backbones, normalization layers, models trained starting from scratch versus pretrained networks, and the effect of multitask training on robustness and generalization.
Through this study, we gain several insights. For example, we find that group normalization (GN) enhances the robustness of networks across corruptions where the image contents stay the same but corruptions are added on top. On the other hand, batch normalization (BN) improves the generalization of the models across different datasets where statistics of image features change. We also find that single-stage detectors do not generalize well to larger image resolutions than their training size. On the other hand, multistage detectors can easily be used on images of different sizes. We hope that our comprehensive study will motivate the development of more robust and reliable instance segmentation models.

Item Open Access Unknown face presentation attack detection via localized learning of multiple kernels (Institute of Electrical and Electronics Engineers, 2023-01-30) Arashloo, Shervin Rahimzadeh
The paper studies face spoofing, a.k.a. presentation attack detection (PAD) in the demanding scenarios of unknown attacks. While earlier studies have revealed the benefits of ensemble methods, and in particular, a multiple kernel learning (MKL) approach to the problem, one limitation of such techniques is that they treat the entire observation space similarly and ignore any variability and local structure inherent to the data. This work studies this aspect of face presentation attack detection with regards to one-class multiple kernel learning to benefit from the intrinsic local structure in bona fide samples to adaptively weight each representation in the composite kernel. More concretely, drawing on the one-class Fisher null formalism, we formulate a convex localised multiple kernel learning algorithm by regularising the collection of local kernel weights via a joint matrix-norm constraint and infer locally adaptive kernel weights for zero-shot one-class unseen attack detection.
We present a theoretical study of the proposed localised MKL algorithm using Rademacher complexities to characterise its generalisation capability and demonstrate its advantages over some other options. An assessment of the proposed approach on general object image datasets illustrates its efficacy for anomaly and novelty detection while the results of the experiments on face PAD datasets verify its potential in detecting unknown/unseen face presentation attacks.

Item Open Access One-class classification using ℓp-norm multiple kernel fisher null approach (Institute of Electrical and Electronics Engineers, 2023-03-14) Arashloo, Shervin Rahimzadeh
We address the one-class classification (OCC) problem and advocate a one-class MKL (multiple kernel learning) approach for this purpose. To this aim, based on the Fisher null-space OCC principle, we present a multiple kernel learning algorithm where an ℓp-norm regularisation (p ≥ 1) is considered for kernel weight learning. We cast the proposed one-class MKL problem as a min-max saddle point Lagrangian optimisation task and propose an efficient approach to optimise it. An extension of the proposed approach is also considered where several related one-class MKL tasks are learned concurrently by constraining them to share common weights for kernels. An extensive evaluation of the proposed MKL approach on a range of data sets from different application domains confirms its merits against the baseline and several other algorithms.

Item Open Access Multi-label sentiment analysis on 100 languages with dynamic weighting for label imbalance (Institute of Electrical and Electronics Engineers Inc., 2023-01-01) Yılmaz, Selim Fırat; Kaynak, Ergün Batuhan; Koç, Aykut; Dibeklioğlu, Hamdi; Kozat, Süleyman Serdar
We investigate cross-lingual sentiment analysis, which has attracted significant attention due to its applications in various areas including market research, politics, and social sciences.
In particular, we introduce a sentiment analysis framework in multi-label setting as it obeys Plutchik’s wheel of emotions. We introduce a novel dynamic weighting method that balances the contribution from each class during training, unlike previous static weighting methods that assign non-changing weights based on their class frequency. Moreover, we adapt the focal loss that favors harder instances from single-label object recognition literature to our multi-label setting. Furthermore, we derive a method to choose optimal class-specific thresholds that maximize the macro-f1 score in linear time complexity. Through an extensive set of experiments, we show that our method obtains the state-of-the-art performance in seven of nine metrics in three different languages using a single model compared with the common baselines and the best performing methods in the SemEval competition. We publicly share our code for our model, which can perform sentiment analysis in 100 languages, to facilitate further research.

Item Open Access Scaling stratified stochastic gradient descent for distributed matrix completion (Institute of Electrical and Electronics Engineers, 2023-10-01) Abubaker, Nabil; Karsavuran, M. O.; Aykanat, Cevdet
Stratified SGD (SSGD) is the primary approach for achieving serializable parallel SGD for matrix completion. State-of-the-art parallelizations of SSGD fail to scale due to large communication overhead. During an SGD epoch, these methods send data proportional to one of the dimensions of the rating matrix. We propose a framework for scalable SSGD through significantly reducing the communication overhead via exchanging point-to-point messages utilizing the sparsity of the rating matrix. We provide formulas to represent the essential communication for correctly performing parallel SSGD and we propose a dynamic programming algorithm for efficiently computing them to establish the point-to-point message schedules.
This scheme, however, significantly increases the number of messages sent by a processor per epoch from O(K) to O(K^2) for a K-processor system which might limit the scalability. To remedy this, we propose a Hold-and-Combine strategy to limit the upper-bound on the number of messages sent per processor to O(K lg K). We also propose a hypergraph partitioning model that correctly encapsulates reducing the communication volume. Experimental results show that the framework successfully achieves a scalable distributed SSGD through significantly reducing the communication overhead. Our code is publicly available at: github.com/nfabubaker/CESSGD

Item Open Access Automatic deceit detection through multimodal analysis of high-stake court-trials (Institute of Electrical and Electronics Engineers, 2023-10-05) Biçer, Berat; Dibeklioğlu, Hamdi
In this article we propose the use of convolutional self-attention for attention-based representation learning, while replacing traditional vectorization methods with a transformer as the backbone of our speech model for transfer learning within our automatic deceit detection framework. This design performs a multimodal data analysis and applies fusion to merge visual, vocal, and speech (textual) channels; reporting deceit predictions. Our experimental results show that the proposed architecture improves the state-of-the-art on the popular Real-Life Trial (RLT) dataset in terms of correct classification rate. To further assess the generalizability of our design, we experiment on the low-stakes Box of Lies (BoL) dataset and achieve state-of-the-art performance as well as providing cross-corpus comparisons.
Following our analysis, we report that (1) convolutional self-attention learns meaningful representations while performing joint attention computation for deception, (2) apparent deceptive intent is a continuous function of time and subjects can display varying levels of apparent deceptive intent throughout recordings, and (3), in support of criminal psychology findings, studying abnormal behavior out of context can be an unreliable way to predict deceptive intent.

Item Open Access Load balanced locality-aware parallel SGD on multicore architectures for latent factor based collaborative filtering (Elsevier BV * North-Holland, 2023-04-20) Gülcan, Selçuk; Özdal, Muhammet Mustafa; Aykanat, Cevdet
We investigate the parallelization of Stochastic Gradient Descent (SGD) for matrix completion on multicore architectures. We provide an experimental analysis of current SGD algorithms to find out their bottlenecks and limitations. Grid-based methods suffer from load imbalance among 2D blocks of the rating matrix, especially when datasets are skewed and sparse. Asynchronous methods, on the other hand, can face cache issues due to their memory access pattern. We propose bin-packing-based block balancing methods that are alternative to the recently proposed BaPa method. We then introduce Locality Aware SGD (LASGD), a grid-based asynchronous parallel SGD algorithm that efficiently utilizes cache by changing nonzero update sequence without affecting factor update order and carefully arranging latent factor matrices in the memory. Combined with our proposed load balancing methods, our experiments show that LASGD performs significantly better than alternative approaches in parallel shared-memory systems.

Item Open Access Minimizing staleness and communication overhead in distributed SGD for collaborative filtering (IEEE Computer Society, 2023-09-06) Abubaker, Nabil; Caglayan, O.; Karsavuran, M.
O.; Aykanat, Cevdet
Distributed asynchronous stochastic gradient descent (ASGD) algorithms that approximate low-rank matrix factorizations for collaborative filtering perform one or more synchronizations per epoch where staleness is reduced with more synchronizations. However, high number of synchronizations would prohibit the scalability of the algorithm. We propose a parallel ASGD algorithm, η-PASGD, for efficiently handling η synchronizations per epoch in a scalable fashion. The proposed algorithm puts an upper limit of K on η, for a K-processor system, such that performing η = K synchronizations per epoch would eliminate the staleness completely. The rating data used in collaborative filtering are usually represented as sparse matrices. The sparsity allows for reduction in the staleness and communication overhead combinatorially via intelligently distributing the data to processors. We analyze the staleness and the total volume incurred during an epoch of η-PASGD. Following this analysis, we propose a hypergraph partitioning model to encapsulate reducing staleness and volume while minimizing the maximum number of synchronizations required for a stale-free SGD. This encapsulation is achieved with a novel cutsize metric that is realized via a new recursive-bipartitioning-based algorithm. Experiments on up to 512 processors show the importance of the proposed partitioning method in improving staleness, volume, RMSE and parallel runtime.

Item Open Access Memory-efficient boundary-preserving tetrahedralization of large three-dimensional meshes (Springer Science and Business Media Deutschland GmbH, 2023-05-09) Erkoç, Ziya; Güdükbay, Uğur; Si, H.
We propose a divide-and-conquer algorithm to tetrahedralize three-dimensional meshes in a boundary-preserving fashion. It consists of three stages: Input Partitioning, Surface Closure, and Merge. We first partition the input into several pieces to reduce the problem size.
We apply 2D Triangulation to close the open boundaries to make new pieces watertight. Each piece is then sent to TetGen, a Delaunay-based tetrahedral mesh generator tool that forms the basis for our implementation. We finally merge each tetrahedral mesh to calculate the final solution. In addition, we apply post-processing to remove the vertices we introduced during the input partitioning stage to preserve the input triangles. The benefit of our approach is that it can reduce peak memory usage or increase the speed of the process. It can even tetrahedralize meshes that TetGen cannot do due to the peak memory requirement.