Browsing by Subject "Logistic regression"

Open Access
Development and validation of methods for the diagnosis of lung cancer via serological biomarkers
(2019-02) Akçay, Abbas Güven
Lung cancer accounts for over 10% of all new cancer cases. Moreover, projections to 2030 indicate that the already rising incidence of lung cancer will keep increasing, especially in developing countries such as Turkey. Lung cancer, the leading cause of cancer deaths, has two major divisions: Small Cell Lung Cancer (SCLC) and Non-Small Cell Lung Cancer (NSCLC). SCLC is the most aggressive subtype of lung cancer. Although treatment options and median survival time are more favorable in Limited Disease (LD), the high tumor growth rate and metastatic tendency of SCLC, even in the early stages, make diagnosis difficult. Similarly, if NSCLC is diagnosed in its early stages, surgery remains an option, which increases the patient survival rate. However, current screening and diagnostic methods, such as computed tomography (CT) and positron emission tomography (PET), are all limited by their false-positive rates. Additionally, the biopsy methods used in histological evaluation are both invasive and prone to false negatives. Therefore, new diagnostic tools that are cheap, accurate and non-invasive are in high demand. Autologous antibodies are abundantly elicited and remain stable in patient sera years before the clinical diagnosis of disease. Several such antibodies have been reported in lung cancer by our group and others. Therefore, new diagnostic methods incorporating autologous antibodies could be a major step forward in the early diagnosis of lung cancer. Moreover, miRNAs, with their hormone-like features such as circulation in serum and regulatory effects in cells, are another good candidate for the early diagnosis of lung cancer. Therefore, in this study I aimed to develop a reliable, robust and automated evaluation method to re-evaluate custom Protein Array (cPA) screenings previously performed in our lab, and to determine the autologous antibodies with the highest discriminatory power between SCLC patients and healthy controls.
Moreover, I aimed to develop a Quartz Crystal Microbalance with Dissipation (QCM-D) based immunoassay to be incorporated later in the validation of cPA results. Lastly, in a parallel study I aimed to identify and validate novel miRNA biomarkers for NSCLC. My results indicate that cPAs can have better sensitivity and specificity than ELISA and that QCM-D can be developed as an alternative to ELISA. miRNAs identified in silico can also be validated ex vivo. Previously, Protein Arrays (PAs) and cPAs were screened in our laboratory using sera from 49 SCLC patients and 50 healthy individuals, with visual and manual evaluation. Sensitivity and specificity values were calculated for individual autologous antibodies and a number of autologous antibody panels. Moreover, cPA results were validated via ELISA. However, large discrepancies between cPA and ELISA results, as well as inconsistencies among ELISA results, led me to re-evaluate the cPA results in a more robust way and to focus on developing a method superior to ELISA for autologous antibody evaluation. Therefore, I used AIDA to generate numeric values from cPA screening images and filtered out low-quality data with optimized cut-off values. Several Receiver Operating Characteristic (ROC) curves were plotted using the evaluated data. The improvement was evident in the increased Area Under the Curve (AUC) values of both individual and combined ROC curves. Moreover, I developed a QCM-based immunosensor for the detection of anti-SOX2 antibody, to be incorporated later in the validation of cPA results. The binding interaction between anti-SOX2 antibody and SOX2 protein was modelled using 1:1 Langmuir isothermal binding, and standard curves were generated in QCM. In a parallel study, I also investigated miRNAs significantly upregulated in NSCLC compared with high-risk controls. For that purpose, miRNA expression datasets were gathered from GEO.
Two selected datasets with the same sample type were analyzed for miRNAs significantly upregulated in both. These miRNAs were subjected to logistic regression analysis with LASSO regularization (error metrics: AUC and MSE) to select the best panel of miRNAs for distinguishing NSCLC patients from healthy controls in the given datasets. Moreover, the selected miRNAs were analyzed with qRT-PCR to validate the panel. I was able to re-evaluate the cPA results by eliminating low-quality data from the numeric values generated from cPA images with AIDA software. I identified a panel of 4 autologous antibodies (FKBP8 – P53 – SOX2 – POLB) that yielded 60% sensitivity at 100% specificity in discriminating SCLC from controls. The ROC curve of this autologous antibody panel had an AUC of 95.04%. This panel surpassed the diagnostic power of the only commercially available diagnostic kit of the same kind, EarlyCDT-Lung. Moreover, proof of concept for the measurement of anti-protein antibodies was achieved in QCM, using the anti-SOX2 antibody and SOX2 protein pair in PBS buffer as an example. Early results of the anti-SOX2 mAb QCM indicate a linear assay range comparable to that of ELISA. The Langmuir isothermal binding model revealed a strong interaction between antibody and protein in our QCM anti-SOX2 measurement experiments. Lastly, using logistic regression with LASSO regularization, I was able to select the 5 miRNAs that best discriminate between NSCLC patients and high-risk controls. However, the qRT-PCR validation experiments need to be repeated, as low Ct values and prominent hemolysis in serum samples prevented drawing meaningful conclusions.
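The two figures of merit quoted in this abstract, AUC and sensitivity at 100% specificity, can be illustrated with a minimal sketch. The function names and the score values below are hypothetical and chosen for illustration only; they are not data or code from the thesis, which does not describe its implementation here.

```python
def roc_auc(patient_scores, control_scores):
    """AUC as the probability that a random patient outscores a random control.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


def sensitivity_at_full_specificity(patient_scores, control_scores):
    """Fraction of patients scoring strictly above every control.

    Placing the cut-off at the highest control score means no control is
    called positive, which is what 100% specificity requires.
    """
    cutoff = max(control_scores)
    detected = sum(1 for p in patient_scores if p > cutoff)
    return detected / len(patient_scores)


# Hypothetical panel scores for illustration only.
patients = [0.9, 0.8, 0.75, 0.4, 0.3]
controls = [0.5, 0.35, 0.2, 0.1, 0.05]

auc = roc_auc(patients, controls)
sensitivity = sensitivity_at_full_specificity(patients, controls)
print(f"AUC = {auc:.2f}, sensitivity at 100% specificity = {sensitivity:.2f}")
```

A panel can therefore have a high AUC while detecting only part of the patient group at the strictest cut-off, which is why the two numbers are reported together.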
Open Access
Online nonlinear modeling for big data applications
(2017-12) Khan, Farhan
We investigate online nonlinear learning for several real-life adaptive signal processing and machine learning applications involving big data, and introduce algorithms that are both efficient and effective. We present novel solutions for learning from data that is generated at high speed and/or has high dimensionality in a non-stationary environment, and that needs to be processed on the fly. We specifically focus on the problems arising from adverse real-life conditions from a big data perspective. We propose online algorithms that are robust against non-stationarities and corruptions in the data. We emphasize that our proposed algorithms are applicable to a wide range of real-life applications regardless of the complexities arising from high dimensionality, time-varying statistics, data structures and abrupt changes. To this end, we introduce a highly robust hierarchical trees algorithm for online nonlinear learning in a high-dimensional setting where the data lies on a time-varying manifold. We escape the curse of dimensionality by tracking the subspace of the underlying manifold and using the projections of the original high-dimensional regressor space onto the underlying manifold as the modified regressor vectors for modeling the nonlinear system. With the proposed algorithm, we reduce the computational complexity to the order of the depth of the tree and the memory requirement to only linear in the intrinsic dimension of the manifold. We demonstrate significant performance gains in terms of mean square error over other state-of-the-art techniques on simulated as well as real data. We then consider real-life applications of online nonlinear modeling, such as network intrusion detection, customer churn analysis and channel estimation for underwater acoustic communication.
We propose sequential and online learning methods that achieve significant performance gains in terms of detection accuracy compared to state-of-the-art techniques. We specifically introduce structured and deep learning methods to develop robust learning algorithms. Furthermore, we improve the performance of our proposed online nonlinear learning models by introducing mixture-of-experts methods and the concept of boosting. The proposed algorithms achieve significant performance gains over state-of-the-art methods with significantly reduced computational complexity and storage requirements in real-life conditions.
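As a rough illustration of the mixture-of-experts idea mentioned above, the following sketch combines two fixed predictors online with exponentially updated weights. This is a generic textbook scheme under squared loss, not the algorithm proposed in the thesis; the function names, the learning rate `eta`, and the data stream are all assumptions made for the example.

```python
import math


def mixture_predict(weights, expert_preds):
    """Weighted average of the experts' predictions."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, expert_preds)) / total


def update_weights(weights, expert_preds, target, eta=1.0):
    """Down-weight each expert exponentially in its squared error on this sample."""
    return [
        w * math.exp(-eta * (p - target) ** 2)
        for w, p in zip(weights, expert_preds)
    ]


# Hypothetical stream of (expert predictions, observed target) pairs.
# The first expert happens to be exact on every sample; the second is not.
stream = [
    ((1.0, 0.0), 1.0),
    ((0.5, 2.0), 0.5),
    ((-1.0, 1.0), -1.0),
]

weights = [1.0, 1.0]
predictions = []
for expert_preds, target in stream:
    # Predict first, then learn from the revealed target (online protocol).
    predictions.append(mixture_predict(weights, expert_preds))
    weights = update_weights(weights, expert_preds, target)

print(predictions, weights)
```

Each sample is processed once and then discarded, so memory use does not grow with the length of the stream.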
Open Access
Prediction of cryptocurrency returns using machine learning
(Springer, 2021-02) Akyildirim, E.; Goncu, A.; Sensoy, Ahmet
In this study, the predictability of the twelve most liquid cryptocurrencies is analyzed at daily and minute-level frequencies using machine learning classification algorithms, including support vector machines, logistic regression, artificial neural networks, and random forests, with past price information and technical indicators as model features. The average classification accuracy of the four algorithms is consistently above the 50% threshold for all cryptocurrencies and all timescales, showing that price trends in the cryptocurrency markets are predictable to a certain degree. Machine learning classification algorithms reach about 55–65% predictive accuracy on average at the daily or minute-level frequencies, while support vector machines demonstrate the best and most consistent results in terms of predictive accuracy compared to the logistic regression, artificial neural network and random forest classification algorithms.
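The setup described above, past prices and technical indicators as features with the direction of the next move as the label, can be sketched as follows. The two features, the window length, the price series, and the naive momentum rule are assumptions for illustration; the paper's actual feature set and models are not specified in this abstract.

```python
def make_dataset(prices, window=3):
    """Build (features, labels) for next-step direction classification.

    Each feature row is [last return, price / simple moving average];
    the label is 1 if the next price is higher than the current one.
    """
    features, labels = [], []
    for t in range(window, len(prices) - 1):
        last_return = prices[t] / prices[t - 1] - 1.0
        sma = sum(prices[t - window + 1 : t + 1]) / window
        features.append([last_return, prices[t] / sma])
        labels.append(1 if prices[t + 1] > prices[t] else 0)
    return features, labels


def accuracy(predicted, actual):
    """Share of samples whose predicted direction matches the realized one."""
    return sum(int(p == a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical price series for illustration only.
prices = [100.0, 101.0, 102.0, 101.0, 103.0, 104.0, 102.0]
X, y = make_dataset(prices)

# Placeholder classifier: predict "up" whenever the last return was positive.
preds = [1 if row[0] > 0 else 0 for row in X]
print(accuracy(preds, y))
```

Any of the classifiers named in the abstract could replace the placeholder rule; the 50% threshold it mentions is the accuracy expected from guessing the direction at random.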
Open Access
Predictive modeling of return occurrence in e-commerce apparel market: a comparative study of logistic regression, LASSO, XGBoost and random forest techniques
(2024-05) Kutlu, Asiye Aslı
This study develops a predictive model for return occurrence in the apparel segment of an e-commerce company based in Turkey. Using data provided by the company, the research employs several machine learning techniques to explore the impact of different factors on returns. Models are developed with predictor variables related to product, supplier, customer and shopping information, with the final model also including interactions between these variables. LASSO is applied to simplify the final model and select the most relevant variables. The performance metrics AUC score, accuracy, precision, and recall are evaluated for the models, with comparisons made between logistic regression, LASSO, XGBoost, and Random Forest. Findings indicate that logistic regression models outperform XGBoost and Random Forest in terms of AUC score.
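How LASSO simplifies a logistic regression model can be seen in a small sketch: an L1 penalty, applied here through proximal gradient descent with soft-thresholding, drives the coefficients of weak predictors to zero. The solver, the penalty strength `lam`, the step size, and the toy data are assumptions for illustration and are not taken from the study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, amount):
    """Shrink a value toward zero by `amount`, clipping at zero."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_lasso_logistic(X, y, lam=0.1, step=0.5, iters=500):
    """L1-penalized logistic regression via proximal gradient descent.

    The intercept is not penalized; each weight takes a gradient step on the
    mean log-loss followed by soft-thresholding with amount step * lam.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * row[j] / n
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


# Hypothetical data: the first feature determines the label,
# the second is only weakly related to it.
X = [[2.0, 0.3], [1.5, -0.2], [1.0, 0.1],
     [-1.0, 0.2], [-1.5, -0.3], [-2.0, -0.1]]
y = [1, 1, 1, 0, 0, 0]

w, b = fit_lasso_logistic(X, y)
print(w, b)
```

On this toy data the penalty keeps the informative first coefficient and removes the second, which is the variable-selection behavior the abstract relies on.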