Improving the performance of quantized transformers with graph neural networks
Abstract
Transformers have become established models for natural language processing (NLP) tasks. Their representational capabilities improve with size, but training and hosting larger models is computationally demanding. The growing computational overhead increases the carbon footprint, raising concerns about the environmental impact of using these models. Parameter quantization promises to reduce the utilization costs of transformers; however, low-bit quantization generally leads to a notable loss in model performance. To reduce the performance degradation caused by quantization while retaining its benefits, we introduce BitTransGNN, a novel framework that improves quantized transformer performance by integrating quantized transformers with Graph Neural Networks (GNNs). Transformers excel at capturing local contextual semantics, while GNNs are adept at representing global structural relationships within data. BitTransGNN exploits the complementary nature of the two models to improve the representational capabilities of quantized transformers. After presenting our proposed architecture, we introduce variants of BitTransGNN that extend its utility to inductive settings by encapsulating the knowledge learned by BitTransGNN within a single quantized transformer model. Through an extensive set of experiments, we show that BitTransGNN substantially reduces the performance gap between quantized transformers and their full-precision counterparts while retaining the efficiency advantages of quantization. Transductive BitTransGNN variants outperform quantized transformer baselines by up to 21% while introducing minimal additional overhead. Inductive BitTransGNN variants improve quantized transformer performance by up to 19% with zero additional inference cost. To evaluate the cost-performance tradeoff, we examine the model performance and utilization costs of BitTransGNN and the baseline models. We further analyze BitTransGNN outputs to validate the premise that transformers and GNNs focus on highly distinct features, examine the significance of different BitTransGNN components, and discuss potential limitations. The results and findings presented in this thesis contribute to research on improving the efficiency of neural networks and offer a new perspective on reducing neural model costs without significantly sacrificing model performance.
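
For illustration only, the sketch below shows one way the core idea described in the abstract could look in code: a weight-binarized classification head stands in for a quantized transformer, a small GCN captures global document-graph structure, and the two sets of logits are interpolated with a mixing weight. This is not code from the thesis; the module names (BinarizedLinear, SimpleGCN, JointClassifier), the mixing parameter lam, and the exact combination scheme are assumptions made for this sketch, and the actual BitTransGNN architecture may differ.

```python
# Hypothetical sketch (not the thesis implementation): combining the predictions
# of a quantized (here: weight-binarized) text classifier with those of a GCN
# over a document graph, then interpolating the two with a mixing weight `lam`.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinarizedLinear(nn.Module):
    """Linear layer whose weights are binarized to {-1, +1} in the forward pass,
    trained with a straight-through estimator."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        w_bin = torch.sign(self.weight)
        # Straight-through estimator: binarized weights forward, full-precision gradients backward.
        w = self.weight + (w_bin - self.weight).detach()
        return F.linear(x, w, self.bias)


class SimpleGCN(nn.Module):
    """Two-layer GCN operating on a dense, symmetrically normalized adjacency matrix."""

    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden_dim)
        self.lin2 = nn.Linear(hidden_dim, num_classes)

    def forward(self, x, adj_norm):
        h = F.relu(adj_norm @ self.lin1(x))
        return adj_norm @ self.lin2(h)


def normalize_adjacency(adj):
    """Standard GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    adj = adj + torch.eye(adj.size(0))
    deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)
    return deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)


class JointClassifier(nn.Module):
    """Interpolates quantized-head logits (local view) with GCN logits (global view)."""

    def __init__(self, feat_dim, hidden_dim, num_classes, lam=0.7):
        super().__init__()
        self.quantized_head = BinarizedLinear(feat_dim, num_classes)
        self.gnn = SimpleGCN(feat_dim, hidden_dim, num_classes)
        self.lam = lam  # assumed mixing weight between the two views

    def forward(self, node_feats, adj_norm):
        logits_q = self.quantized_head(node_feats)   # local contextual view per document
        logits_g = self.gnn(node_feats, adj_norm)    # global structural view over the graph
        return self.lam * logits_g + (1.0 - self.lam) * logits_q


if __name__ == "__main__":
    num_docs, feat_dim, num_classes = 8, 32, 4
    feats = torch.randn(num_docs, feat_dim)          # stand-in for transformer document embeddings
    adj = (torch.rand(num_docs, num_docs) > 0.7).float()
    adj = ((adj + adj.t()) > 0).float()              # symmetrize the toy document graph
    model = JointClassifier(feat_dim, hidden_dim=16, num_classes=num_classes)
    logits = model(feats, normalize_adjacency(adj))
    print(logits.shape)  # torch.Size([8, 4])
```

In this toy setup, the interpolation weight lam trades off the graph-based and quantized-transformer views, and the straight-through estimator allows the binarized head to be trained with full-precision gradients; an inductive variant in the spirit of the abstract would then train a standalone quantized model to reproduce the combined outputs so that no graph is needed at inference time.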