ABSTRACT
Graph Neural Network (GNN) training and inference face significant scalability challenges with respect to both model size and number of layers, degrading efficiency and accuracy for large and deep GNNs. We present an end-to-end solution that addresses these challenges for efficient GNNs in resource-constrained environments while avoiding the oversmoothing problem in deep GNNs. We introduce a quantization-based approach for all stages of GNNs, from message passing in training to node classification, compressing the model and enabling efficient processing. The proposed GNN quantizer learns quantization ranges and reduces the model size with comparable accuracy even under low-bit quantization. To scale with the number of layers, we devise a message propagation mechanism in training that controls layer-wise changes of similarities between neighboring nodes. This objective is incorporated into a Lagrangian function with constraints, and a differential multiplier method is used to iteratively find optimal embeddings. This mitigates oversmoothing and bounds the quantization error. We demonstrate significant improvements over state-of-the-art quantization methods and deep GNN approaches in both full-precision and quantized models. The proposed quantizer achieves notable accuracy in INT2 configurations across all stages of GNNs, whereas existing quantization approaches fail to reach satisfactory accuracy levels. Finally, inference with INT2 and INT4 representations achieves speedups of 5.11× and 4.70×, respectively, over the full-precision counterparts.
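To make the range-learning idea concrete, the following is a minimal sketch of symmetric uniform quantize-dequantize with a learned clipping range, in the spirit of the quantizer described above. All names (`fake_quantize`, `learn_scale`) are illustrative, not from the paper, and a grid search over candidate scales stands in for the gradient-based range learning the paper would use during training.

```python
import numpy as np

def fake_quantize(x, scale, num_bits=2):
    """Symmetric uniform quantize-dequantize of a tensor.

    Values are divided by `scale`, rounded to the nearest integer,
    clipped to the signed `num_bits` range, then rescaled.
    """
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -(2 ** (num_bits - 1))
    q = np.clip(np.round(x / scale), qmin, qmax)
    return q * scale

def learn_scale(x, num_bits=2, candidates=200):
    """Pick the clipping scale that minimizes quantization MSE.

    A simple grid search over fractions of the max magnitude;
    a trained quantizer would instead learn the range end-to-end.
    """
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = np.abs(x).max()
    best_scale, best_err = max_abs / qmax, np.inf
    for frac in np.linspace(0.05, 1.0, candidates):
        scale = frac * max_abs / qmax
        err = np.mean((x - fake_quantize(x, scale, num_bits)) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)           # stand-in for node embeddings
s = learn_scale(x, num_bits=2)
xq = fake_quantize(x, s, num_bits=2)  # INT2: at most 4 distinct levels
```

Because most embedding mass sits near zero, the learned scale clips outliers and quantizes the dense region more finely, which is why it yields lower error than naively scaling to the maximum value.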
Index Terms
- Low-bit Quantization for Deep Graph Neural Networks with Smoothness-aware Message Propagation