Title:
Efficient training of Kolmogorov-Arnold Networks (KANs) – methods, benchmarks, and applications
Abstract:
KANs are nonlinear regression models with a specific architecture based on a composition of functions. They branched off from Kolmogorov's proof that any continuous multivariate function can be exactly represented by a specific composition of continuous univariate functions [1]. The exact form of the representation is a universal approximator and has been studied extensively since the 1950s, e.g. [2,3]. Approximate forms have acquired various names (models or networks) and have been studied since the 1990s, when their approximation power was first discovered [4]. KANs have been used for machine-learning (ML) applications since the 2000s [5], but remained largely unnoticed until May 2024, when a preprint by a team from MIT was posted online [6].
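For reference, for a continuous function of n variables on the unit cube, the representation in [1] takes the well-known form
\[
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),
\]
where \(\Phi_q\) and \(\phi_{q,p}\) are continuous univariate functions.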
The immense recent growth in popularity of KANs has led to a significant number of preprints, most of which demonstrate their superior accuracy compared to traditional neural networks, namely multilayer perceptrons (MLPs). However, employing the traditional training methods that are used for other ML models leads to longer training times than for MLPs.
In this talk, a lightweight training method for KANs, first proposed in 2020 for piecewise-linear underlying functions [7] and generalised to an arbitrary basis representation in 2023 [8], will be presented. The method is based on the Kaczmarz algorithm. Efficient implementations of KANs (in C#, C++, and MATLAB) will be shown that significantly outcompete MLPs both in terms of accuracy and training time, e.g. 4–10 minutes for KANs vs. 4–8 hours for MLPs on datasets with 25 inputs and 10 million records. Furthermore, aspects related to deep KANs, parallel implementation of the training, and uncertainty quantification for KANs will be discussed [9].
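To illustrate the core idea, below is a minimal C++ sketch of the classical Kaczmarz update for a linear system A x = b; the function name, parameters, and toy system are illustrative assumptions, and the actual KAN training in [7,8] applies such projections record-by-record to the coefficients of the underlying basis functions rather than to a generic dense system.

#include <vector>
#include <cstddef>
#include <iostream>

// Classical Kaczmarz iteration for a linear system A x = b.
// Each sweep cycles through the rows of A and projects the current
// estimate x onto the hyperplane defined by that row.
std::vector<double> kaczmarz(const std::vector<std::vector<double>>& A,
                             const std::vector<double>& b,
                             std::size_t sweeps) {
    const std::size_t m = A.size();     // number of equations (records)
    const std::size_t n = A[0].size();  // number of unknowns (coefficients)
    std::vector<double> x(n, 0.0);      // initial guess: zero vector

    for (std::size_t s = 0; s < sweeps; ++s) {
        for (std::size_t i = 0; i < m; ++i) {
            double dot = 0.0, norm2 = 0.0;
            for (std::size_t j = 0; j < n; ++j) {
                dot += A[i][j] * x[j];
                norm2 += A[i][j] * A[i][j];
            }
            if (norm2 == 0.0) continue;           // skip degenerate rows
            const double step = (b[i] - dot) / norm2;
            for (std::size_t j = 0; j < n; ++j)
                x[j] += step * A[i][j];           // projection onto row i
        }
    }
    return x;
}

int main() {
    // Toy system: x0 + x1 = 3, x0 - x1 = 1  =>  x = (2, 1).
    std::vector<std::vector<double>> A = {{1.0, 1.0}, {1.0, -1.0}};
    std::vector<double> b = {3.0, 1.0};
    const auto x = kaczmarz(A, b, 50);
    std::cout << x[0] << " " << x[1] << "\n";  // approximately 2 1
}

Because each update touches only one record at a time, the method processes large datasets in a streaming fashion, which is consistent with the training times quoted above.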
References:
[1] A. N. Kolmogorov, Dokl. Akad. Nauk SSSR, 114(5):953–956, 1957.
[2] G. G. Lorentz, Am. Math. Mon., 69(6):469–485, 1962.
[3] D. A. Sprecher, Trans. Am. Math. Soc., 115(3):340–355, 1965.
[4] V. Kurkova, Neural Netw., 5(3):501–506, 1992.
[5] B. Igelnik, N. Parikh, IEEE Trans. Neural Netw., 14(4):725–733, 2003.
[6] Z. Liu et al., arXiv:2404.19756, 2024.
[7] A. Polar, M. Poluektov, Eng. Appl. Artif. Intell., 99:104137, 2021.
[8] M. Poluektov, A. Polar, arXiv:2305.08194, 2023.
[9] A. Polar, M. Poluektov, arXiv:2104.01714, 2021.
Bio:
Mikhail Poluektov is currently a Lecturer (Assistant Professor) in Mathematics at the University of Dundee (UK). His research focuses on computational and applied mathematics, covering a large range of models and methods. In particular, his recent research includes fictitious-domain and multiscale methods for non-linear partial differential equations, as well as approximation-theory methods. His work has been published in journals such as Computer Methods in Applied Mechanics and Engineering. Prior to his current appointment, Dr Poluektov held a Senior Research Fellow position at the University of Warwick (UK). Dr Poluektov obtained his PhD from the Eindhoven University of Technology (Netherlands).