This is the public course website for Topics in Applied Mathematics: Infinitely Large Neural Networks (응용수학특강: 심층신경망의 수학적 극한), 3341.751, Spring 2022. We will use eTL as a secondary website.
This class will have written homework assignments. (No programming.) Submit completed assignments through eTL.
Although the empirical success of deep learning is apparent, a theoretical explanation of this phenomenon remains elusive. In this course, we study recent deep learning theory that aims to address this question.
We start with the representation power of wide neural networks. The classical theory shows that wide 2-layer neural networks are universal approximators, i.e., they can approximate arbitrary continuous functions on compact sets. However, this line of work does not address whether one can find such approximations using SGD or any other practical algorithm.
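As a rough preview (with notation to be set up properly in lecture), the classical result concerns 2-layer networks of the form
$$ f(x) = \sum_{i=1}^{N} a_i\, \sigma(w_i^\top x + b_i), $$
and states that, for a suitable nonpolynomial activation $\sigma$, such networks are dense in $C(K)$ for compact $K \subset \mathbb{R}^d$: for any continuous target $g$ and $\varepsilon > 0$, there exist $N$ and parameters with $\sup_{x \in K} |f(x) - g(x)| < \varepsilon$. Note that the statement says nothing about how to find such parameters.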
Next, we study kernel methods and reproducing kernel Hilbert spaces (RKHS). While the classical "kernel trick" is not directly used in modern deep learning, RKHS theory will serve as the functional-analytic language for our later discussion of the dynamics of functions.
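For orientation, the key structural fact we will use is the reproducing property (notation subject to adjustment in lecture): an RKHS $\mathcal{H}$ with kernel $k$ satisfies
$$ k(\cdot, x) \in \mathcal{H}, \qquad f(x) = \langle f, k(\cdot, x) \rangle_{\mathcal{H}} \quad \text{for all } f \in \mathcal{H},\ x \in \mathcal{X}, $$
so pointwise evaluation is a continuous linear functional. This is what lets us track how a function, rather than a parameter vector, evolves during training.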
Next, using RKHS theory, we study random feature learning, an implementable non-deep-learning method with provable $\mathcal{O}(1/\sqrt{N})$ guarantees. Random feature learning serves two important purposes: (i) it is a simple baseline that deep learning algorithms should surpass theoretically (a feat often not accomplished), and (ii) it is a stepping stone to the NTK theory.
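Schematically (details and scaling conventions as covered in class), random feature learning samples $w_1, \dots, w_N \sim \rho$ once, freezes them, and trains only the outer coefficients $a_i$ of
$$ f(x) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} a_i\, \varphi(x; w_i), \qquad \mathbb{E}_{w \sim \rho}\big[\varphi(x; w)\, \varphi(x'; w)\big] = k(x, x'), $$
so that the random features approximate the kernel $k$; guarantees of the $\mathcal{O}(1/\sqrt{N})$ flavor quantify how closely the finite-$N$ model matches the corresponding kernel method.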
Next, we study gradient flow, a continuous-time model of (stochastic) gradient descent. Deep learning theory often relies on two simplifications: (i) that the network is infinitely large and (ii) that the network is trained with gradient flow rather than SGD. We study when and in what sense this approximation is accurate and what benefit this simplification brings.
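Concretely, gradient flow is the small-stepsize ODE limit of the gradient descent iteration: with loss $\mathcal{L}$ and parameters $\theta$,
$$ \theta_{k+1} = \theta_k - \eta \nabla \mathcal{L}(\theta_k) \qquad \xrightarrow{\;\eta \to 0\;} \qquad \frac{d\theta(t)}{dt} = -\nabla \mathcal{L}(\theta(t)), $$
which trades a discrete, stochastic algorithm for a deterministic dynamical system that is easier to analyze.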
Next, we study Gaussian processes and how randomly initialized deep neural networks behave as Gaussian processes at initialization (before training).
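As a preview of the central-limit-type argument, consider a 2-layer network at initialization with i.i.d. mean-zero $a_i$ independent of i.i.d. $w_i$ (scaling conventions as set up in lecture):
$$ f(x) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} a_i\, \sigma(w_i^\top x) \;\xrightarrow{\;N \to \infty\;}\; \mathcal{GP}(0, K), \qquad K(x, x') = \mathbb{E}[a^2]\, \mathbb{E}_w\big[\sigma(w^\top x)\, \sigma(w^\top x')\big], $$
where the convergence is of the finite-dimensional distributions. Deeper architectures admit analogous Gaussian process limits.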
Next, we study the neural tangent kernel (NTK) theory. Using the theory of kernels and Gaussian processes, we establish a CLT-like limit of infinitely wide neural networks of fixed depth and establish trainability guarantees.
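The central object, stated informally here, is the kernel of inner products of parameter gradients,
$$ \Theta(x, x') = \big\langle \nabla_\theta f(x; \theta),\, \nabla_\theta f(x'; \theta) \big\rangle, $$
and under gradient flow on training data $x_1, \dots, x_n$ the network outputs evolve by kernel gradient descent,
$$ \frac{d}{dt} f(x; \theta_t) = -\sum_{i=1}^{n} \Theta(x, x_i)\, \frac{\partial \ell}{\partial f(x_i; \theta_t)}. $$
The NTK results make precise the sense in which, at infinite width with appropriate scaling, $\Theta$ is deterministic at initialization and remains constant throughout training.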
Next, we study the mean-field (MF) theory of deep neural networks. Using the theory of partial differential equations, we establish an LLN-like limit of infinitely wide neural networks of depth 2 (and $\ge 3$) and establish trainability guarantees.
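Informally, and in one common formulation (notation to be fixed in lecture), the mean-field view regards a depth-2 network as an average over a distribution $\mu$ of neurons, and training as an evolution of $\mu$ governed by a continuity equation (a Wasserstein gradient flow of the loss):
$$ f(x) = \int a\, \sigma(w^\top x)\, d\mu(a, w), \qquad \partial_t \mu_t = \nabla \cdot \Big( \mu_t\, \nabla \frac{\delta \mathcal{L}}{\delta \mu}[\mu_t] \Big). $$
This is the LLN-like scaling: the network is an average over neurons rather than a $1/\sqrt{N}$-normalized sum.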
Next, we shift the attention from infinitely wide neural networks to infinitely deep ones. We show that deep neural networks are also universal approximators.
Next, we study Neural ODEs and equilibrium models, which can be viewed as neural networks with continuous (infinite) depth. We define the infinite-depth limits and obtain the continuous-depth counterpart to backprop, which allows one to train these models in practice using well-established ODE solvers. However, these models by themselves do not come with trainability guarantees.
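For example (following the standard Neural ODE formulation), the hidden state evolves as an ODE in a continuous depth variable $t$, and the continuous-depth counterpart to backprop is the adjoint equation, integrated backward in time by an off-the-shelf ODE solver:
$$ \frac{dh(t)}{dt} = f\big(h(t), t, \theta\big), \qquad a(t) = \frac{\partial \mathcal{L}}{\partial h(t)}, \qquad \frac{da(t)}{dt} = -a(t)^\top \frac{\partial f}{\partial h}\big(h(t), t, \theta\big). $$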
Finally, we study the trainability of infinitely deep neural networks. This requires combining mean-field theory with continuous-depth models.
Ernest K. Ryu, 27-205,
Tuesdays and Thursdays 3:30–4:45pm in hybrid format, at 500-L306 and over Zoom. Lectures will be delivered on the blackboard. Live (in-person or online) attendance is required. The Zoom link and password are available on eTL. We will try to provide typed lecture notes.
This class will have in-person midterm and final exams.
Homework 30%, midterm exam 30%, final exam 40%.
As this is a graduate-level course in mathematics, we will use the following advanced mathematical tools without reservation.
This course will have no programming component.