Affine invariant stochastic optimization
José Vidal Alcalá (CIMAT-Merida)
We will introduce Affine Invariant Stochastic Gradient Descent (AISGD), an online stochastic optimization algorithm with a built-in approximation of the inverse Hessian of the objective function. Most of the computational work at each step is plain matrix-vector multiplication, so the algorithm parallelizes fully.
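As a minimal sketch of the kind of update described above (not the actual AISGD algorithm; the function name and the use of the exact inverse Hessian on a toy quadratic are illustrative assumptions):

```python
import numpy as np

def preconditioned_sgd_step(theta, grad, h_inv, lr=1.0):
    """One update: precondition the stochastic gradient with the
    inverse-Hessian estimate -- a single matrix-vector product."""
    return theta - lr * (h_inv @ grad)

# Toy quadratic f(x) = 0.5 x^T A x, whose Hessian is A:
A = np.array([[4.0, 1.0], [1.0, 3.0]])
h_inv = np.linalg.inv(A)       # illustrative: the exact inverse Hessian
theta = np.array([1.0, -2.0])
grad = A @ theta               # exact gradient of the quadratic
theta_new = preconditioned_sgd_step(theta, grad, h_inv)
print(theta_new)               # the preconditioned step lands at the minimizer [0, 0]
```

With a perfect inverse-Hessian estimate the step reduces to Newton's method; in the stochastic setting the estimate is built online and the gradient is noisy, but the per-step cost remains that of the matrix-vector product.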
We apply AISGD to the training of deep neural networks for image classification. In state-of-the-art networks the number of parameters approaches eight million, and the memory required to store the inverse Hessian approximation exceeds the capacity of the machine. We alleviate this problem with an online PCA approximation that reduces both the memory footprint and the computational work.
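A low-rank factorization of the kind an online PCA would maintain can be sketched as follows (the factors below are random placeholders, not the result of any actual PCA or training run):

```python
import numpy as np

d, k = 1000, 10                 # d parameters, rank-k approximation (k << d)
rng = np.random.default_rng(0)

# Hypothetical low-rank factors of the inverse-Hessian estimate,
# stored as H_inv ~= U diag(s) U^T instead of a dense d x d matrix.
U, _ = np.linalg.qr(rng.normal(size=(d, k)))   # orthonormal d x k basis
s = rng.uniform(0.5, 2.0, size=k)              # k leading eigenvalues

g = rng.normal(size=d)                          # a stochastic gradient

# Preconditioning now costs O(dk) time and O(dk) memory instead of O(d^2):
precond_g = U @ (s * (U.T @ g))
print(precond_g.shape)          # (1000,)
```

For d around eight million, the dense matrix would need d² entries, while the rank-k factors need only d·k, which is what makes storing the approximation feasible at all.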