Reading: Givens GH, Hoeting JA (2013). Computational Statistics, 2nd edition. John Wiley & Sons, Inc., Hoboken, New Jersey. Chapter 2 until 2.2.3 (and Chapter 1.1-1.4 if needed).
Goodfellow I, Bengio Y, Courville A (2016). Deep Learning. MIT Press, Chapter 4.3 (and parts of Chapter 2 and Chapter 4.2 if needed).
Sun S, Cao Z, Zhu H, Zhao J (2019). A survey of optimization methods from a machine learning perspective, Sections I, II, III-A1, III-B1-2.
AlphaOpt (2017). Introduction To Optimization: Gradient Based Algorithms, YouTube video (very elementary introduction to the concepts).
About analytical optimisation. (Frank Miller, September 2020)
Understanding gradient descent. (Eli Bendersky, August 2016)
Bisection method. (Frank Miller, March 2020; 4min video)
Example code: steepestascent.r
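The contents of steepestascent.r are not reproduced here; as a rough illustration of the idea, the following is a minimal Python sketch of steepest ascent with a fixed step size (function names, the example objective, and all parameter values are assumptions, not taken from the course file).

```python
import numpy as np

def steepest_ascent(grad, x0, step=0.1, tol=1e-8, max_iter=1000):
    """Maximize a function by repeatedly stepping along its gradient.

    grad: function returning the gradient at a point (an assumed
          interface, not the one used in steepestascent.r).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        # Stop once the gradient is (numerically) zero: a stationary point.
        if np.linalg.norm(g) < tol:
            break
        x = x + step * g  # ascent: move *with* the gradient
    return x

# Toy example: maximize f(x, y) = -(x - 1)^2 - (y + 2)^2, whose
# unique maximum is at (1, -2).
grad_f = lambda x: np.array([-2.0 * (x[0] - 1.0), -2.0 * (x[1] + 2.0)])
x_max = steepest_ascent(grad_f, [0.0, 0.0])
```

Gradient *descent*, as in the Bendersky post above, is the same iteration with the sign of the step flipped.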
Description: CS4787 explores the principles behind scalable machine learning systems. The course will cover the algorithmic and implementation principles that power the current generation of machine learning on big data. We will cover training and inference for both traditional ML algorithms, such as linear and logistic regression, and deep models. Topics will include: estimating statistics of data quickly with subsampling, stochastic gradient descent and other scalable optimization methods, mini-batch training, accelerated methods, adaptive learning rates, methods for scalable deep learning, hyperparameter optimization, parallel and distributed training, and quantization and model compression.
Material: The course is based on books, papers, and other texts in machine learning, scalable optimization, and systems. Texts will be provided ahead of time on the website on a per-lecture basis. You aren't necessarily expected to read the texts, but they will provide useful background for the material we are discussing.