About this Digital Document
Many important problems in machine learning (ML) and data science are formulated as optimization problems and solved using optimization algorithms. Given the scale of modern datasets and the ill-conditioned nature of real-world problems, new, more powerful algorithms are needed. The goal of this thesis is to develop efficient and scalable optimization algorithms, with corresponding convergence guarantees, for the ill-conditioned problems that often arise in large-scale ML. The thesis contains three main parts comprising five chapters. Chapter 1 presents novel accelerated gradient methods with optimality certificates; Chapters 2 and 3 introduce new quasi-Newton algorithms; and Chapters 4 and 5 develop efficient and scalable distributed optimization algorithms.

The focus of Chapter 1 is the question of how to construct an appropriate sequence of lower bounds for strongly convex smooth functions and for strongly convex composite functions. We propose several first-order methods for minimizing strongly convex functions in both the smooth and composite settings. We show that all of the presented algorithms converge linearly, with the accelerated variants enjoying the optimal linear rate of convergence.

Chapter 2 is concerned with two sampled quasi-Newton methods: sampled LBFGS and sampled LSR1. In contrast to the classical variants of these methods, which sequentially build (inverse) Hessian approximations as the optimization progresses, our proposed methods sample points randomly around the current iterate to produce these approximations. As a result, the approximations make use of more reliable (recent and local) information and do not depend on past information that could be significantly stale.

Chapter 3 describes a new optimization algorithm that bridges the gap between first- and second-order methods by computing a search direction that applies a second-order-type update in one subspace, coupled with a scaled steepest-descent step in the orthogonal complement subspace. In this way, partial curvature information is incorporated to help with ill-conditioning, while the algorithm still scales to the large problem dimensions often encountered in machine learning applications.

In Chapter 4, a novel approach is proposed in which the sample size is gradually increased in order to quickly obtain a solution whose empirical loss is within a satisfactory statistical accuracy. The proposed method is multistage: the solution of one stage serves as a warm start for the next stage, which uses a larger sample that includes all samples from the previous stage. The multistage algorithm reduces the number of passes over the data needed to reach the approximate statistical accuracy of the full training set. Moreover, the algorithm is naturally suited to distributed implementation and exhibits strong scaling, meaning that additional computing nodes are expected to yield further acceleration.

Chapter 5 presents a scalable distributed implementation of the Sampled Limited-memory Symmetric Rank-1 (S-LSR1) algorithm. First, we show that a naive distributed implementation of S-LSR1 requires multiple rounds of expensive communication at every iteration and is therefore inefficient. We then propose DS-LSR1, a communication-efficient variant that: (i) drastically reduces the amount of data communicated at every iteration, (ii) has favorable work-load balancing across nodes, and (iii) is matrix-free and inverse-free.
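To make the chapter summaries above more concrete, a few illustrative sketches follow. For Chapter 1, the sketch below shows the classical accelerated gradient method for a mu-strongly convex, L-smooth function, which attains the optimal linear rate referenced in the chapter; it is not the thesis's new lower-bound-based methods, and the names grad_f, mu, L, and the quadratic example are illustrative assumptions.

```python
import numpy as np

def accelerated_gradient(grad_f, x0, mu, L, n_iters=100):
    """Classical accelerated gradient method for a mu-strongly convex,
    L-smooth function; converges linearly at the optimal rate, with the
    error contracting roughly by (1 - sqrt(mu/L)) per iteration."""
    kappa = L / mu                                            # condition number
    beta = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)    # constant momentum weight
    x, y = x0.copy(), x0.copy()
    for _ in range(n_iters):
        x_next = y - grad_f(y) / L            # gradient step from the extrapolated point
        y = x_next + beta * (x_next - x)      # momentum (extrapolation) step
        x = x_next
    return x

# Example: an ill-conditioned quadratic f(x) = 0.5 * x^T A x with mu = 1, L = 100.
A = np.diag([1.0, 100.0])
x_star = accelerated_gradient(lambda x: A @ x, np.array([1.0, 1.0]), mu=1.0, L=100.0)
```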
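For Chapter 2, the following is a minimal sketch of the sampling idea behind the sampled quasi-Newton methods: curvature pairs are built from points drawn around the current iterate rather than from past iterates, and a standard L-BFGS two-loop recursion then applies the implicit inverse-Hessian approximation to the gradient. The sampling radius, the positive-curvature filter, and the function names are illustrative assumptions, not the exact rules used in the thesis.

```python
import numpy as np

def sampled_lbfgs_direction(grad_f, x, m=10, radius=1e-2, rng=np.random.default_rng(0)):
    """Sketch of a sampled L-BFGS step: build m curvature pairs from points
    sampled around x, then apply the two-loop recursion to -grad_f(x)."""
    g = grad_f(x)
    S, Y = [], []
    for _ in range(m):
        s = radius * rng.standard_normal(x.shape)   # random displacement around x
        y = grad_f(x + s) - g                       # local curvature information
        if s @ y > 1e-10:                           # keep only positive-curvature pairs
            S.append(s)
            Y.append(y)
    if not S:
        return -g                                   # fall back to steepest descent

    # Standard L-BFGS two-loop recursion using the sampled (s, y) pairs.
    q = g.copy()
    rhos = [1.0 / (s @ y) for s, y in zip(S, Y)]
    alphas = []
    for s, y, rho in zip(reversed(S), reversed(Y), reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    gamma = (S[-1] @ Y[-1]) / (Y[-1] @ Y[-1])       # initial inverse-Hessian scaling
    r = gamma * q
    for (s, y, rho), a in zip(zip(S, Y, rhos), reversed(alphas)):
        b = rho * (y @ r)
        r += (a - b) * s
    return -r                                        # quasi-Newton search direction
```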
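For Chapter 3, the sketch below illustrates one way to combine a second-order-type step in a subspace with scaled steepest descent in its orthogonal complement. The choice of the subspace basis V, the Hessian-vector-product interface hess_vec, and the scaling alpha are assumptions made for illustration; the thesis's algorithm selects and scales these quantities by its own rules.

```python
import numpy as np

def subspace_direction(grad, hess_vec, V, alpha=1e-1):
    """Sketch of a two-subspace search direction: a Newton-type step inside
    span(V), formed with only k Hessian-vector products, plus a scaled
    steepest-descent step in the orthogonal complement of span(V).

    grad     : gradient at the current iterate, shape (n,)
    hess_vec : function v -> H @ v (Hessian-vector product); H is never formed
    V        : (n, k) matrix with orthonormal columns spanning the curvature subspace
    alpha    : step scaling used in the complement subspace
    """
    HV = np.column_stack([hess_vec(V[:, j]) for j in range(V.shape[1])])
    H_small = V.T @ HV                     # k x k reduced Hessian
    g_small = V.T @ grad                   # gradient projected onto span(V)

    p_subspace = -V @ np.linalg.solve(H_small, g_small)  # second-order step in span(V)
    g_perp = grad - V @ g_small                          # gradient component outside span(V)
    p_complement = -alpha * g_perp                       # scaled steepest descent in the complement
    return p_subspace + p_complement
```

Because only the k-by-k reduced Hessian is ever factorized, the cost of the curvature step stays small even when the ambient dimension n is very large, which is the scalability point made in the chapter.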
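For Chapter 4, here is a minimal sketch of the multistage, growing-sample-size idea, using plain gradient descent on a least-squares loss purely as a stand-in inner solver. The loss, the inner solver, the growth factor, and the fixed inner-iteration budget are illustrative assumptions; the thesis ties the per-stage accuracy to the statistical accuracy of each sample.

```python
import numpy as np

def multistage_erm(X, y, n0=128, growth=2, inner_steps=50, lr=0.1, rng=np.random.default_rng(0)):
    """Sketch of the multistage idea: solve ERM on a small sample, then
    warm-start the next stage on a larger sample that contains the previous
    one, until the full training set is used."""
    n, d = X.shape
    perm = rng.permutation(n)              # fix an ordering; each stage uses a prefix of it
    w = np.zeros(d)                        # starting point for the first stage
    m = n0
    while True:
        Xs, ys = X[perm[:m]], y[perm[:m]]  # current stage's sample (contains all earlier samples)
        for _ in range(inner_steps):       # stand-in inner solver: gradient descent on least squares
            w -= lr * Xs.T @ (Xs @ w - ys) / m
        if m >= n:
            return w                       # full training set reached
        m = min(growth * m, n)             # next stage: larger sample, warm-started at current w
```

Because early stages touch only a small fraction of the data, the total number of passes over the full dataset needed to reach its statistical accuracy is reduced, which is the effect described in the chapter.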
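For Chapter 5, the sketch below is not the DS-LSR1 communication scheme; it only illustrates why a limited-memory SR1 approximation can be applied matrix-free using the standard compact representation, so that only k-dimensional vectors and k-by-k matrices ever appear. The choice B_0 = gamma*I and the absence of the usual SR1 pair-skipping safeguard are simplifying assumptions.

```python
import numpy as np

def lsr1_matvec(v, S, Y, gamma=1.0):
    """Matrix-free product B @ v for the compact limited-memory SR1
    approximation B = gamma*I + Psi M^{-1} Psi^T, where Psi = Y - gamma*S
    and M = D + L + L^T - gamma * S^T S, with D the diagonal and L the
    strictly lower-triangular part of S^T Y.  The n x n matrix B is never
    formed; only (n, k) and (k, k) quantities appear.

    S, Y : (n, k) matrices whose columns are the k curvature pairs.
    """
    Psi = Y - gamma * S                     # n x k
    SY = S.T @ Y                            # k x k
    D = np.diag(np.diag(SY))
    L = np.tril(SY, k=-1)
    M = D + L + L.T - gamma * (S.T @ S)     # small k x k middle matrix
    return gamma * v + Psi @ np.linalg.solve(M, Psi.T @ v)
```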