Document Type

Degree

Doctor of Philosophy

Discipline

Industrial Engineering

First Adviser

Scheinberg, Katya

Other advisers/committee members

Curtis, Frank E.; Motee, Nader; Nocedal, Jorge

Abstract

Modern machine learning practice at the interface of big data, distributed environments, and complex learning objectives poses great challenges to the design of scalable optimization algorithms with theoretical guarantees. Building on recent advances in randomized algorithms, this thesis develops such methods in practice and analyzes their theoretical implications in the context of large-scale structured learning problems such as regularized regression/classification, matrix completion, and hierarchical multi-label learning.

The first contribution of this work is a hybrid hierarchical learning system that achieves efficiency in a data-intensive environment. An intelligent decoding scheme inside the system further enhances the learning capacity by enabling a rich taxonomy representation to be induced in the label space. Important factors affecting the system's scalability are studied and generalized, leading to the next contribution of the work: a globally convergent inexact proximal quasi-Newton framework and a novel global convergence rate analysis. This work constitutes the first global convergence rate result for an algorithm that uses randomized coordinate descent to inexactly optimize subproblems at each iteration. The analysis precisely quantifies the complexity structure of proximal Newton-type algorithms, making it possible to exploit that structure to reduce overall complexity.

The final contribution of the work is a practical algorithm that enjoys a global convergence guarantee under the framework. The algorithm is memory- and communication-efficient and directly addresses big-data learning cases in which both N (samples) and n (features) are large. We demonstrate that this general algorithm is highly effective in practice and competitive with state-of-the-art specialized methods.