Date

8-1-2019

Document Type

Dissertation

Degree

Doctor of Philosophy

Department

Industrial Engineering

First Adviser

Lawrence Snyder

Abstract

Designing hand-engineered solutions for decision-making in complex environments is a challenging task. This dissertation investigates the possibility of having autonomous decision-makers in several real-world problems, e.g., in dynamic matching, marketing, and transportation. Achieving high-quality performance in these systems is strongly tied to the actions that a controller performs in different situations. The problem is further complicated by the fact that every single action might have long-term consequences, and ignoring them might cause unexpected outcomes. My primary focus is to approach these problems with long-term objectives in mind, instead of focusing only on myopic ones. By borrowing techniques from optimal control and reinforcement learning, I design a modeling infrastructure for each specific problem. Currently, mainstream reinforcement learning research uses games and robotics simulators to verify the performance of an algorithm. In contrast, my main endeavor in this dissertation is to bridge the gap between the developed methods and their real-world applications, which are studied less often. For instance, for dynamic matching, I propose a simple matching rule with optimality guarantees; for the customer journey, I use reinforcement learning to design an online algorithm based on temporal difference learning; and, for transportation, I show that it is possible to use reinforcement learning to train a solver capable of solving a wide variety of vehicle routing problems. Finally, I conclude this dissertation by introducing a new paradigm, which I call "corrective reinforcement learning." This paradigm addresses one major challenge in applying policies found by RL: they might differ significantly from the policies currently used in real systems. I propose a mechanism that resolves this issue by finding improved controllers that are close to the status quo.
I believe that the models proposed in this dissertation will contribute to the discovery of methods that can outperform current systems, which are primarily controlled by humans.