Document Type



Doctor of Philosophy


Computer Science

First Adviser

Chuah, Mooi Choo

Other advisers/committee members

Davison, Brian D.; Spear, Michael; Huang, Wei-Min


With billions of images captured by mobile users everyday, automatically recognizing contents in such images has become a particularly important feature for various mobile apps, including augmented reality, product search, visual-based authentication etc. Traditionally, a client-server architecture is adopted such that the mobile client sends captured images/video frames to a cloud server, which runs a set of task-specific computer vision algorithms and sends back the recognition results. However, such scheme may cause problems related to user privacy, network stability/availability and device energy.In this dissertation, we investigate the problem of building a robust mobile visual recognition system that achieves high accuracy, low latency, low energy cost and privacy protection. Generally, we study two broad types of recognition methods: the bag of visual words (BOVW) based retrieval methods, which search the nearest neighbor image to a query image, and the state-of-the-art deep learning based methods, which recognize a given image using a trained deep neural network. The challenges of deploying BOVW based retrieval methods include: size of indexed image database, query latency, feature extraction efficiency and re-ranking performance. To address such challenges, we first proposed EMOD which enables efficient on-device image retrieval on a downloaded context-dependent partial image database. The efficiency is achieved by analyzing the BOVW processing pipeline and optimizing each module with algorithmic improvement.Recent deep learning based recognition approaches have been shown to greatly exceed the performance of traditional approaches. We identify several challenges of applying deep learning based recognition methods on mobile scenarios, namely energy efficiency and privacy protection for real-time visual processing, and mobile visual domain biases. Thus, we proposed two techniques to address them, (i) efficiently splitting the workload across heterogeneous computing resources, i.e., mobile devices and the cloud using our Moca framework, and (ii) using mobile visual domain adaptation as proposed in our collaborative edge-mediated platform DeepCham. Our extensive experiments on large-scale benchmark datasets and off-the-shelf mobile devices show our solutions provide better results than the state-of-the-art solutions.