Master of Science
Chuah, Mooi C.
In this paper, we attack the problem of classifying human actions from a single, static image. We propose that leveraging an automatic caption generator for this task provides extra information compared to a traditional convolutional neural network based classifier. The architecture consists of two stages, caption generation and caption classification, applied sequentially to produce a proposed human action class label from a single image. We evaluate our system and find that caption generation is the limiting factor in accuracy. We propose fixes to both the dataset and the caption generator in order to improve the model. Finally, we find that caption classification is significantly improved by concatenating all captions from a single image together to produce one input vector.
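As a rough illustration of the final point, the sketch below joins every caption generated for one image into a single string and encodes it as one input vector. The abstract does not say how captions are encoded, so the bag-of-words counts, the example captions, the vocabulary, and the helper names concatenate_captions and bag_of_words are all assumptions made for illustration and are not taken from the thesis.

```python
from collections import Counter


def concatenate_captions(captions):
    """Join all captions generated for one image into a single string."""
    return " ".join(c.strip() for c in captions if c.strip())


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text.

    This encoding is an assumption; the thesis may use a different one.
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical captions for one image (not from the thesis dataset).
captions = [
    "a man riding a horse",
    "a person on a horse in a field",
]
# Hypothetical vocabulary, kept tiny for readability.
vocabulary = ["horse", "riding", "bike", "field"]

combined = concatenate_captions(captions)
vector = bag_of_words(combined, vocabulary)
print(combined)
print(vector)
```

In this toy setup the classifier would receive one vector per image instead of one vector per caption, so evidence repeated across captions (here, "horse") accumulates in a single input.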
Kafka, Adam, "Caption Aided Action Recognition Using Single Images" (2017). Theses and Dissertations. 2653.