Abstract: In recent years, deep learning approaches have proven able to learn powerful representations of their inputs in various tasks, such as image classification, object recognition, and scene understanding. Our work demonstrates the generality and capacity of deep learning approaches through a series of case studies, including sketch matching and human activity understanding. In these studies, we explored combinations of neural network models with existing machine learning techniques and extended the deep learning approach as needed. We first studied representation learning for static inputs. We adopted a Siamese network to learn similarities between sketch images and developed a novel method for sketch-based 3D shape retrieval. We then turned to representation learning for sequential inputs such as human actions. Three related tasks were explored: 1) human action prediction; 2) finger force estimation in manipulation actions; and 3) bimodal learning for human action recognition. The ability to predict ongoing human actions is highly important in scenarios such as human-robot collaboration. We developed an action prediction method and a finger force estimation method, both based on the Long Short-Term Memory (LSTM) architecture. We then studied bimodal learning approaches for human action recognition. Our model takes both video frames and sensor data as input during training to recognize the action, and the trained model can be applied to video-only inputs at test time. Experimental results show that the motor information helps increase action recognition accuracy.
Fang Wang is a PhD student under the supervision of Prof. Fatih Porikli, Dr. Yi Li, and Dr. Justin Domke at the Australian National University. Her research interests include computer vision, robotics, and deep learning.