Multi-modal ego-centric data from inertial measurement units (IMUs) and first-person videos (FPV) can be effectively fused to recognise proprioceptive activities. Existing IMU-based approaches mostly employ cascades of handcrafted triaxial motion features or deep frameworks trained on limited data, while FPV approaches generally encode scene dynamics with motion and pooled appearance features. In this paper, we propose a multi-modal ego-centric proprioceptive activity recognition framework that uses a convolutional neural network (CNN) followed by a long short-term memory (LSTM) network, transfer learning, and a merit-based fusion of the IMU and/or FPV streams. The CNN encodes the short-term temporal dynamics of the ego-motion, while the LSTM exploits the long-term temporal dependencies among activities. The merit of a stream is evaluated with a sparsity measure of its initial classification output. We validate the proposed framework on multiple visual and inertial datasets.
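The merit-based fusion described above can be sketched as follows. The abstract does not specify which sparsity measure is used, so this illustration assumes the Hoyer sparsity (L1/L2 ratio), which is 1 for a one-hot class-score vector and 0 for a uniform one; the function names (`sparsity`, `merit_fusion`) are hypothetical and not taken from the paper's code.

```python
import math

def sparsity(scores):
    """Hoyer sparsity of a class-score vector: 1.0 for a one-hot
    (confident) output, 0.0 for a uniform (ambiguous) output.
    Illustrative choice; the paper's exact measure may differ."""
    n = len(scores)
    l1 = sum(abs(s) for s in scores)
    l2 = math.sqrt(sum(s * s for s in scores))
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)

def merit_fusion(stream_scores):
    """Fuse per-stream class scores (e.g. IMU and FPV), weighting each
    stream by the sparsity (merit) of its initial classification output."""
    weights = [sparsity(s) for s in stream_scores]
    total = sum(weights) or 1.0  # avoid division by zero if all uniform
    weights = [w / total for w in weights]
    n_classes = len(stream_scores[0])
    return [sum(w * s[c] for w, s in zip(weights, stream_scores))
            for c in range(n_classes)]
```

For example, a confident IMU output `[0.9, 0.05, 0.05]` receives a larger fusion weight than an ambiguous FPV output `[0.4, 0.35, 0.25]`, so the fused prediction follows the more reliable stream.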
G. Abebe, A. Cavallaro, "Inertial-Vision: cross-domain knowledge transfer for wearable sensors", Proc. of ICCV Workshop on Assistive Computer Vision and Robotics (ACVR), Venice, October 28, 2017 [pdf]
Source code for the method presented in the paper [Code]
Extracted features used in the paper [Data]