A new CNN-LSTM architecture for activity recognition employing wearable motion sensor data: enabling diverse feature extraction

Limited Access
This item is unavailable until:
2025-06-28

Date

2023-06-28

Editor(s)

Advisor

Supervisor

Co-Advisor

Co-Supervisor

Instructor

BUIR Usage Stats
17
views
3
downloads

Citation Stats

Series

Abstract

Extracting representative features to recognize human activities through the use of wearables is an area of on-going research. While hand-crafted features and machine learning (ML) techniques have been sufficiently well investigated in the past, the use of deep learning (DL) techniques is the current trend. Specifically, Convolutional Neural Networks (CNNs), Long Short Term Memory Networks (LSTMs), and hybrid models have been investigated. We propose a novel hybrid network architecture to recognize human activities through the use of wearable motion sensors and DL techniques. The LSTM and the 2D CNN branches of the model that run in parallel receive the raw signals and their spectrograms, respectively. We concatenate the features extracted at each branch and use them for activity recognition. We compare the classification performance of the proposed network with three single and three hybrid commonly used network architectures: 1D CNN, 2D CNN, LSTM, standard 1D CNN-LSTM, 1D CNN-LSTM proposed by Ordóñez and Roggen, and an alternative 1D CNN-LSTM model. We tune the hyper-parameters of six of the models using Bayesian optimization and test the models on two publicly available datasets. The comparison between the seven networks is based on four performance metrics and complexity measures. Because of the stochastic nature of DL algorithms, we provide the average values and standard deviations of the performance metrics over ten repetitions of each experiment. The proposed 2D CNN-LSTM architecture achieves the highest average accuracies of 95.66% and 92.95% on the two datasets, which are, respectively, 2.45% and 3.18% above those of the 2D CNN model that ranks the second. This improvement is a consequence of the proposed model enabling the extraction of a broader range of complementary features that comprehensively represent human activities. We evaluate the complexities of the networks in terms of the total number of parameters, model size, training/testing time, and the number of floating point operations (FLOPs). We also compare the results of the proposed network with those of recent related work that use the same datasets.

Source Title

Engineering Applications of Artificial Intelligence

Publisher

Elsevier

Course

Other identifiers

Book Title

Degree Discipline

Degree Level

Degree Name

Citation

Published Version (Please cite this version)

Language

en