Modified Overcomplete Autoencoder for Anomaly Detection

Modified Overcomplete Autoencoder (MOA) enhances anomaly detection using TinyML for embedded systems. Developed by Yan Siang Yap and Mohd Ridzuan Ahmad, this study focuses on detecting anomalies in USB fan operations, particularly when blades are damaged. The MOA architecture utilizes both accelerometer and gyroscope data to achieve high accuracy and low false positive rates. With a model size of only 17 kB, it is suitable for deployment on resource-constrained microcontrollers. This research is valuable for engineers and developers working on real-time anomaly detection in IoT applications.

Key Points

  • Proposes a new MOA architecture for improved anomaly detection in embedded systems.
  • Achieves 99.23% accuracy and 99.70% recall in detecting USB fan anomalies.
  • Utilizes accelerometer and gyroscope data for comprehensive vibration analysis.
  • Model size of 17 kB allows deployment on various microcontrollers.
  • Employs unsupervised learning to effectively identify anomalies without labeled data.
VOL. 8, NO. 10, OCTOBER 2024 2504104
Mechanical sensors
Modified Overcomplete Autoencoder for Anomaly Detection Based on TinyML
Yan Siang Yap and Mohd Ridzuan Ahmad
Department of Control and Mechatronics Engineering, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, Johor Bahru,
Johor 81310, Malaysia
Manuscript received 14 July 2024; revised 11 August 2024 and 8 September 2024; accepted 12 September 2024. Date of publication 18 September 2024;
date of current version 7 October 2024.
Abstract—This letter explores the architecture of tiny machine learning (TinyML).
Deploying machine learning into embedded devices is challenging due to the
limited computation power and memory space. An experimental setup has been
designed for the anomaly detection of a USB fan. We collect the normal data
from a USB fan, and abnormal data are simulated using a broken fan blade.
Two different speeds, namely, speed 1 and speed 2, have been used to collect
the normal data and abnormal data. The normal data collected are used to train
the standard autoencoder model and our proposed modified overcomplete
asymmetric autoencoder (MOA) model, respectively. The trained model is then
deployed into a microcontroller, i.e., Arduino Nano 33 BLE Sense. The proposed MOA can achieve 99.23% accuracy,
recall of 99.70%, precision of 98.77%, F1 score of 99.23%, and false positive rate of 1.222%. Besides that, our
MOA model only occupies 17 kB. Therefore, it can be fitted into most microcontrollers for embedded applications.
Index Terms—Mechanical sensors, anomaly detection, autoencoder (AE), embedded system, tiny machine learning (TinyML).
I. INTRODUCTION
Autoencoder (AE) is a type of neural network architecture that
can learn features from unlabeled data automatically and is mainly
used for unsupervised learning tasks [1], [2]. AEs therefore require
no labeled data during training, which is valuable when labeled data
are unavailable or expensive to obtain [1], [3]. AEs are made
up of two parts: encoder and decoder. The encoder encodes the input
data from high dimensions to a low-dimensional latent space, and
the decoder decodes the data from this low-dimensional latent space
back to high dimensions. The AE is trained so that the decoder output
reconstructs the data as closely as possible to the original input.
There are several types of AE, such as
undercomplete [4], overcomplete [4], variational [5], or sparse AE
[6], [7], [8]. Tiny machine learning (TinyML) joins embedded Internet
of Things (IoT) devices and machine learning. TinyML aims to bring
machine learning into ultra-low-power devices [9]. Introducing AE for
anomaly detection in TinyML enables real-time processing, automatic
feature extraction, and an unsupervised learning approach. The major
challenge of deploying TinyML to microcontrollers is the memory and
energy limitations as well as the onboard computation power. A
tradeoff between model size and accuracy must be considered: reducing
the model size generally reduces accuracy, so it is important to
shrink the model without severely compromising accuracy.
AEs rely heavily on the quality of data due to their nature of
reconstructing output based on the input. The data used to train the
AEs should be of good quality. Vibrations involve circular motion,
while accelerometers detect linear motion. By integrating the onboard
gyroscope and onboard accelerometer of the Arduino Nano 33 BLE
Sense to capture circular motion, we can achieve a more comprehen-
sive reading of vibration motion. In this letter, vibration data, including
Corresponding author: Mohd Ridzuan Ahmad (e-mail: mdridzuan@utm.my).
Associate Editor: G. Langfelder.
Digital Object Identifier 10.1109/LSENS.2024.3463977
both the accelerometer readings and gyroscope readings, have been
collected to train our proposed model. There is a bottleneck layer
in AEs that extracts the most salient features of the raw input data.
Increasing the number of inputs can let the bottleneck layer extract
more salient features.
Abbasi et al. [10] explored a machine-driven design exploration
approach that leverages both human experience and knowledge of
machines to produce a highly compact deep convolutional AE architec-
ture, OutlierNets. Purohit et al. used the MIMII datasets [11], focusing
on the slider and fan machines for model evaluation. The OutlierNets
achieved an 83% area under the curve (AUC) for the fan machine and
88.8% AUC for the slider machine. Lord et al. [12] used both AE and
variational autoencoder (VAE) to detect point anomalies of a washing
machine in an unbalanced dry cycle. The author’s embedded platform
is Arduino Nano 33 BLE, mounted on the washing machine. Data
collection is via the onboard accelerometer of the Arduino Nano 33
BLE. The author used an unsupervised learning method to train AE and
VAE, and the neural network architecture has only one hidden layer.
When comparing the AE and VAE, AE achieved 92% accuracy, 90%
precision, and 99% recall. However, VAE only achieved 66% accuracy,
74% precision, and 80% recall. Mostafavi and Sadighi [13] developed
a novel high-performance and precise anomaly detection framework
based on edge computing technology for real-time health monitoring
of industrial assets. The author achieved over 99.9% precision, recall,
accuracy, and F1 score. In 2021, Andrade et al. [14] introduced an
unsupervised TinyML method for pavement anomaly detection via
the microcontroller Arduino Nano 33 IoT. The author utilized the
typicality and eccentricity data analytics algorithm [15] to detect an
anomaly. The authors achieved a recall of 69% and an F1
score of 82%.
Our contributions in this letter include the following.
1) We propose a new model that improves on the standard AE
architecture.
2) We deploy the model onto a microcontroller and collect
inference results.
2475-1472 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Fig. 1. Architecture of the standard AE model.
Fig. 2. Architecture of the MOA model.
3) Our proposed modified overcomplete asymmetric autoencoder
(MOA) is only 17 kB, which can be deployed in many resource-
constrained microcontrollers.
II. PROPOSED SOLUTION
In this letter, we propose a new architecture to achieve high accuracy
in anomaly detection. There are several machine learning approaches
in anomaly detection: supervised, unsupervised, and semisupervised.
In this letter, unsupervised learning was used since anomalies rarely
occur and collecting anomaly datasets is difficult. In normal
conditions, the USB fan operates with intact blades, resulting
in consistent airflow and stable performance. In contrast, abnormal
conditions occur when the USB fan runs with a broken blade, leading to
uneven airflow and potential vibrations or instability. This setup helps
to differentiate between the fan’s normal and abnormal operational
states based on the condition of the blades.
The proposed MOA architecture is an improved version of an AE. In an
overcomplete AE, the bottleneck layer has more nodes than the input
layer, which risks the model simply copying its inputs to the decoder.
To counter this, the first decoder layer is narrowed to only three
nodes, forcing a compressed representation. The proposed MOA has six
input nodes, two hidden layers, and six output nodes. The accelerometer
can only detect linear motion. However, the
gyroscope can detect rotational rate. Therefore, combining a gyroscope
and accelerometer can fully capture the vibration signal. The raw
normal data consist of the accelerometer (ax, ay, az) and gyroscope
(gx, gy, gz). These raw normal data will be used as the input, and the
encoder layer will compress the important features and pass them to the
bottleneck layer. Then, the decoder layer will reconstruct the original
input (ax, ay, az, gx, gy, gz) from the bottleneck layer.
Fig. 1 shows the architecture of the standard AE model, while Fig. 2
shows the proposed MOA model. The standard AE consists only of
encoder, bottleneck, and decoder layers. In the bottleneck layer of
the MOA model, since it is an overcomplete AE, there are seven nodes,
exceeding the six nodes of the input layer. In the decoder part, there
are two decoders: decoder layer one and decoder layer two, the latter
also serving as the output layer.
Fig. 3. Training curve of MOA.

In our MOA architecture, we employed the Rectified Linear Unit
(ReLU) activation function in both the bottleneck layer and the initial
decoder layer for the sake of simplicity and fast computation compared
to other activation functions [12]. Sigmoid activation was selected for
the output layer to ensure that the output is between 0 and 1. The
Adam optimizer was selected for its ability to adaptively adjust the
learning rates for various parameters. The mean absolute error (MAE)
is preferred because it assigns equal weight to all errors regardless of
their magnitude, thereby enhancing the robustness against outliers and
noise compared to the mean-squared error.
Fig. 3 shows the training curve of MOA during training in Google
Colab. The blue line indicates the training loss, and the orange curve
indicates the validation loss. Based on the training curve, both the
validation loss and training loss decreased and then remained
unchanged from epoch 7 onward, indicating that the model had reached
a plateau in learning. Several steps were introduced to improve the
model’s generalization, including early stopping and L2 regularization.
Early stopping has been implemented to speed up the convergence
speed and to prevent underfitting or overfitting [16]. Overfitting
occurs when there is a decrease in training loss but an increase in
validation loss. When overfitting occurs, the model will learn the
training data too well, causing the model not to generalize well [17].
L2 regularization has been implemented to prevent the model from
overfitting. L2 regularization is the introduction of penalty terms to
the model to prevent it from overfitting. Underfitting generally occurs
when both training loss and validation loss are high or increasing. It
can also be a concern if the training loss decreases but early stopping
is applied, potentially preventing the model from capturing important
data features [18]. Based on the training curve of MOA, the training
loss is very low (less than 0.06), which suggests that the MOA has
learned the data well. Therefore, implementing early stopping will not
lead to underfitting.
The training halted at epoch 7 upon reaching the optimal weights.
Besides that, the higher validation loss compared to training loss is
attributed to the training being performed exclusively on normal data,
while validation includes a combination of normal and abnormal data,
causing the validation loss to be higher than the training loss.
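The early-stopping behavior described above can be sketched as a patience-based loop over the validation loss. The patience value of 3 and the toy loss sequence are illustrative assumptions; the letter does not state the patience it used.

```python
# Minimal early-stopping logic: halt once the validation loss has not
# improved for `patience` consecutive epochs. Patience and losses here
# are illustrative assumptions, not values from the letter.
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch at which training would halt, or None."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs
    return None

# Toy validation loss that plateaus from epoch 7 onward, as in Fig. 3.
losses = [0.30, 0.20, 0.14, 0.10, 0.08, 0.07, 0.06, 0.06, 0.06, 0.06, 0.06]
stop = early_stop_epoch(losses)  # halts three epochs into the plateau
```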
III. METHODOLOGY
A USB fan is used for anomaly detection; the abnormal condition is
defined by the state of the fan blade. The experimental setup is
shown in Fig. 4. The USB fan is a fixed fan that cannot rotate and is
supported by two L-brackets. There is another L-bracket used to fix the
position of the micro-USB cable that is used for data transmission from
the microcontroller to the PC. Fig. 5 shows two different conditions
Fig. 4. Front view of the USB fan (left) and back view of the USB fan
(right).
Fig. 5. Normal condition of the fan blade (left) and abnormal condition
of the fan blade (right).
of fan blades. On the left-hand side is the normal condition of the fan
blade while on the right-hand side is the abnormal condition of the
fan blade. A total of 30 000 normal samples (15 000 for speed 1 and
15 000 for speed 2) have been collected. Only 3000 samples
(1500 for speed 1 and 1500 for speed 2) have been collected for the
abnormal conditions. The raw input data were collected via the onboard
accelerometer and gyroscope of the microcontroller, Arduino Nano 33
BLE Sense. The sampling time of the accelerometer and gyroscope is
16 ms, whereas the inference time is set at 100 ms.
The proposed MOA model was developed using Python, TensorFlow, and
TensorFlow Lite. The model was trained with unsupervised machine
learning; only normal datasets were used in the training,
and the validation consisted of a mixture of normal and abnormal
datasets. The train/test split is 80%/20%.
In total, 80% of normal data will be used for training, and the
remaining 20% will consist of a mixture of normal and abnormal
datasets for testing. The collection of normal and abnormal datasets
was conducted in a controlled environment free from interference,
to ensure a clean dataset. The threshold value is calculated based on
the mean and standard deviation of the training loss. After training
the model, the model is uploaded into the microcontroller to make
inferences.
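A minimal sketch of the threshold computation, assuming a mean-plus-k-standard-deviations rule; the multiplier k = 3 and the toy loss values are illustrative assumptions, since the letter only states that the threshold is derived from the mean and standard deviation of the training loss.

```python
import numpy as np

# Threshold = mean + k * std of per-sample training reconstruction loss.
# The multiplier k and the loss values below are illustrative assumptions.
def anomaly_threshold(train_losses, k=3.0):
    losses = np.asarray(train_losses, dtype=float)
    return losses.mean() + k * losses.std()

# Toy per-sample MAE values standing in for the real training losses.
train_losses = [0.05, 0.06, 0.055, 0.052, 0.058]
threshold = anomaly_threshold(train_losses)  # sits above all normal losses
```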
First, the microcontroller will read the accelerometer and gyroscope
sensor data and preprocess them using min-max normalization. Then, the
model will calculate the MAE between the reconstructed output and
the input value
MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - x_i|    (1)
Equation (1) shows the formula to calculate the MAE, where y_i is
the predicted value, x_i is the true value, and n is the total number
of data points. When the MAE value exceeds the threshold value, it indicates
an anomaly. When an anomaly is detected, a red LED lights up in place
of the green LED used for the normal condition.
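The on-device decision step above can be sketched as follows. The sensor ranges used for min-max normalization and the identity "model" are placeholders, not values from the letter; a real deployment would use calibrated ranges and the trained TensorFlow Lite model.

```python
import numpy as np

# Assumed full-scale ranges for the six-axis reading (ax, ay, az in g,
# gx, gy, gz in deg/s) -- placeholders, not calibration values from the letter.
sensor_min = np.array([-4.0, -4.0, -4.0, -2000.0, -2000.0, -2000.0])
sensor_max = np.array([4.0, 4.0, 4.0, 2000.0, 2000.0, 2000.0])

def min_max_normalize(raw):
    return (raw - sensor_min) / (sensor_max - sensor_min)

def classify(raw, model, threshold):
    x = min_max_normalize(raw)   # preprocessing step from the text
    y = model(x)                 # autoencoder reconstruction
    mae = np.abs(y - x).mean()   # equation (1)
    return "red LED" if mae > threshold else "green LED"

identity_model = lambda x: x     # placeholder: a perfect reconstruction
raw = np.array([0.1, -0.2, 0.0, 10.0, -5.0, 3.0])  # toy six-axis reading
led = classify(raw, identity_model, threshold=0.1)
```

With a perfect reconstruction the MAE is zero, so the toy reading stays below the threshold and the green LED is selected.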
During inference, the model is first uploaded to the microcontroller,
an Arduino Nano 33 BLE Sense, and then tested under normal conditions
for both speeds, followed by abnormal conditions for both speeds. These
steps are repeated with the other model to obtain comparative inference
results.
IV. RESULTS
Several evaluation metrics have been used to evaluate the proposed
model, such as accuracy, recall, precision, F1 score, and false positive
rate (FPR). Equations (2)-(6) show the formulas for calculating the
accuracy, recall, precision, F1 score, and FPR, where TP is the true
positive, TN is the true negative, FP is the false positive, and FN
is the false negative. The inference results, obtained from deploying
the model on the Arduino Nano 33 BLE microcontroller, consist of
6000 data points evenly distributed across different conditions: 1500
data points each for normal and abnormal scenarios at both speed 1
and speed 2. Accuracy is the fraction of correct predictions over the
whole dataset. Recall, also known as sensitivity, measures the fraction
of actual positive cases that are correctly predicted. Precision
measures the fraction of positive predictions that are correct. The F1
score performs better on an
imbalanced dataset [19]. FPR indicates the rate of normal instances
that were detected as anomalies. The standard AE achieved a TP of
2960, an FP of 40, a TN of 2977, and an FN of 23, while the MOA achieved
a TP of 2963, an FP of 37, a TN of 2991, and an FN of 9.
Accuracy = \frac{TP + TN}{TP + FP + TN + FN} \times 100\%    (2)

Recall = \frac{TP}{TP + FN} \times 100\%    (3)

Precision = \frac{TP}{TP + FP} \times 100\%    (4)

F1 score = 2 \times \frac{Precision \times Recall}{Precision + Recall} \times 100\%    (5)

FPR = \frac{FP}{FP + TN} \times 100\%.    (6)
For the standard AE, the threshold was selected as 0.107515 and
achieved an accuracy of 98.95%, recall of 99.23%, precision of
98.67%, F1 score of 98.95%, and FPR of 1.326%. The threshold of our
proposed MOA model is 0.107627, and it achieved an accuracy of
99.23%, recall of 99.70%, precision of 98.77%, F1 score of 99.23%,
and FPR of 1.222%.
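The reported figures can be reproduced directly from the MOA confusion matrix given above (TP = 2963, FP = 37, TN = 2991, FN = 9), applying (2)-(6):

```python
# Recomputing the reported MOA metrics from the confusion matrix in the text.
TP, FP, TN, FN = 2963, 37, 2991, 9

accuracy  = (TP + TN) / (TP + FP + TN + FN) * 100          # equation (2)
recall    = TP / (TP + FN) * 100                           # equation (3)
precision = TP / (TP + FP) * 100                           # equation (4)
f1        = 2 * precision * recall / (precision + recall)  # equation (5)
fpr       = FP / (FP + TN) * 100                           # equation (6)
```

Rounding each value recovers the accuracy of 99.23%, recall of 99.70%, precision of 98.77%, F1 score of 99.23%, and FPR of 1.222% reported in the text.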
V. DISCUSSION
Our approach employs unsupervised machine learning with training
conducted on an imbalanced dataset. Our proposed MOA model
outperforms the existing AE architecture reported in [12], where
the AE model achieved 92% accuracy, 90% precision, and 99% recall.
By introducing additional input, such as the
gyroscope data, our proposed model MOA achieved an accuracy of
99.23%, recall of 99.70%, precision of 98.77%, F1 score of 99.23%,
and FPR of 1.222%. We also achieved an accuracy of 98.95%, recall
of 99.23%, precision of 98.67%, F1 score of 98.95%, and FPR of
1.326% with the standard AE. While the improvements in our MOA
model compared to the standard AE are modest, our analysis concludes
that MOA exhibits superior performance across all metrics, including
higher accuracy, higher recall, higher precision, higher F1 score, and
lower FPR. A lower FPR means fewer normal instances are misclassified
as anomalies.
This letter demonstrated that despite the difference between the
sampling time of the accelerometer and gyroscope (16 ms) and the
inference time (100 ms), the model can still perform effectively. The
MOA model occupies 17 kB, slightly larger than the 13 kB standard AE,
a modest increase that contributes to the improved metric scores.