Modified Overcomplete Autoencoder for Anomaly Detection

Modified Overcomplete Autoencoder (MOA) enhances anomaly detection using TinyML for embedded systems. Developed by Yan Siang Yap and Mohd Ridzuan Ahmad, this study focuses on detecting anomalies in USB fan operations, particularly when blades are damaged. The MOA architecture utilizes both accelerometer and gyroscope data to achieve high accuracy and low false positive rates. With a model size of only 17 kB, it is suitable for deployment on resource-constrained microcontrollers. This research is valuable for engineers and developers working on real-time anomaly detection in IoT applications.

Key Points

  • Proposes a new MOA architecture for improved anomaly detection in embedded systems.
  • Achieves 99.23% accuracy and 99.70% recall in detecting USB fan anomalies.
  • Utilizes accelerometer and gyroscope data for comprehensive vibration analysis.
  • Model size of 17 kB allows deployment on various microcontrollers.
  • Employs unsupervised learning to effectively identify anomalies without labeled data.
VOL. 8, NO. 10, OCTOBER 2024 2504104
Mechanical sensors
Modified Overcomplete Autoencoder for Anomaly Detection Based on TinyML
Yan Siang Yap and Mohd Ridzuan Ahmad
Department of Control and Mechatronics Engineering, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, Johor Bahru,
Johor 81310, Malaysia
Manuscript received 14 July 2024; revised 11 August 2024 and 8 September 2024; accepted 12 September 2024. Date of publication 18 September 2024;
date of current version 7 October 2024.
Abstract—This letter explores the architecture of tiny machine learning (TinyML).
Deploying machine learning into embedded devices is challenging due to the
limited computation power and memory space. An experimental setup has been
designed for the anomaly detection of a USB fan. We collect the normal data
from a USB fan, and abnormal data are simulated using a broken fan blade.
Two different speeds, namely, speed 1 and speed 2, have been used to collect
the normal data and abnormal data. The normal data collected are used to train
the standard autoencoder model and our proposed modified overcomplete
asymmetric autoencoder (MOA) model, respectively. The trained model is then
deployed into a microcontroller, i.e., Arduino Nano 33 BLE Sense. The proposed MOA can achieve 99.23% accuracy,
recall of 99.70%, precision of 98.77%, F1 score of 99.23%, and false positive rate of 1.222%. Besides that, our
MOA model only occupies 17 kB. Therefore, it can be fitted into most microcontrollers for embedded applications.
Index Terms—Mechanical sensors, anomaly detection, autoencoder (AE), embedded system, tiny machine learning (TinyML).
I. INTRODUCTION
Autoencoder (AE) is a type of neural network architecture that
can learn features from unlabeled data automatically and is mainly
used for unsupervised learning tasks [1], [2]. AEs therefore require
no labeled data during training, which is valuable when labeled data
are unavailable or expensive to obtain [1], [3]. AEs are made
up of two parts: encoder and decoder. The encoder encodes the input
data from high dimensions to a low-dimensional latent space, and
the decoder decodes the data from this low-dimensional latent space
back to high dimensions. The AE is trained so that the decoder output
reconstructs the data as closely as possible to the original input.
There are several types of AE, such as
undercomplete [4], overcomplete [4], variational [5], or sparse AE
[6], [7], [8]. Tiny machine learning (TinyML) joins embedded Internet
of Things (IoT) devices and machine learning. TinyML aims to bring
machine learning into ultra-low-power devices [9]. Introducing AE for
anomaly detection in TinyML enables real-time processing, automatic
feature extraction, and an unsupervised learning approach. The major
challenge of deploying TinyML to microcontrollers is the memory and
energy limitations as well as the onboard computation power. A
tradeoff between model size and accuracy must be considered: reducing
the model size generally reduces accuracy, so it is important to
shrink the model without severely compromising accuracy.
AEs rely heavily on the quality of data due to their nature of
reconstructing output based on the input. The data used to train the
AEs should be of good quality. Vibrations involve circular motion,
while accelerometers detect linear motion. By integrating the onboard
gyroscope and onboard accelerometer of the Arduino Nano 33 BLE
Sense to capture circular motion, we can achieve a more comprehen-
sive reading of vibration motion. In this letter, vibration data, including
Corresponding author: Mohd Ridzuan Ahmad (e-mail: mdridzuan@utm.my).
Associate Editor: G. Langfelder.
Digital Object Identifier 10.1109/LSENS.2024.3463977
both the accelerometer readings and gyroscope readings, have been
collected to train our proposed model. There is a bottleneck layer
in AEs that extracts the most salient features of the raw input data.
Increasing the number of inputs can let the bottleneck layer extract
more salient features.
Abbasi et al. [10] explored a machine-driven design exploration
approach that leverages both human experience and knowledge of
machines to produce a highly compact deep convolutional AE architec-
ture, OutlierNets. Purohit et al. used the MIMII datasets [11], focusing
on the slider and fan machines for model evaluation. The OutlierNets
achieved an 83% area under the curve (AUC) for the fan machine and
88.8% AUC for the slider machine. Lord et al. [12] used both AE and
variational autoencoder (VAE) to detect point anomalies of a washing
machine in an unbalanced dry cycle. The author’s embedded platform
is Arduino Nano 33 BLE, mounted on the washing machine. Data
collection is via the onboard accelerometer of the Arduino Nano 33
BLE. The author used an unsupervised learning method to train AE and
VAE, and the neural network architecture has only one hidden layer.
When comparing the AE and VAE, AE achieved 92% accuracy, 90%
precision, and 99% recall. However, VAE only achieved 66% accuracy,
74% precision, and 80% recall. Mostafavi and Sadighi [13] developed
a novel high-performance and precise anomaly detection framework
based on edge computing technology for real-time health monitoring
of industrial assets. The author achieved over 99.9% precision, recall,
accuracy, and F1 score. In 2021, Andrade et al. [14] introduced an
unsupervised TinyML method for pavement anomaly detection via
the microcontroller Arduino Nano 33 IoT. The author utilized the
typicality and eccentricity data analytics algorithm [15] to detect an
anomaly. The authors achieved a recall of 69% and an F1
score of 82%.
Our contributions in this letter include the following.
1) We propose a new model that improves on the standard AE
architecture.
2) We deploy the model onto a microcontroller and collect
inference results.
2475-1472 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Fig. 1. Architecture of the standard AE model.
Fig. 2. Architecture of the MOA model.
3) Our proposed modified overcomplete asymmetric autoencoder
(MOA) is only 17 kB, which can be deployed in many resource-
constrained microcontrollers.
II. PROPOSED SOLUTION
In this letter, we propose a new architecture to achieve high accuracy
in anomaly detection. There are several machine learning approaches
in anomaly detection: supervised, unsupervised, and semisupervised.
In this letter, unsupervised learning was used since anomalies rarely
occur and collecting anomaly datasets is difficult. In normal
conditions, the USB fan operates with intact blades, resulting
in consistent airflow and stable performance. In contrast, abnormal
conditions occur when the USB fan runs with a broken blade, leading to
uneven airflow and potential vibrations or instability. This setup helps
to differentiate between the fan’s normal and abnormal operational
states based on the condition of the blades.
The proposed MOA architecture is an improved version of an AE. In an
overcomplete AE, the bottleneck layer has more nodes than the input
layer, which risks the model simply copying its inputs to the decoder.
To counter this, the first decoder layer is narrowed to only three
nodes, forcing a compressed representation. The proposed MOA has six
input nodes, two hidden layers, and six output nodes. The accelerometer
can only detect linear motion. However, the
gyroscope can detect rotational rate. Therefore, combining a gyroscope
and accelerometer can fully capture the vibration signal. The raw
normal data consist of the accelerometer (ax, ay, az) and gyroscope
(gx, gy, gz). These raw normal data will be used as the input, and the
encoder layer will compress the important features and pass them to the
bottleneck layer. Then, the decoder layer will reconstruct the original
input (ax, ay, az, gx, gy, gz) from the bottleneck layer.
Fig. 1 shows the architecture of the standard AE model, while Fig. 2
shows the proposed MOA model. The standard AE consists only of
encoder, bottleneck, and decoder layers. In the bottleneck layer of
the MOA model, since it is an overcomplete AE, there are seven nodes,
exceeding the six nodes of the input layer. In the decoder part, there
are two decoders: decoder layer one and decoder layer two, the latter
also serving as the output layer.
Fig. 3. Training curve of MOA.

In our MOA architecture, we employed the Rectified Linear Unit
(ReLU) activation function in both the bottleneck layer and the initial
decoder layer for the sake of simplicity and fast computation compared
to other activation functions [12]. Sigmoid activation was selected for
the output layer to ensure that the output is between 0 and 1. The
Adam optimizer was selected for its ability to adaptively adjust the
learning rates for various parameters. The mean absolute error (MAE)
is preferred because it assigns equal weight to all errors regardless of
their magnitude, thereby enhancing the robustness against outliers and
noise compared to the mean-squared error.
Fig. 3 shows the training curve of MOA during training in Google
Colab. The blue line indicates the training loss, and the orange curve
indicates the validation loss. Based on the training curve, both the
validation loss and training loss decreased and then remained
unchanged from epoch 7 onward, indicating that the model had reached
a plateau in learning. Several steps were introduced to improve the
model’s generalization, including early stopping and L2 regularization.
Early stopping has been implemented to speed up the convergence
speed and to prevent underfitting or overfitting [16]. Overfitting
occurs when there is a decrease in training loss but an increase in
validation loss. When overfitting occurs, the model will learn the
training data too well, causing the model not to generalize well [17].
L2 regularization has been implemented to prevent the model from
overfitting. L2 regularization is the introduction of penalty terms to
the model to prevent it from overfitting. Underfitting generally occurs
when both training loss and validation loss are high or increasing. It
can also be a concern if the training loss decreases but early stopping
is applied, potentially preventing the model from capturing important
data features [18]. Based on the training curve of MOA, the training
loss is very low (less than 0.06), which suggests that the MOA has
learned the data well. Therefore, implementing early stopping will not
lead to underfitting.
The training halted at epoch 7 upon reaching the optimal weights.
Besides that, the higher validation loss compared to training loss is
attributed to the training being performed exclusively on normal data,
while validation includes a combination of normal and abnormal data,
causing the validation loss to be higher than the training loss.
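The early-stopping behavior described above can be sketched as a patience-based loop over the validation loss. The patience value of 3 and the toy loss sequence are illustrative assumptions; the letter does not state the patience it used.

```python
# Minimal early-stopping logic: halt once the validation loss has not
# improved for `patience` consecutive epochs. Patience and losses here
# are illustrative assumptions, not values from the letter.
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch at which training would halt, or None."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs
    return None

# Toy validation loss that plateaus from epoch 7 onward, as in Fig. 3.
losses = [0.30, 0.20, 0.14, 0.10, 0.08, 0.07, 0.06, 0.06, 0.06, 0.06, 0.06]
stop = early_stop_epoch(losses)  # halts three epochs into the plateau
```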
III. METHODOLOGY
A USB fan is used for anomaly detection; the abnormal condition is
defined by the state of the fan blade. The experimental setup is
shown in Fig. 4. The USB fan is a fixed fan that cannot rotate and is
supported by two L-brackets. There is another L-bracket used to fix the
position of the micro-USB cable that is used for data transmission from
the microcontroller to the PC. Fig. 5 shows two different conditions
Fig. 4. Front view of the USB fan (left) and back view of the USB fan
(right).
Fig. 5. Normal condition of the fan blade (left) and abnormal condition
of the fan blade (right).
of fan blades. On the left-hand side is the normal condition of the fan
blade while on the right-hand side is the abnormal condition of the
fan blade. A total of 30 000 normal samples (15 000 for speed 1 and
15 000 for speed 2) have been collected. Only 3000 samples
(1500 for speed 1 and 1500 for speed 2) have been collected for the
abnormal conditions. The raw input data were collected via the onboard
accelerometer and gyroscope of the microcontroller, Arduino Nano 33
BLE Sense. The sampling time of the accelerometer and gyroscope is
16 ms, whereas the inference time is set at 100 ms.
The proposed MOA model was developed using Python, TensorFlow, and
TensorFlow Lite. The model was trained with unsupervised machine
learning; only normal datasets were used in the training,
and the validation consisted of a mixture of normal and abnormal
datasets. The train/test split is 80%/20%.
In total, 80% of normal data will be used for training, and the
remaining 20% will consist of a mixture of normal and abnormal
datasets for testing. The collection of normal and abnormal datasets
was conducted in a controlled environment free from interference,
to ensure a clean dataset. The threshold value is calculated based on
the mean and standard deviation of the training loss. After training
the model, the model is uploaded into the microcontroller to make
inferences.
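A minimal sketch of the threshold computation, assuming a mean-plus-k-standard-deviations rule; the multiplier k = 3 and the toy loss values are illustrative assumptions, since the letter only states that the threshold is derived from the mean and standard deviation of the training loss.

```python
import numpy as np

# Threshold = mean + k * std of per-sample training reconstruction loss.
# The multiplier k and the loss values below are illustrative assumptions.
def anomaly_threshold(train_losses, k=3.0):
    losses = np.asarray(train_losses, dtype=float)
    return losses.mean() + k * losses.std()

# Toy per-sample MAE values standing in for the real training losses.
train_losses = [0.05, 0.06, 0.055, 0.052, 0.058]
threshold = anomaly_threshold(train_losses)  # sits above all normal losses
```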
First, the microcontroller will read the accelerometer and gyroscope
sensor data and preprocess them using min-max normalization. Then, the
model will calculate the MAE between the reconstructed output and
the input value
MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - x_i|    (1)
Equation (1) shows the formula to calculate the MAE, where y_i is
the predicted value, x_i is the true value, and n is the total number
of data points. When the MAE value exceeds the threshold value, it indicates
an anomaly. When an anomaly is detected, a red LED lights up in place
of the green LED used for the normal condition.
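The on-device decision step above can be sketched as follows. The sensor ranges used for min-max normalization and the identity "model" are placeholders, not values from the letter; a real deployment would use calibrated ranges and the trained TensorFlow Lite model.

```python
import numpy as np

# Assumed full-scale ranges for the six-axis reading (ax, ay, az in g,
# gx, gy, gz in deg/s) -- placeholders, not calibration values from the letter.
sensor_min = np.array([-4.0, -4.0, -4.0, -2000.0, -2000.0, -2000.0])
sensor_max = np.array([4.0, 4.0, 4.0, 2000.0, 2000.0, 2000.0])

def min_max_normalize(raw):
    return (raw - sensor_min) / (sensor_max - sensor_min)

def classify(raw, model, threshold):
    x = min_max_normalize(raw)   # preprocessing step from the text
    y = model(x)                 # autoencoder reconstruction
    mae = np.abs(y - x).mean()   # equation (1)
    return "red LED" if mae > threshold else "green LED"

identity_model = lambda x: x     # placeholder: a perfect reconstruction
raw = np.array([0.1, -0.2, 0.0, 10.0, -5.0, 3.0])  # toy six-axis reading
led = classify(raw, identity_model, threshold=0.1)
```

With a perfect reconstruction the MAE is zero, so the toy reading stays below the threshold and the green LED is selected.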
During inference, the model is first uploaded to the microcontroller,
an Arduino Nano 33 BLE Sense, and then tested under normal conditions
for both speeds, followed by abnormal conditions for both speeds. These
steps are repeated with the other model to obtain comparative inference
results.
IV. RESULTS
Several evaluation metrics have been used to evaluate the proposed
model, such as accuracy, recall, precision, F1 score, and false positive
rate (FPR). Equations (2)-(6) show the formulas for calculating the
accuracy, recall, precision, F1 score, and FPR, where TP is the true
positive, TN is the true negative, FP is the false positive, and FN
is the false negative. The inference results, obtained from deploying
the model on the Arduino Nano 33 BLE microcontroller, consist of
6000 data points evenly distributed across different conditions: 1500
data points each for normal and abnormal scenarios at both speed 1
and speed 2. Accuracy is the fraction of correct predictions over the
whole dataset. Recall, also known as sensitivity, measures the fraction
of actual positive cases that are correctly predicted. Precision
measures the fraction of positive predictions that are correct. The F1
score performs better on an
imbalanced dataset [19]. FPR indicates the rate of normal instances
that were detected as anomalies. The standard AE achieved a TP of
2960, an FP of 40, a TN of 2977, and an FN of 23, while the MOA achieved
a TP of 2963, an FP of 37, a TN of 2991, and an FN of 9.
Accuracy = \frac{TP + TN}{TP + FP + TN + FN} \times 100\%    (2)

Recall = \frac{TP}{TP + FN} \times 100\%    (3)

Precision = \frac{TP}{TP + FP} \times 100\%    (4)

F1 score = 2 \times \frac{Precision \times Recall}{Precision + Recall} \times 100\%    (5)

FPR = \frac{FP}{FP + TN} \times 100\%.    (6)
For the standard AE, the threshold was selected as 0.107515 and
achieved an accuracy of 98.95%, recall of 99.23%, precision of
98.67%, F1 score of 98.95%, and FPR of 1.326%. The threshold of our
proposed MOA model is 0.107627, and it achieved an accuracy of
99.23%, recall of 99.70%, precision of 98.77%, F1 score of 99.23%,
and FPR of 1.222%.
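The reported figures can be reproduced directly from the MOA confusion matrix given above (TP = 2963, FP = 37, TN = 2991, FN = 9), applying (2)-(6):

```python
# Recomputing the reported MOA metrics from the confusion matrix in the text.
TP, FP, TN, FN = 2963, 37, 2991, 9

accuracy  = (TP + TN) / (TP + FP + TN + FN) * 100          # equation (2)
recall    = TP / (TP + FN) * 100                           # equation (3)
precision = TP / (TP + FP) * 100                           # equation (4)
f1        = 2 * precision * recall / (precision + recall)  # equation (5)
fpr       = FP / (FP + TN) * 100                           # equation (6)
```

Rounding each value recovers the accuracy of 99.23%, recall of 99.70%, precision of 98.77%, F1 score of 99.23%, and FPR of 1.222% reported in the text.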
V. DISCUSSION
Our approach employs unsupervised machine learning with training
conducted on an imbalanced dataset. Our proposed MOA model
outperforms the existing AE architecture reported in [12], where
the AE model achieved 92% accuracy, 90% precision, and 99% recall.
By introducing additional input, such as the
gyroscope data, our proposed model MOA achieved an accuracy of
99.23%, recall of 99.70%, precision of 98.77%, F1 score of 99.23%,
and FPR of 1.222%. We also achieved an accuracy of 98.95%, recall
of 99.23%, precision of 98.67%, F1 score of 98.95%, and FPR of
1.326% with the standard AE. While the improvements in our MOA
model compared to the standard AE are modest, our analysis concludes
that MOA exhibits superior performance across all metrics, including
higher accuracy, higher recall, higher precision, higher F1 score, and
lower FPR. A lower FPR means fewer normal instances are misclassified
as anomalies.
This letter demonstrated that despite the difference between the
sampling time of the accelerometer and gyroscope (16 ms) and the
inference time (100 ms), the model can still perform effectively. The
MOA model occupies 17 kB, slightly larger than the 13 kB standard AE,
a modest increase that contributes to the improved metric scores.