ISSN : 2005-0461(Print)
ISSN : 2287-7975(Online)
Journal of Society of Korea Industrial and Systems Engineering Vol.48 No.2 pp.163-177
DOI : https://doi.org/10.11627/jksie.2025.48.2.163

Mobility-Aware Multi-sensor UAV Health Monitoring via Vision and Vibration Fusion

Berdibayev Yergali, Heon Gyu Lee†
*AI & BigData Department, R&D Center, Gaion Ltd.
Corresponding Author : hglee70s@gmail.com
Received: 29/05/2025; Revised: 16/06/2025; Accepted: 18/06/2025

Abstract


Ensuring operational safety and reliability in Unmanned Aerial Vehicles (UAVs) necessitates advanced onboard fault detection. This paper presents a novel, mobility-aware multi-sensor health monitoring framework, uniquely fusing visual (camera) and vibration (IMU) data for enhanced near real-time inference of rotor and structural faults. Our approach is tailored for resource-constrained flight controllers (e.g., Pixhawk) without auxiliary hardware, utilizing standard flight logs. Validated on a 40 kg-class UAV with induced rotor damage (10% blade loss) over 100+ minutes of flight, the system demonstrated strong performance: a Multi-Layer Perceptron (MLP) achieved an RMSE of 0.1414 and R² of 0.92 for rotor imbalance, while a Convolutional Neural Network (CNN) detected visual anomalies. Significantly, incorporating UAV mobility context reduced false positives by over 30%. This work demonstrates a practical pathway to deploying sophisticated, lightweight diagnostic models on standard UAV hardware, supporting real-time onboard fault inference and paving the way for more autonomous and resilient health-aware aerial systems.




    1. Introduction

    Unmanned Aerial Vehicles (UAVs) have become indispensable tools across a diverse array of civilian, industrial, and military applications. We see them in infrastructure inspection, logistics, precision agriculture, surveillance, and disaster response, among other fields. As UAVs undertake increasingly mission-critical and autonomous roles, ensuring their operational safety, reliability, and resilience is paramount. A key requirement for achieving this high level of dependability is real-time, onboard health monitoring, particularly during flight. Component failures, such as rotor imbalances or structural degradation, can quickly escalate, leading to mission failure or even catastrophic crashes.

    Traditionally, UAV fault detection systems often relied on model-based diagnostics. These methods require precise physics-based representations of UAV dynamics and failure modes. While effective in controlled laboratory settings, such approaches often fall short in real-world environments. Rapidly changing aerodynamic loads, uncertain operating conditions, and inherent structural variability make accurate modeling a formidable challenge.

    In response to these limitations, recent research has shifted towards data-driven approaches, particularly those employing sensor data and machine learning techniques. These methods have shown considerable promise in capturing complex fault signatures without needing explicit system models. Typically, these systems analyze anomalies in inertial measurements, motor vibrations, and control telemetry to identify deviations from expected behavior. However, many existing solutions remain single modality, relying solely on IMU data or power signals, for example. This can limit their sensitivity, making it difficult to detect subtle or emerging faults. Furthermore, the computational overhead of some solutions often hinders their deployment, making real-time execution on resource-constrained embedded platforms challenging.

    Another critical limitation in many current systems is the lack of mobility awareness. Most systems do not adequately account for the UAV's dynamic maneuvers, external disturbances like wind gusts, or varying mission contexts, all of which significantly influence sensor readings. Consequently, these systems may produce an unacceptable rate of false positives or overlook context-dependent anomalies, thereby eroding trust and utility.

    To address these pressing challenges, this paper proposes a novel mobility-aware multi-sensor health monitoring framework. Our system fuses vibration signals from MEMS-based accelerometers with visual anomaly cues derived from onboard camera feedback. Crucially, it is designed for real-time execution on widely used UAV flight controllers (e.g., Pixhawk) without relying on external processors or specialized auxiliary sensors.

    The main contributions of this work are:

    • A multi-sensor UAV health monitoring framework that integrates vibration and vision data for improved detection of rotor and structural faults.

    • A systematic data preprocessing pipeline for parsing and analyzing flight controller (FC) logs. This includes statistical filtering of 64 initial message fields to extract a refined set of high-impact features.

    • Lightweight, deployable machine learning models optimized for efficient onboard execution. This includes a Multi-Layer Perceptron (MLP) for rotor imbalance detection and a CNN-based classifier for vision-driven anomaly detection.

    • Real-world validation through over 100 minutes of flight testing using a 40 kg-class multirotor UAV with deliberately induced rotor damage (10% blade loss), evaluated using both vibration and visual features.

    • A clear demonstration of onboard deployment feasibility, with all models operating in near real-time on resource-limited UAV hardware.

    By combining vibration and visual modalities with an awareness of the UAV's dynamic flight context, this research advances the field of autonomous UAV health monitoring. The proposed system enables improved fault localization, supports more informed fault-tolerant decision-making, and ultimately contributes to safer, more resilient UAV operations in complex, real-world environments.

    2. Related Work

    Health monitoring for Unmanned Aerial Vehicles (UAVs) has garnered significant attention in recent years, driven by the increasing demand for autonomy, safety, and reliability in mission-critical applications. Existing approaches generally fall into three main categories: vibration-based, vision-based, and multi-sensor fusion methods. While each offers valuable tools for fault detection and diagnostics, they often involve trade-offs concerning computational cost, robustness under varying conditions, and their practical deployability on embedded platforms.

    2.1 Vibration-Based Fault Detection

    Recent research by Hale [4] delves into the challenges of distinguishing between environmental and mechanical anomalies in UAVs. The study emphasizes the importance of identifying wind-induced vibrations to prevent misclassification of external disturbances as mechanical faults. By employing frequency analysis techniques, the research aids in accurately diagnosing the root causes of vibration anomalies in UAV operations.

    Vibration analysis is among the most widely used methods for UAV health monitoring, particularly effective for identifying rotor imbalance, motor faults, and structural degradation. Abdullah et al. [1] demonstrated that Multi-Layer Perceptrons (MLPs) trained on time-domain vibration features could reliably detect rotor imbalance in quadrotors using only MEMS accelerometer data. Their lightweight model achieved a root mean square error (RMSE) of 0.14 and R² ≈ 0.92, confirming its suitability for real-time onboard deployment.

    Furthermore, Baldini et al. [2] proposed a frequency-domain fault detection approach using IMU-only data, employing Finite Impulse Response (FIR) filters and sparse classifiers such as linear SVMs. Their system, implemented directly on Pixhawk Cube flight controllers, achieved over 98% accuracy while consuming less than 2% of CPU cycle time - marking it as one of the few real-time, embedded solutions demonstrated in practice.

    While promising, these methods rely heavily on handcrafted features or narrowly scoped sensor inputs (e.g., central IMU only) and typically assume relatively stable flight conditions. This limits their effectiveness under real-world scenarios involving dynamic maneuvers, turbulent airflow, or payload variability - all of which significantly affect vibration signatures. These gaps underscore a key limitation: the absence of mobility awareness in current vibration-based diagnostic systems.

    Similarly, Kim et al. [6] showed that prognostic frameworks for rotary systems in other domains, such as logistics equipment, have employed lightweight MLPs and statistical signal features for early fault prediction in rotational components. These approaches demonstrate the generalizability of vibration-based deep learning across multiple mechanical domains and motivate its adaptation to UAV rotor health diagnostics.

    2.2 Vision-Based Anomaly Detection

    Visual sensing has emerged as a complementary modality for detecting system-level anomalies during flight, especially those impacting the UAV’s motion or interaction with the environment. Camera-based systems typically analyze trajectory deviations, attitude instabilities, or horizon perturbations as proxies for actuator or structural faults.

    Jin et al. [5] introduced a Transformer-based model for anomaly detection in aerial videos, treating consecutive video frames as sequences to learn feature representations and predict subsequent frames. Their approach effectively identifies events with unpredictable temporal dynamics as anomalies, demonstrating the potential of deep learning models in capturing complex motion patterns in UAV operations.

    Other studies have leveraged optical flow shifts, horizon line jitter, or scene disorientation to detect abnormal behaviors not always reflected in IMU or motor telemetry. These methods are particularly effective in fault-tolerant or over-actuated systems, where mechanical anomalies may not cause immediate control loss. However, visual approaches are inherently sensitive to environmental conditions, including lighting changes, background clutter, and occlusions. Furthermore, processing video in real-time remains computationally demanding for onboard systems, requiring efficient inference pipelines or model compression to meet latency constraints.

    2.3 Multi-sensor Fusion for UAV Health Monitoring

    To enhance robustness and improve fault localization, recent research has turned toward multi-sensor fusion. The PADRE dataset [7], for instance, integrates synchronized accelerometer and microphone data to detect multirotor propeller faults via complementary vibration-acoustic signals. Zhou et al. [8] extended this paradigm by combining wavelet-based vibration features with stacked denoising autoencoders for IMU-based anomaly detection.

    Building upon the significance of multi-sensor data fusion, Bultmann et al. [3] proposed a UAV system capable of real-time semantic inference by integrating multiple sensor modalities, including LiDAR, RGB, and thermal imaging. Their approach utilizes lightweight convolutional neural networks and embedded inference accelerators to perform online semantic segmentation and object detection. This system demonstrates the benefits of multi-sensor data fusion in enhancing UAV applications, particularly in complex environments.

    Despite promising results, these systems often lack mobility awareness. That is, they do not explicitly model or adjust for the UAV's own maneuvers, external forces, or mission context. This limits their ability to distinguish between true faults and transient anomalies caused by legitimate flight dynamics, leading to potential misclassification or reduced generalizability across diverse operational environments.

    2.4 Summary and Research Gap

    To date, no existing work has fully integrated mobility-aware reasoning, vibration-based diagnostics, and vision-based anomaly detection into a unified, lightweight architecture suitable for real-time embedded deployment. This paper addresses that gap by:

    • Combining statistically filtered flight controller (FC) log fields with both vibration and vision inputs.

    • Applying modality-specific deep learning models (MLP for vibration, CNN for vision).

    • Demonstrating real-time onboard feasibility on Pixhawk-class flight controllers - without requiring auxiliary sensors, GPUs, or external processing units.

    By fusing vibration and visual signals with context-aware reasoning, this work advances UAV health monitoring toward a more autonomous, fault-resilient, and field-deployable future.

    3. System Overview

    The proposed system is a mobility-aware, multi-sensor UAV health monitoring framework that integrates vibration-based analysis and visual anomaly detection to identify rotor faults and structural anomalies during flight. It is designed for real-time execution on resource-constrained UAV platforms, emphasizing modularity, low-latency inference, and practical deployability without requiring additional hardware or sensors.

    3.1 Architecture Overview

    <Figure 1> illustrates the architecture of our proposed health monitoring system. It highlights the three main stages: Sensor Data Acquisition, where vibration (IMU), flight controller logs, and visual (camera) data are collected; Data Preprocessing & Feature Engineering, where raw data is transformed into usable features; and the Multi-sensor Inference Engine, which uses an MLP for vibration classification and a CNN for vision-based anomaly detection, with their outputs fused to produce fault indicators.

    The system comprises three primary subsystems, as depicted in <Figure 1>:

    • Sensor Data Acquisition Layer: This layer collects raw telemetry and sensor data from onboard systems. It includes:

      ◦ Vibration signals from MEMS accelerometers, accessed via the Inertial Measurement Unit (IMU).

      ◦ Visual input from an onboard camera.

      ◦ Comprehensive flight controller (FC) logs covering 64 message fields (See Appendix, Table A1). These diverse data sources allow the system to capture both local mechanical cues (like vibrations) and global motion signatures (like erratic flight paths), enabling more robust, multi-sensor anomaly detection.

    • Data Preprocessing and Feature Engineering Module: This module transforms the raw, often noisy, inputs into structured and informative features suitable for machine learning models. Its key tasks include:

      ◦ Parsing FC binary logs into aligned time-series data using a field-wise merging strategy.

      ◦ Applying statistical filtering techniques. This involves, for example, removing fields with constant values, pruning fields with a high ratio of NaN (Not a Number) entries, and reducing correlation between features to minimize redundancy.

      ◦ Extracting domain-specific features relevant to UAV health, such as:

        ․ Time-domain and frequency-domain vibration features (e.g., bandpass-filtered Power Spectral Density - PSD).

        ․ Indicators of visual scene instability, like horizon deviation or frame jitter.

        ․ Control mismatch metrics, for instance, the difference between desired and actual roll (DesRoll - Roll), the difference between the control rate and the desired control rate (CRt - DCRt), and inter-arm vibration deltas (useful for isolating rotor-specific issues).

    • Multi-sensor Inference Engine: This is the analytical core of the system, housing the machine learning models. It comprises:

      ◦ Vibration Classifier: A lightweight Multi-Layer Perceptron (MLP) trained to detect rotor imbalance. It uses features derived from the IMU and selected FC log fields. The classifier outputs a binary fault flag (fault/no fault) and an associated confidence score.

      ◦ Vision-Based Anomaly Detector: A CNN model designed to flag visual anomalies. These might include instability in the horizon line or trajectory drift that could indicate actuator faults not immediately apparent in inertial data alone.

      ◦ Fusion Logic: A rule-based fusion layer then combines the outputs from both the MLP and CNN models. This layer also incorporates flight context information (such as current pitch rate or yaw acceleration) to enhance fault localization and critically reduce false positives. This integration of sensor outputs with the dynamic flight state is central to our system's mobility awareness.

    3.2 Hardware and Deployment Scope

    The system is specifically designed for embedded UAV platforms where computational power and energy are constrained.

    <Figure 2> showcases the core hardware components. The system relies on a Pixhawk Cube Black STM32-based flight controller for high-frequency IMU logging (350 Hz). Vision processing, including the CNN-based anomaly detection, is handled by a companion computer, such as an NVIDIA Jetson Nano or Raspberry Pi 4, as the Pixhawk itself cannot process real-time video streams.

    The key hardware elements are:

    • Primary platform: A Pixhawk Cube Black, which is an STM32-based flight controller. We customized the ArduPilot firmware to enable high-frequency IMU logging at 350 Hz.

    • Vision module: This runs on a lightweight companion computer (e.g., an NVIDIA Jetson Nano or Raspberry Pi 4). This is necessary because the Pixhawk flight controller typically lacks the processing power for real-time video stream analysis.

    This distributed architecture ensures that the flight controller's responsiveness for critical flight tasks is preserved, while still enabling powerful multi-sensor inference. We carefully balanced weight and power constraints to maintain the UAV's performance and flight endurance.

    3.3 Operational Workflow

    The system's operation follows a well-defined workflow:

    • In-Flight Data Logging: Visual and vibration data are recorded during both normal and fault-induced flights (e.g., 10% propeller blade damage) to create ground truth.

    • Offline Preprocessing & Labeling: FC logs are parsed, filtered, and labeled using statistical metrics and manual annotation to create a supervised learning dataset.

    • Model Training & Validation:

      ◦ The MLP is trained on vibration features using 5-fold cross-validation.

      ◦ The CNN is trained on labeled visual frames distinguishing normal vs. faulty behavior.

    • Onboard Inference: During live flight, real-time data is streamed to the models for continuous fault evaluation (see the loop sketch after this list).

    • Alert Generation: Anomalies are flagged within milliseconds, supporting immediate operator notification or autonomous fault response.
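    The in-flight portion of this workflow (onboard inference and alert generation) can be summarized as a simple monitoring loop. The sketch below is illustrative only; the reader and notification helpers (read_imu_window, grab_frame, get_flight_state, notify) are hypothetical placeholders, not part of the published system.

```python
# Illustrative onboard monitoring loop (sketch; helper names are hypothetical).
import time

def monitoring_loop(vib_model, vis_model, fuse, read_imu_window, grab_frame,
                    get_flight_state, notify):
    """Continuously evaluate the vibration and vision models and raise fused alerts."""
    while True:
        window = read_imu_window(duration_s=1.5)   # 1.5 s IMU / FC-log window
        frame = grab_frame()                       # latest camera frame
        state = get_flight_state()                 # pitch rate, yaw accel, throttle, phase
        vib_fault, vib_conf = vib_model(window)    # MLP: fault flag + confidence
        vis_fault, vis_conf = vis_model(frame)     # CNN: anomaly flag + confidence
        alert = fuse(vib_fault, vis_fault, state)  # rule-based fusion with mobility context
        if alert != "normal":
            notify(alert, vib_conf, vis_conf)      # operator alert / fail-safe hook
        time.sleep(0.05)                           # poll at ~20 Hz; inference fits the <100 ms budget
```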

    3.4 Design Objectives

    The system is engineered with the following goals:

    • Low latency: Inference time per frame under 100 ms.

    • Low computational footprint: Quantized models optimized for edge deployment.

    • Robust fault detection: Effective across dynamic loads, wind conditions, and varying payloads.

    • Sensor efficiency: Uses only standard onboard sensors (IMU and camera); no additional hardware required.

    4. Methodology

    The proposed system integrates vibration-based fault detection and vision-based anomaly recognition within a unified, mobility-aware framework. This section outlines the training pipelines, feature extraction processes, model architecture for each sensing modality, and the rule-based fusion strategy used to synthesize final fault predictions.

    4.1 Vibration-Based Fault Detection

    The vibration analysis subsystem focuses on detecting mechanical anomalies - particularly rotor imbalance - using data from onboard inertial sensors and flight controller logs.

    4.1.1 Feature Extraction

    Input Sources: Raw flight controller (FC) logs comprising 64 message fields (See Appendix), sampled at 350 Hz.

    Filtering Strategy:

    • Fields with excessive null values, constant readings, or high correlation redundancy are removed.

    • The following high-impact indicators are retained:

      ◦ Attitude error terms: DesRoll - Roll, DesPitch - Pitch.

      ◦ Control mismatch: CRt - DCRt.

      ◦ Arm-level vibration deltas: to isolate rotor-specific imbalances.

    Signal Processing:

    • Band-pass filtering isolates fault-relevant frequency bands (e.g., motor resonance, rotor harmonics).

    • Features are computed in both domains:

      ◦ Time-domain: RMS, standard deviation, skewness.

      ◦ Frequency-domain: FFT peak ratios, PSD slope.

    • Feature normalization is performed using Z-score scaling.
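    A minimal sketch of this signal-processing step is shown below, assuming the 350 Hz log rate stated earlier; the band edges (20-120 Hz) and the exact feature list are illustrative choices, not values reported in the paper.

```python
# Vibration feature sketch: band-pass filtering, time/frequency-domain features,
# and Z-score scaling for one analysis window (illustrative parameter values).
import numpy as np
from scipy.signal import butter, filtfilt, welch
from scipy.stats import skew

FS = 350.0  # FC log sampling rate [Hz]

def bandpass(x, lo=20.0, hi=120.0, fs=FS, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def vibration_features(window):
    """Compute time- and frequency-domain features for one 1.5 s window."""
    x = bandpass(np.asarray(window, dtype=float))
    rms = np.sqrt(np.mean(x ** 2))                 # time-domain: RMS
    std = np.std(x)                                # time-domain: standard deviation
    skw = skew(x)                                  # time-domain: skewness
    freqs, psd = welch(x, fs=FS, nperseg=min(len(x), 256))
    peak_ratio = psd.max() / (psd.mean() + 1e-12)  # frequency-domain: peak ratio
    psd_slope = np.polyfit(freqs[1:], np.log(psd[1:] + 1e-12), 1)[0]  # PSD slope
    return np.array([rms, std, skw, peak_ratio, psd_slope])

def zscore(features, mean, std):
    return (features - mean) / (std + 1e-12)       # Z-score normalization
```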

    Dimensionality reduction:

    • Principal Component Analysis (PCA).

    • Uniform Manifold Approximation and Projection (UMAP).

    As illustrated in <Figure 3>, PCA projections of the extracted features show partial separation between normal (green) and abnormal (red) data across different samples. While PCA preserves global structure well, the clusters exhibit significant overlap, reflecting the method's limitations in modeling nonlinear dependencies.

    In contrast, <Figure 4> shows UMAP projections of the same feature space, where normal and abnormal groups form well-separated clusters. UMAP’s ability to capture nonlinearity results in clearer group delineation, aiding in downstream tasks such as anomaly detection or clustering.

    As illustrated in <Figure 5>, the TabNet model highlights Latitude and Altitude as the most influential features, followed by orientation-related components such as Q4, Q3, and Yaw. This reflects TabNet’s internal attention mechanisms, which prioritize spatial and attitude indicators relevant to flight dynamics.

    In contrast, <Figure 6> presents SHAP-based feature importance derived from a Random Forest classifier. SHAP values reaffirm the prominence of Latitude and Altitude, while slightly reordering other key contributors - placing Roll and Longitude above some quaternion terms. This model-agnostic explanation offers a more granular view of how individual features impact prediction, providing complementary insight into the system’s decision logic.

    <Figure 7> displays the Pearson correlation matrix for FTN1 sensor-derived features. Strong positive correlations are observed between several sensor axes, particularly among SnY, SnZ, and SnX, indicating tightly coupled sensor responses across spatial dimensions. Notably, SnY and SnZ exhibit a high correlation coefficient of 0.76, suggesting shared signal patterns or mechanical coupling.

    Negative correlations are also present; for instance, PkAvg and FrX are moderately inversely correlated (–0.60), implying that peak averages are suppressed under increased frontal load conditions. Most features, however, display weak to negligible correlation with Tc (the target class), highlighting the importance of nonlinear models and feature engineering in subsequent stages of analysis.

    4.1.2 Model Architecture and Training

    Architecture: The MLP consists of an input layer (20–40 features), 3–5 hidden layers with ReLU activation, and a binary softmax classifier output for “fault” vs. “no fault.”

    Training Setup:

    • Dataset: Over 800 labeled windows balanced between normal and fault-induced flight data (e.g., 10% propeller blade cut).

    • Window size: 1–2 seconds, 50% overlap, to capture transient anomalies.

    • Loss function: Binary cross-entropy.

    • Optimizer: Adam.

    • Validation: 5-fold cross-validation; ROC-AUC used as the primary performance metric.

    • Performance: The trained MLP achieved an RMSE of 0.1414 and R² ≈ 0.92, with an inference time of approximately 15 ms per window (on a Jetson Nano), enabling real-time detection.
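    For reference, a minimal PyTorch sketch of such an MLP is given below. The framework, layer widths, and learning rate are illustrative assumptions; the paper specifies only the overall architecture (20-40 input features, 3-5 ReLU hidden layers, a two-class softmax output) and the Adam optimizer.

```python
# Illustrative MLP for rotor-imbalance classification (layer sizes are assumptions).
import torch
import torch.nn as nn

class RotorImbalanceMLP(nn.Module):
    def __init__(self, n_features=25, hidden=(64, 32, 16)):
        super().__init__()
        layers, prev = [], n_features
        for h in hidden:                       # ReLU hidden layers
            layers += [nn.Linear(prev, h), nn.ReLU()]
            prev = h
        layers.append(nn.Linear(prev, 2))      # "fault" vs. "no fault" logits
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model = RotorImbalanceMLP()
criterion = nn.CrossEntropyLoss()              # softmax form of the binary cross-entropy loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```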

    4.2 Vision-Based Anomaly Detection

    The vision-based subsystem detects high-level behavioral anomalies such as attitude drift, trajectory perturbation, and visual horizon instability, which may result from actuator or structural faults not immediately observable through inertial sensing.

    4.2.1 Input and Preprocessing

    Visual Data Source: A front-facing onboard camera capturing frames at 15–30 FPS.

    Preprocessing: Each frame undergoes resizing to 224×224 pixels and normalization of pixel intensity.

    Visual Feature Engineering: Key visual cues are extracted, including horizon line jitter and inter-frame deviation vectors. Optical flow estimation (Horn-Schunck or Lucas-Kanade methods) is optionally applied to derive motion vectors.
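    A brief OpenCV sketch of this preprocessing is shown below. Farneback dense optical flow is used purely as an illustrative stand-in for the Horn-Schunck or Lucas-Kanade options mentioned above.

```python
# Frame preprocessing sketch: resize to 224x224, normalize intensities, and
# (optionally) estimate dense optical flow between consecutive grayscale frames.
import cv2
import numpy as np

def preprocess_frame(frame_bgr):
    frame = cv2.resize(frame_bgr, (224, 224))
    return frame.astype(np.float32) / 255.0    # normalize pixel intensity to [0, 1]

def optical_flow(prev_gray, curr_gray):
    # Dense motion vectors between consecutive frames (standard Farneback parameters)
    return cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
```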

    4.2.2 CNN-Based Classification

    Architecture: A shallow CNN (typically 3–4 convolutional layers) is primarily used for real-time inference. For higher fidelity applications, a pretrained backbone (e.g., VGG-16 or MobileNet) can be adapted. The output is a binary classification (normal vs. anomalous).

    Training Setup:

    • Dataset: Labeled frames from both nominal and fault-induced flight footage. Labeling is based on visual inspection, aligned with FC-based fault timestamps.

    • Augmentation: Data augmentation techniques like rotation, brightness shift, and minor translations are applied to enhance model generalization.

    • Loss: Binary cross-entropy.

    • Optimizer: RMSprop.

    Performance: The CNN achieved a validation accuracy of over 90%, with an inference latency of approximately 40 ms per frame (on Jetson Nano).
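    A minimal PyTorch sketch of such a shallow CNN is shown below; the channel counts and learning rate are illustrative assumptions, while the binary output, cross-entropy loss, and RMSprop optimizer follow the description above.

```python
# Shallow CNN sketch for binary normal-vs-anomalous classification of 224x224 frames.
import torch
import torch.nn as nn

class AnomalyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 224 -> 112
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 112 -> 56
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 56 -> 28
            nn.AdaptiveAvgPool2d(1),                                      # global average pooling
        )
        self.classifier = nn.Linear(64, 2)     # normal vs. anomalous logits

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = AnomalyCNN()
criterion = nn.CrossEntropyLoss()              # two-class form of binary cross-entropy
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)
```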

    4.2.3 Mobility Context Integration

    To improve robustness during dynamic maneuvers and reduce false alarms, the system incorporates context-aware compensation using flight dynamics data.

    Dynamic Compensation: Features like pitch rate, yaw acceleration, and throttle percentage are appended to the vibration and vision model input vectors (or their outputs). These parameters help differentiate between aggressive-but-safe maneuvers and actual faults.

    Frame-Wise Filtering: A temporal logic filter suppresses alerts during expected high-movement periods (e.g., takeoff, sharp turns, landing) to prevent false positives that naturally occur during non-fault dynamic events.
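    The sketch below illustrates this frame-wise filtering logic. The threshold values and flight-phase labels are hypothetical placeholders; the paper does not report the exact suppression criteria.

```python
# Mobility-context filter sketch: suppress alerts during expected high-movement
# phases (takeoff, sharp turns, landing). Thresholds are illustrative only.
def is_high_movement(state, pitch_rate_thr=60.0, yaw_acc_thr=90.0):
    """state holds pitch rate [deg/s], yaw acceleration [deg/s^2], and flight phase."""
    return (abs(state["pitch_rate"]) > pitch_rate_thr
            or abs(state["yaw_acc"]) > yaw_acc_thr
            or state["phase"] in ("takeoff", "landing"))

def filter_alert(raw_alert, state):
    # Drop alerts raised during legitimate aggressive-but-safe maneuvers
    if raw_alert and is_high_movement(state):
        return False
    return raw_alert
```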

    4.2.4 Multi-sensor Fusion Strategy

    The vibration and vision subsystems operate independently, and their outputs are synthesized using a rule-based hierarchical fusion logic.

    This rule-based fusion framework enables structured decision-making based on subsystem agreement. High-confidence alerts are triggered when both models detect faults, while medium-confidence alerts are raised if only one detects an issue. These are further filtered using flight dynamics (e.g., yaw rate, pitch) to suppress false positives. This logic improves robustness and aligns with onboard UAV operational requirements.

    Fusion Logic:

    • High confidence anomaly: Triggered when both subsystems independently flag a fault, indicating a highly probable critical anomaly.

    • Medium confidence (mechanical): When only the vibration module flags a fault, suggesting a mechanical issue (e.g., rotor imbalance) without overt visual cues.

    • Medium confidence (stability): When only the vision module flags a fault, indicating a potential control or surface instability. This was notably susceptible to false alarms during sharp turns, but these were effectively filtered by mobility context.

    • Normal: When neither module detects an anomaly.
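    Expressed as code, this decision logic reduces to a few rules. The sketch below assumes the is_high_movement() helper from the previous section and illustrates the described behavior; it is not the deployed implementation.

```python
# Rule-based fusion sketch following the four cases above.
def fuse(vib_fault, vis_fault, state):
    if vib_fault and vis_fault:
        return "high_confidence_anomaly"           # both subsystems agree
    if vib_fault:
        return "medium_confidence_mechanical"      # vibration-only flag
    if vis_fault:
        if is_high_movement(state):                # mobility-context filtering
            return "normal"                        # benign transient during a sharp maneuver
        return "medium_confidence_stability"       # vision-only flag
    return "normal"
```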

    Future Extension: A learned fusion head (e.g., a small decision tree or 2-layer MLP) may be incorporated in future iterations. This would accept model outputs and mobility context as input, learn temporal dependencies or inter-modality patterns, and use ensemble confidence scoring with time-decay weighting to improve prediction robustness.

    5. Experimental Setup and Results

    This section details the hardware and flight setup used for data collection, the construction of training and evaluation datasets, and the performance of the vibration-based and vision-based subsystems. Real-time inference latency and onboard deployability are also evaluated to confirm system suitability for embedded UAV applications.

    5.1 UAV Platform and Hardware Configuration

    Experiments were conducted using a custom-built 40 kg-class multirotor UAV, equipped with a standard Pixhawk Cube Black flight controller and a companion computer for real-time vision processing.

    <Figure 8> illustrates the custom-built multirotor UAV platform used in our experiments. The airframe is a 40 kg-class quadrotor equipped with high-thrust carbon fiber propellers and reinforced landing gear to ensure operational stability during failure scenarios. The central onboard system comprises a Pixhawk Cube Black flight controller and a companion computer dedicated to real-time vision-based processing and control tasks.

    The figure highlights the experimental setup, including the designated front orientation of the UAV (blue arrow) and the simulated failure location, where a single propeller is intentionally disabled (red circle). The wind direction and velocity are indicated by the red arrow and annotated as approximately 1–3 m/s, introducing environmental variability to test fault tolerance and stability. This configuration enables repeatable and controlled evaluation of in-flight failure responses under realistic conditions.

    • Flight Controller: Pixhawk Cube Black (STM32-based) running customized ArduPilot firmware for high-frequency logging.

      • Logging rate: 350 Hz for granular data capture.

      • Data format: .bin logs containing 64 message fields.

    • Camera Module: A forward-facing HD camera (30 FPS), precisely synchronized with the flight controller's time.

    • Companion Computer: An NVIDIA Jetson Nano (4 GB RAM), responsible for running real-time inference for both the vibration MLP and vision CNN.

    • Sensors: Utilized onboard MEMS IMUs, barometers, and ESC telemetry - all part of the default ArduPilot stack.

    • Fault Induction Mechanism: For controlled experimentation, we simulated a localized mechanical fault by manually performing a 10% blade cut on one rotor.

    5.2 Flight Testing and Data Collection

    We performed extensive flight tests to generate a comprehensive dataset covering both normal and faulty conditions.

    Total Flight Time: Over 100 minutes in total, divided equally between normal and fault-induced flight sessions.

    Flight Scenarios:

    • Normal: Included standard flight maneuvers such as hover, straight-line cruise, coordinated turns, and step-input maneuvers.

    • Faulty: Involved identical maneuvers with an induced propeller imbalance on Motor 1, simulating a real-world rotor fault.

    • Environment: Outdoor tests were conducted in controlled airspace under varying wind conditions to ensure real-world relevance.

    Data Logging:

    • FC logs were exported and parsed using MAVProxy and custom Python scripts (a minimal parsing sketch follows this list).

    • Camera frames were timestamp-aligned with the corresponding FC log windows to facilitate robust fusion training.
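    A minimal sketch of such log parsing with pymavlink is shown below. The custom scripts themselves are not published; the message type names (e.g., IMU, ATT) follow standard ArduPilot dataflash logging and may vary with firmware configuration.

```python
# Sketch: read selected message types from an ArduPilot .bin dataflash log.
from pymavlink import mavutil

def parse_bin_log(path, types=("IMU", "ATT")):
    rows = []
    mlog = mavutil.mavlink_connection(path)        # also handles dataflash .bin logs
    while True:
        msg = mlog.recv_match(type=list(types))
        if msg is None:                            # end of log
            break
        rec = msg.to_dict()
        rec["_time"] = getattr(msg, "_timestamp", None)  # for camera frame alignment
        rows.append(rec)
    return rows
```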

    5.3 Dataset Construction and Labeling

    We constructed two distinct datasets for supervised learning:

    (1) Vibration Dataset:

    • Source: Extracted from parsed FC log fields, initially containing 64 fields, then filtered down to approximately 25 high-impact signals for optimal feature representation.

    • Windowing: 1.5-second segments with 50% overlap (see the windowing sketch after this list).

    • Size: 820 total windows (410 normal, 410 faulty), ensuring a balanced dataset.

    • Labels: Assigned based on precise knowledge of fault timing derived from flight annotations.
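    The windowing and labeling step can be sketched as follows, using the 350 Hz log rate and an annotated fault interval; the helper names and label convention are assumptions for illustration.

```python
# Windowing sketch: 1.5 s segments with 50% overlap, labeled from a fault interval.
import numpy as np

def make_windows(signal, fs=350.0, win_s=1.5, overlap=0.5):
    win = int(win_s * fs)
    step = int(win * (1.0 - overlap))
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, step)]

def label_windows(n_windows, fault_start, fault_end, fs=350.0, win_s=1.5, overlap=0.5):
    """Label a window as faulty (1) if it overlaps the annotated fault interval."""
    win = int(win_s * fs)
    step = int(win * (1.0 - overlap))
    labels = []
    for k in range(n_windows):
        start = k * step
        labels.append(1 if start < fault_end and start + win > fault_start else 0)
    return np.array(labels)
```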

    (2) Visual Dataset:

    • Source: Approximately 6,000 frames extracted from 20 flight sessions.

    • Labeled Set: Approximately 3,000 frames manually labeled using visual cues (e.g., horizon jitter, noticeable attitude drift).

    • Augmentation: Applied standard augmentation techniques including ±15° rotation, brightness variation of ±20%, and ±5% scale jitter to enhance model generalization.

    <Figure 9> illustrates temperature (T) measurements obtained from the IMU during flight operations, plotted across time (sample index). Two profiles are compared: normal flight data (blue) and abnormal flight data (red). The abnormal data exhibits consistently lower temperature values and sharp drops, which may be indicative of anomalies either in environmental exposure or internal sensor behavior.

    As noted in the analysis, although these temperature deviations could suggest potential faults, they may also stem from variations in ambient temperature or flight conditions, given the non-synchronized data acquisition across different days. Therefore, this observation requires cautious interpretation.

    Further investigation using datasets collected under identical temporal and environmental conditions (same day, same time window) is recommended to validate the relationship between IMU temperature readings and anomaly presence.

    5.4 Evaluation Metrics

    To rigorously assess both detection accuracy and real-time feasibility, we employed the evaluation metrics summarized in <Table 1>.

    <Table 1> summarizes the key metrics used to evaluate the system’s performance in terms of detection accuracy and real-time applicability. It includes classification and regression metrics, as well as system-level measures like inference latency and false positive rate.
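    A brief scikit-learn sketch of how these detection-accuracy metrics can be computed is shown below; y_true, y_pred, and y_score are placeholders for the validation labels, predicted classes, and model confidence scores.

```python
# Sketch: detection-accuracy metrics referenced in <Table 1> (illustrative only).
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score, mean_squared_error, r2_score

def evaluate(y_true, y_pred, y_score):
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    negatives = max((y_true == 0).sum(), 1)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_score),
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_score))),
        "r2": r2_score(y_true, y_score),
        "false_positive_rate": float(((y_pred == 1) & (y_true == 0)).sum() / negatives),
    }
```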

    5.5 Results: Vibration-Based Fault Detection (MLP)

    Our MLP model demonstrated strong performance in classifying rotor imbalance events using the selected and filtered FC log features.

    <Table 2> presents the evaluation results of a multilayer perceptron (MLP) model trained on filtered flight controller (FC) log features. The model achieved high classification accuracy and low error rates for detecting rotor imbalance faults, with low inference latency suitable for real-time applications.

    Feature attribution, analyzed via SHAP (SHapley Additive exPlanations), confirmed the dominant role of signals like Desired Roll - Actual Roll (DesRoll - Roll) and arm-level vibration deltas in discriminating between normal and faulty conditions.

    5.6 Results: Vision-Based Fault Detection (CNN)

    <Table 3> summarizes the results of a convolutional neural network (CNN) model used for fault detection based on visual data. Despite variations in image conditions, the model demonstrated reliable anomaly detection performance, identifying instability patterns such as horizon jitter, with acceptable latency.

    Visualized activations confirmed that the model effectively focused on critical image regions around the UAV’s visual horizon for anomaly detection.

    5.7 Fusion-Based Fault Detection (Rule-Based Strategy)

    The integration of vibration and vision outputs via our rule-based fusion strategy significantly improved the overall robustness and reliability of fault detection.

    <Table 4> presents the outcomes of the proposed rule-based fault detection strategy under four detection scenarios involving vibration and vision subsystems. The fusion approach enhances detection confidence by cross-verifying anomalies, distinguishes mechanical faults from control/surface instabilities, and reduces false positives through contextual filtering.

    Crucially, the temporal filtering within the fusion system, enabled by our mobility context, eliminated over 80% of false positives that occurred during sharp turns or unexpected wind gusts, demonstrating the value of mobility-awareness.

    5.8 Real-Time Performance Summary

    The combined system demonstrates excellent suitability for real-time embedded applications, as shown in <Table 5>.

    <Table 5> summarizes key performance metrics of the proposed fault detection system, including processing latency, flight controller compatibility, and deployment stability. The system demonstrates low end-to-end latency, real-time operability within standard control loop frequencies, and stable thermal performance on embedded hardware (Jetson Nano).

    5.9 Ablation Insights

    Ablation studies provided valuable insights into the contribution of key components, as summarized in <Table 6>.

    <Table 6> presents an ablation study evaluating the contribution of individual components (mobility context, FC log fusion, and visual data augmentation) to the overall system performance. The results reveal that removing each component leads to a measurable degradation in accuracy or an increase in false positives, emphasizing their critical roles in ensuring reliable and accurate fault detection.

    These findings confirm the importance of context integration, feature engineering, and data augmentation for robust fault detection.

    6. Discussion

    The results presented in the previous section robustly demonstrate that the proposed mobility-aware, multi-sensor health monitoring system effectively detects rotor and structural faults during UAV flight. This section interprets those findings in the context of our system objectives, highlights the distinct advantages of multi-sensor fusion, assesses real-time deployability, and outlines key limitations and future research directions.

    6.1 Benefits of Multi-sensor Sensing

    One of the central contributions of this work is the strategic integration of vibration and vision modalities, whose complementary characteristics significantly enhanced overall detection accuracy and robustness.

    • The vibration-based MLP reliably detected mechanical anomalies such as rotor imbalance using filtered IMU and control signal features, validating the efficacy of data-driven vibration diagnostics.

    • The vision-based CNN successfully identified anomalies manifested in observable flight behavior - such as horizon drift or trajectory perturbations - often caused by aerodynamic instabilities or actuator faults not easily captured by typical inertial sensors alone.

    • Crucially, when combined via our rule-based fusion, the system exhibited a notable reduction in false positives - especially during dynamic maneuvers where single-modality detectors tend to misclassify benign transients.

    This cross-modal synergy confirms that sensor-level redundancy and complementary signal interpretation are essential for reliable fault detection in real-world UAV operations.

    The hierarchical fusion strategy combines both MLP and CNN outputs with mobility-aware filtering, significantly improving detection confidence and reducing false positives.

    6.2 Role of Mobility Awareness

    The explicit incorporation of mobility context - including pitch rate, yaw acceleration, and throttle input - proved vital for accurate interpretation of sensor signals.

    • The vision subsystem, without context, was prone to false alarms during aggressive but nominal flight phases (e.g., sharp turns, takeoff, or landing) as benign visual shifts were misinterpreted as anomalies.

    • Similarly, the vibration subsystem sometimes misclassified transient control inputs as mechanical faults.

    Our ablation study showed that adding mobility features reduced false positives by over 30%, underscoring the need for flight dynamics compensation in autonomous UAV health monitoring. This enhancement is crucial for distinguishing between legitimate control-induced variations and true system faults.

    6.3 Real-Time Viability and Deployment

    A key strength of the framework is its real-time performance on lightweight embedded hardware:

    • Total end-to-end latency was approximately 60 ms, including vibration, vision, and fusion stages.

    • Following offline training, both the MLP and CNN models perform online inference within this latency budget, enabling integration into UAV control loops operating at 10–15 Hz and supporting practical real-time fault detection onboard.

    • Both subsystems were successfully deployed on a Jetson Nano without requiring auxiliary hardware or experiencing overheating issues.

    These results demonstrate that real-time, embedded multi-sensor fault detection is practically achievable, enabling integration into existing UAV platforms without major hardware modifications.

    6.4 Limitations

    Despite the strong performance and practical viability, several limitations remain in the current framework, providing clear avenues for future research:

    • The current rule-based fusion logic, while effective in controlled tests, may lack adaptability in unstructured environments or with other UAV types. A learned fusion model could offer better generalization.

    • The manual labeling process for the visual dataset was labor-intensive, posing a significant challenge for large-scale dataset expansion.

    • Although we considered semi-automatic annotation tools such as CVAT with the SAM model, we found that many frames still required manual correction or refinement to capture subtle flight anomalies. In future work, we plan to incorporate hybrid labeling pipelines to scale our dataset efficiently.

    • The vision subsystem showed sensitivity to extreme lighting conditions (e.g., glare, dusk), which can impact detection reliability in challenging environments.

    These limitations, however, also highlight promising avenues for future refinement and automation.

    6.5 Future Work

    Planned future extensions include:

    • Learned Fusion Models: Developing adaptive fusion mechanisms (e.g., small attention networks, neural ensembles) that can dynamically combine subsystem outputs and contextual features.

    • Vision Robustness Enhancements: Improving the generalization of the vision system through advanced techniques such as synthetic data generation, domain adaptation, and exploring the utility of multi-view camera integration.

    • Fault Prognosis: Transitioning from immediate fault detection to long-term fault prediction and prognosis by learning temporal patterns of degradation across extended flight periods.

    • Swarm-Level Health Monitoring: Scaling the framework to cooperative UAV fleets by enabling inter-UAV fault communication and coordinated anomaly response.

    6.6 Practical Implications

    The proposed system provides a robust and practical foundation for in-flight anomaly detection, facilitating autonomous recovery planning, and enabling fail-safe triggering in a wide array of professional drone operations. It is particularly relevant for:

    • Logistics UAVs operating over populated areas where safety is critical.

    • Autonomous long-endurance drones that must monitor their health without human intervention.

    • Safety-critical missions such as infrastructure inspection, agricultural surveying, and military reconnaissance, where early fault detection can prevent mission failure.

    The ability to fuse standard onboard sensor data with lightweight AI models positions this framework as a practical solution for the next generation of intelligent, resilient UAV systems.

    7. Conclusion

    In this work, we presented a mobility-aware, multi-sensor UAV health monitoring framework that uniquely fuses vibration- based fault detection with vision-based anomaly recognition to enable real-time, onboard fault diagnosis. In contrast to traditional single-modality approaches, our system leverages complementary sensing modalities and contextual flight data to robustly detect both mechanical faults (e.g., rotor imbalance) and behavioral anomalies (e.g., trajectory drift) during flight.

    Validation of the framework involved extensive real-world UAV flight tests, totaling over 100 minutes of UAV operation, including deliberately induced rotor faults. The vibration-based MLP achieved an impressive 93.7% accuracy with an RMSE of 0.1414, while the vision-based CNN attained 90.3% accuracy. Critically, the combined system maintains an end-to-end inference latency below 60 milliseconds, supporting deployment on resource-constrained embedded platforms such as the NVIDIA Jetson Nano. Our experiments also showed that integrating mobility context reduced false positives by more than 30%, reinforcing the importance of flight dynamics awareness in accurate, real-time fault detection.

    Requiring no additional hardware beyond standard onboard components, the system supports seamless and cost-effective integration into existing UAV architectures.

    This makes it particularly suitable for mission-critical applications such as aerial logistics, infrastructure inspection, and autonomous long-endurance operations, where unwavering reliability is essential.

    Looking ahead, we plan to extend the framework toward predictive fault prognosis, implement learned fusion architectures, and explore fleet-level health monitoring across multi-UAV systems. As UAV operations continue to scale in complexity and autonomy, onboard health monitoring will be a critical enabler of safe, efficient, and trustworthy aerial robotics.

    Acknowledgement

    This work was supported by the Materials and Components Technology Development (R&D) Program of MOTIE (2410009753, Development of multicopter component status inspection equipment and status data collection equipment).

    Figure

    <Figure 1> Mobility-Aware Multi-sensor UAV Health Monitoring System Architecture

    <Figure 2> Hardware and Deployment Scope

    <Figure 3> PCA Approach on AHR2 Features

    <Figure 4> UMAP Approach on AHR2 Features

    <Figure 5> TabNet Top-10 Feature Importance

    <Figure 6> SHAP-Based Feature Importance

    <Figure 7> Pearson Correlation Matrix for FTN1

    <Figure 8> 40 kg-Class Multirotor UAV

    <Figure 9> IMU Temperature Comparison: Normal vs. Abnormal Flight Datasets

    Table

    <Table 1> Evaluation Matrix for Anomaly Detection

    <Table 2> Performance of Vibration-Based Model

    <Table 3> Performance of Vision-Based Model

    <Table 4> Fault Detection Outcomes under Fusion-Based Strategy

    <Table 5> System Performance Metrics for Real-Time Fault Detection

    <Table 6> Ablation Study on Key System Components

    <Table A1> Flight Controller Logs: 64 Message Fields

    References

    1. Abdullah Salem, B.T.S., Abdullah, M.N., Mustapha, F., Kanirai, N.S.A., and Mustapha, M., Vibration Analysis Using Multi-Layer Perceptron Neural Networks for Rotor Imbalance Detection in Quadrotor UAV, Drones, 2025, Vol.9, No.2, p. 102.
    2. Baldini, A., Felicetti, R., Ferracuti, F., Freddi, A., Iarlori, S., and Monteriù, A., Real-time propeller fault detection for multirotor drones based on vibration data analysis, Engineering Applications of Artificial Intelligence, 2023, Vol.123, p. 106343.
    3. Bultmann, S., Quenzel, J., and Behnke, S., Real-time multi-modal semantic fusion on unmanned aerial vehicles with label propagation for cross-domain adaptation, Robotics and Autonomous Systems, 2023, Vol.159, p. 104286.
    4. Hale, E., Frequency Analysis of UAV Vibration Anomalies Due to Wind (INTERN NOTES), 2023.
    5. Jin, P., Mou, L., Xia, G.S., and Zhu, X.X., Anomaly detection in aerial videos with transformers, IEEE Transactions on Geoscience and Remote Sensing, 2022, Vol.60, pp. 1-13.
    6. Kim, S.H., Yergali, B., Jo, H., Kim, K.I., and Kim, J.S., A Fault Prognostic System for the Logistics Rotational Equipment, Journal of Society of Korea Industrial and Systems Engineering, 2023, Vol.46, No.2, pp. 168-175.
    7. Puchalski, R., Ha, Q., Giernacki, W., Nguyen, H.A.D., and Nguyen, L.V., PADRE: A repository for research on fault detection and isolation of unmanned aerial vehicle propellers, Journal of Intelligent & Robotic Systems, 2024, Vol.110, No.2, p. 74.
    8. Zhou, S., He, Z., Chen, X., and Chang, W., An Anomaly Detection Method for UAV Based on Wavelet Decomposition and Stacked Denoising Autoencoder, Aerospace, 2024, Vol.11, No.5, p. 393.