Flying Vehicle Detection under Complex Conditions: A Large-scale Dataset and Benchmark Approach
Submitted to Elsevier
Abstract
Detecting flying vehicles in complex scenes presents a significant challenge for air transportation systems. Existing research primarily focuses on flying vehicle detection using RGB imagery, which limits its effectiveness in real-life applications, particularly under challenging conditions such as low-light environments and cluttered backgrounds. A promising approach is the integration of RGB and thermal infrared (RGBT) images, which provide complementary information and have shown potential in various computer vision tasks. However, progress in RGBT flying vehicle detection is impeded by the lack of a large-scale dataset and a comprehensive benchmark for evaluation. To address this research gap, we introduce a novel RGBT image dataset called FT55k, which encompasses diverse scenarios and consists of over 55k spatially aligned RGBT image pairs with meticulously annotated ground truth, enabling comprehensive evaluation and exploration of algorithmic robustness. We propose a series of baseline approaches that can be deployed on devices with varying computing capabilities, providing a solid foundation for further research. Extensive experiments on the FT55k dataset and seven challenging public datasets demonstrate the superiority of our proposed approaches over state-of-the-art methods. This work presents the first comprehensive benchmark covering multiple types of flying vehicle detection methods across multiple scenarios. We also correct errors in existing datasets and establish a new benchmark. Our method achieves the lowest computational complexity, consuming only 0.49 BFLOPs; to the best of our knowledge, it is the first flying vehicle detection method with a computational complexity below 0.5 BFLOPs. Moreover, we validate the proposed approaches through deployment experiments on edge-computing devices, confirming their reliability and feasibility.
Our code and dataset are made publicly available here.

The Dataset of
FT55k
FT55k encompasses diverse scenarios and consists of over 55k spatially aligned RGBT image pairs with meticulously annotated ground truth, enabling comprehensive evaluation and exploration of algorithmic robustness.

The Architecture of
FTNet-h
Our FTNet-h framework for accurate flying vehicle detection passes the input through five downsampling stages, followed by spatial attention to filter out redundant information. An SPP (Spatial Pyramid Pooling) module is employed to expand the receptive field, and PANet-like operations are applied to reduce the semantic gap between high-level and low-level semantic information. Multi-scale detection heads generate the final output. The dx module is used during the downsampling process: the d1, d2, d3, and d4 stages each employ one dx module, the d5 and C stages consist of a convolutional layer with two ResNet modules, and the r1 stage contains two ResNet modules. Here, ResNet×m in the dx module indicates the presence of m ResNet modules; the d1, d2, d3, and d4 stages contain 2, 8, 8, and 8 ResNet modules, respectively.
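As a rough illustration of the backbone geometry (not the authors' code), the effect of five stride-2 downsampling stages on feature-map resolution can be sketched by tracking spatial sizes. The 640×640 input size and the choice of d3/d4/d5 outputs as the three detection scales are assumptions following common YOLO/PANet-style designs, not details taken from the paper:

```python
def ftnet_h_shapes(input_size=640, stages=5):
    """Track feature-map resolution through stride-2 downsampling stages.

    Hypothetical sketch: stage names d1..d5 follow the figure; the
    640x640 input and the head placement are assumptions.
    """
    shapes = {}
    size = input_size
    for i in range(1, stages + 1):
        size //= 2  # each dx stage halves the spatial resolution
        shapes[f"d{i}"] = size
    # PANet-like necks typically attach detection heads at strides
    # 8/16/32, i.e. the outputs of d3, d4, and d5.
    heads = [shapes["d3"], shapes["d4"], shapes["d5"]]
    return shapes, heads

shapes, heads = ftnet_h_shapes()
print(shapes)  # {'d1': 320, 'd2': 160, 'd3': 80, 'd4': 40, 'd5': 20}
print(heads)   # [80, 40, 20]
```

Under these assumptions, the smallest map (20×20) carries the coarsest semantics, while the multi-scale heads retain the finer 80×80 and 40×40 maps needed for tiny vehicles.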

Data
Distributions
(a) Target positional distribution across the three sub-datasets. Box positions are mostly concentrated in the central area of the image. (b) Statistics of target sizes in our dataset. The dark blue points correspond to vehicles whose width and height are both less than 5% of the image size, the blue points to targets whose sizes are less than 10% of the image size, and the remaining red points to samples larger than 10%. Since the camera attitude may change frequently during motion, the bounding boxes exhibit a wide range of aspect ratios.
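The three size categories in the plot can be reproduced with a small helper. This is a hypothetical sketch, not code from the paper; the bucket names and the use of the larger relative dimension as the criterion are our own assumptions:

```python
def size_bucket(box_w, box_h, img_w, img_h):
    """Bucket a bounding box by relative size, mirroring the plot's
    three categories (hypothetical helper, not from the paper)."""
    # Both dimensions are under the threshold iff the larger one is.
    frac = max(box_w / img_w, box_h / img_h)
    if frac < 0.05:
        return "tiny"   # dark blue: width and height < 5% of image size
    if frac < 0.10:
        return "small"  # blue: sizes < 10% of image size
    return "large"      # red: samples larger than 10%

print(size_bucket(20, 20, 640, 640))    # tiny
print(size_bucket(50, 30, 640, 640))    # small
print(size_bucket(100, 100, 640, 640))  # large
```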

Qualitative
visualization
Our methods provide a more accurate prediction of flying vehicles in various challenging situations, i.e., low lighting conditions (second column), tangled jungle (fourth column), and tiny vehicles (all columns). The red numbers in the figure represent confidence scores. Please zoom in for the best view.

Paper
Citation
@article{zhou2024flying,
title={Flying Vehicle Detection under Complex Conditions: A Large-scale Dataset and Benchmark Approach},
author={Zhou, Xunkuai and Huang, Yijun and Li, Li and Huang, Jie and Chen, Ben M.},
journal={Expert Systems with Applications},
year={2024},
publisher={Elsevier}
}
Related
Projects
- VDTNet: A High-Performance Visual Network for Detecting and Tracking of Intruding Drones
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
[ResearchGate]
