Abstract:
To improve the performance of object detection in complex scenes, a multimodal object detection model based on feature interaction and adaptive grouping fusion is proposed, combining deep learning with multimodal information fusion. The model takes paired infrared and visible images as input, builds a symmetric dual-branch feature extraction structure on the PP-LCNet backbone, and introduces a feature interaction module so that complementary information is exchanged between the two modalities during feature extraction. A binary grouping attention mechanism is then designed: global pooling combined with the sign function divides the output features of the interaction module into groups associated with their respective object categories, and a spatial attention mechanism enhances the object information within each group. Finally, corresponding feature groups are collected across scales, and multi-scale fusion is performed through adaptive weighting from deep to shallow layers; object prediction is then carried out on the fused features at each scale. Experimental results show that the proposed method effectively strengthens multimodal feature interaction, key feature enhancement, and multi-scale fusion, and that the model is more robust in complex scenes and generalizes better across different scenarios.
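As a rough illustration of the grouping and fusion steps summarized above, the following is a minimal PyTorch sketch. The module names, the learned per-channel score applied before the sign function, the per-group spatial attention, and the softmax-normalised fusion weights are assumptions made for illustration only, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialAttention(nn.Module):
    """Standard spatial attention: channel-wise max/mean maps -> conv -> sigmoid gate."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        max_map, _ = x.max(dim=1, keepdim=True)
        mean_map = x.mean(dim=1, keepdim=True)
        gate = torch.sigmoid(self.conv(torch.cat([max_map, mean_map], dim=1)))
        return x * gate


class BinaryGroupingAttention(nn.Module):
    """Assumed reading of the binary grouping attention: global pooling plus a learned
    per-channel score, sign() splits channels into two groups, and each group is
    enhanced by its own spatial attention before recombination."""
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Linear(channels, channels)   # hypothetical per-channel scoring layer
        self.attn_pos = SpatialAttention()
        self.attn_neg = SpatialAttention()

    def forward(self, x):
        b, c, _, _ = x.shape
        score = self.fc(F.adaptive_avg_pool2d(x, 1).flatten(1))   # (B, C) global descriptor
        group = torch.sign(score).view(b, c, 1, 1)                # +1 / -1 group label per channel
        pos_mask = (group > 0).float()
        neg_mask = 1.0 - pos_mask
        return self.attn_pos(x * pos_mask) + self.attn_neg(x * neg_mask)


class AdaptiveFusion(nn.Module):
    """Deep-to-shallow fusion with softmax-normalised learnable weights per level
    (one plausible form of 'adaptive weighting'; not taken from the paper)."""
    def __init__(self, num_levels):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_levels, 2))  # (shallow, deep) pair per level

    def forward(self, feats):
        # feats: list ordered shallow -> deep, all with the same channel count
        fused = feats[-1]
        outputs = [fused]
        for i in range(len(feats) - 2, -1, -1):
            w = torch.softmax(self.weights[i], dim=0)
            up = F.interpolate(fused, size=feats[i].shape[-2:], mode="nearest")
            fused = w[0] * feats[i] + w[1] * up
            outputs.insert(0, fused)
        return outputs


if __name__ == "__main__":
    x = torch.randn(2, 64, 40, 40)
    print(BinaryGroupingAttention(64)(x).shape)                  # torch.Size([2, 64, 40, 40])
    feats = [torch.randn(2, 64, s, s) for s in (80, 40, 20)]
    print([f.shape for f in AdaptiveFusion(3)(feats)])           # fused maps at all three scales
```

In this sketch the grouping is hard (a binary mask from the sign of the pooled score), matching the abstract's description, while the fusion weights are learned jointly with the rest of the network; the actual model may differ in both respects.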