Name: Multi-Modal and Multi-View Object Detection Dataset (MMDOD)
Published: 2025
Keywords: Object Detection, Multi-Modal, Multi-View, Transparent Objects, Dataset

Compact Mamba Multi-View Fusion for Object Detection

Authors: Gwendal Bernardi, Godefroy Brisebarre, Sébastien Roman, Mohsen Ardabilian, Emmanuel Dellandrea

Date: 6 February 2026

Conference: Pre-print

Abstract: Multi-view image analysis is a key enabler for robust perception when single viewpoints provide incomplete or ambiguous observations. This challenge is particularly pronounced in industrial inspection of transparent materials, where view-dependent optical effects, subtle surface degradations, and annotation noise significantly hinder reliable detection and severity assessment. In this work, we introduce a compact and efficient multi-view fusion architecture tailored to such constraints. Our approach combines shared-weight hierarchical encoders with selective state-space modeling to explicitly exploit cross-view and multi-scale correlations. Multi-View Mamba Blocks (MVMB) perform adaptive fusion at each feature level by coupling Mamba-based selective state-space layers with FiLM-driven cross-view conditioning, while a Global State-Space Fusion Block enforces long-range coherence across all views and resolutions. Task-specific decoding heads query the resulting global representation via cross-attention to jointly predict object localization and ordinal erasure severity. The model is trained using a unified multi-task objective that integrates geometric regression, ordinal classification, cross-view consistency, feature alignment, and sequential smoothness. Extensive experiments on a challenging multi-view glass container inspection dataset demonstrate improved robustness, consistency, and scalability compared to strong baselines. To promote reproducibility and future research, we publicly release the proposed dataset at: https://datasets.liris.cnrs.fr/mvep-version1

Keywords: Mamba, Multi-View, Image Fusion, Ordinal Class, Object Detection
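The FiLM-driven cross-view conditioning mentioned in the abstract can be illustrated with a minimal sketch. FiLM (feature-wise linear modulation) scales and shifts each feature channel with parameters predicted from a conditioning signal. As a hypothetical simplification that is not taken from the paper, each view here is a flat feature vector, the conditioning signal for a view is the mean of the other views, and the scale and shift come from two linear maps (`w_gamma` and `w_beta`, assumed names). The actual MVMB applies this inside Mamba layers on multi-scale feature maps, which this sketch does not model.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def film(x: Vector, gamma: Vector, beta: Vector) -> Vector:
    """Feature-wise linear modulation: gamma * x + beta, channel by channel."""
    return [g * xi + b for xi, g, b in zip(x, gamma, beta)]


def cross_view_film(views: List[Vector], w_gamma: Matrix, w_beta: Matrix) -> List[Vector]:
    """Modulate each view with FiLM parameters predicted from the other views.

    Hypothetical scheme: the context of view i is the channel-wise mean of all
    other views; gamma is parameterized as 1 + w_gamma @ context so that zero
    weights leave the features unchanged.
    """
    if len(views) < 2:
        raise ValueError("cross-view conditioning needs at least two views")
    dim = len(views[0])
    fused = []
    for i, x in enumerate(views):
        others = [v for j, v in enumerate(views) if j != i]
        # channel-wise mean of the other views acts as the conditioning signal
        context = [sum(v[c] for v in others) / len(others) for c in range(dim)]
        gamma = [1.0 + g for g in matvec(w_gamma, context)]
        beta = matvec(w_beta, context)
        fused.append(film(x, gamma, beta))
    return fused


if __name__ == "__main__":
    views = [[1.0, 2.0], [3.0, 4.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    print(cross_view_film(views, identity, zeros))  # [[4.0, 10.0], [6.0, 12.0]]
```

Parameterizing the scale as `1 + ...` is a common FiLM convenience: an untrained (zero) conditioning path starts as the identity, so the fusion can only gradually alter each view.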


End-To-End Multi-View Multi-Modal Detection-Driven Image Fusion: One Method to Fuse them all

Authors: Gwendal Bernardi, Godefroy Brisebarre, Sébastien Roman, Mohsen Ardabilian, Emmanuel Dellandrea

Date: 22 January 2026

Conference: Pre-print

Abstract: We present EDIF, an end-to-end detection-driven framework designed to unify multi-modal and multi-view image fusion within a single architecture. While most existing fusion methods address either spectral complementarity (multi-modal) or viewpoint variability (multi-view) in isolation, real-world perception systems increasingly require both. EDIF formulates fusion as an object-level alignment problem: heterogeneous images are encoded as sets of keypoints, which are matched and aggregated through a graph attention mechanism to form object-centric representations directly optimized for detection. To stabilize training across heterogeneous components, we introduce a three-stage task-driven strategy that progressively aligns keypoint extraction, object localization, and cross-sensor grouping. In addition, we release the Multi-Modal and Multi-View Object Detection Dataset (MMDOD), a new benchmark designed to study detection-driven fusion under strong modality-view dependencies. MMDOD contains over 10,000 images of transparent objects captured under four complementary modalities (visible, NIR, low-contrast, polarization shift) and six viewpoints, with detailed object-level annotations. Experiments on RGB-thermal, multi-camera, and joint multi-modal multi-view benchmarks show that EDIF achieves performance competitive with recent specialized methods, while uniquely operating within a unified framework. On MMDOD, EDIF significantly outperforms adapted multi-modal multi-view baselines, highlighting the benefits of detection-driven, object-level fusion. The proposed MMDOD dataset is publicly available at https://datasets.liris.cnrs.fr/mmdod-version1

Keywords: Image Fusion, Multi-View, Multi-Modal, Object Detection, GANN
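EDIF aggregates matched keypoints through a graph attention mechanism. As a rough illustration only, the sketch below implements a generic single-head layer following the standard GAT formulation: every node projects its features with a shared matrix `w`, scores each neighbour with an attention vector `a` applied to the concatenated pair, normalizes the scores with a softmax, and returns the attention-weighted sum of projected neighbour features. Treating keypoints as nodes and cross-sensor matches as the edge lists in `edges` is an assumption for illustration; the paper's actual graph construction, number of heads, and nonlinearities are not described here, and the usual output activation is omitted.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def graph_attention(h: List[Vector], edges: Dict[int, List[int]], w: Matrix, a: Vector) -> List[Vector]:
    """Single-head graph attention over keypoint features.

    h[i]:     feature vector of keypoint i
    edges[i]: indices of keypoints attended by i (include i for a self-loop)
    w:        shared projection matrix (out_dim x in_dim)
    a:        attention vector of length 2 * out_dim
    """
    z = [matvec(w, hi) for hi in h]  # project every keypoint with shared weights
    out = []
    for i, zi in enumerate(z):
        nbrs = edges[i]
        # e_ij = LeakyReLU(a . [z_i || z_j]); list "+" concatenates the two vectors
        scores = [leaky_relu(sum(ak * xk for ak, xk in zip(a, zi + z[j]))) for j in nbrs]
        # numerically stable softmax over the neighbourhood of i
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # attention-weighted sum of projected neighbour features
        out.append([sum(al * z[j][c] for al, j in zip(alphas, nbrs)) for c in range(len(zi))])
    return out
```

With an all-zero attention vector every neighbour receives the same weight, so each output is simply the mean of its neighbours' projected features; a trained `a` instead lets a keypoint favour the matches that are most informative for detection.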


A Comprehensive Survey on Image Fusion: Which Approach Fits Which Need

Authors: Gwendal Bernardi, Godefroy Brisebarre, Sébastien Roman, Mohsen Ardabilian, Emmanuel Dellandrea

Date: 22 March 2025

Journal: Information Fusion

Abstract: Image fusion is a crucial domain within computer vision that integrates elements from multiple images to extract complementary information while eliminating redundancy. Once the relevant features are identified, they are combined to achieve specific application goals. The field encompasses several categories, including multi-focus, multi-exposure, multi-modal, and multi-view fusion. Most state-of-the-art solutions are optimized for a single fusion category (e.g., multi-view, multi-modal, multi-exposure, or multi-focus). However, some use cases require universal methods that can handle all of these challenges. This review provides an in-depth analysis of the various image fusion categories in order to understand each of these domains thoroughly. It also integrates multi-view image fusion methods into a comprehensive overview of image fusion, which is rarely done in the existing literature. The goal is to highlight multi-category methods that can tackle fusion problems involving images from different fusion categories. By examining each category, the survey clarifies the issues related to multi-category methods. Finally, it proposes potential directions for advancing this class of methods and discusses the challenges the field faces, offering researchers valuable insights into developing more effective multi-category solutions.

Keywords: Image Fusion, Multi-View, Multi-Modal, Task-Driven, Fusion Category


Image Fusion Survey: A Novel Taxonomy Integrating Transformer and Recent Approaches

Authors: Gwendal Bernardi, David Strubel, Godefroy Brisebarre, Jean-François Garin, Mohsen Ardabilian, Emmanuel Dellandrea

Date: 1 December 2024

Conference: ICPR 2024, MCMI workshop

Abstract: Research in multi-modal information fusion, particularly image fusion, has advanced significantly over the last decade. By integrating information from multiple sources or modalities, image fusion enables the extraction of comprehensive insights and supports more accurate analysis and decision-making. The inherent complexity of image fusion, stemming from its unstructured nature, requires high levels of abstraction and rich data representations. Deep learning, notably CNNs and, more recently, Vision Transformers, has substantially improved image fusion methods. This paper presents a comprehensive survey of image fusion methods, focusing on recent advances and introducing a novel taxonomy based on supervised, unsupervised, and task-driven approaches. The survey covers recent contributions, including transformer architectures, which have emerged as powerful tools for image fusion tasks. The taxonomy is complemented by a grouping of methods by architecture type (CNN, GAN, Transformer) to clarify the relationships between methods. By synthesizing the existing literature and introducing a new classification paradigm, this survey aims to give researchers and practitioners a comprehensive overview of image fusion techniques and to guide future research in this rapidly evolving field.

Keywords: Image Fusion, Multi-Modal, Task-Driven, Fusion Transformer


Method and device for inspecting containers along at least two different observation directions in order to classify the containers

Inventors: Gwendal Bernardi, Sylvain Gourgeon, Jean-François Garin

Date: 2 January 2025

Patent Number: WO-2025003618A1

Abstract: Method for inspecting containers made of transparent or translucent material (2) in order to classify a container, the method comprising a use phase that includes:
- acquiring, for each container, at least a first and a second image (Ic) of at least one same portion of the container along two different observation directions and in at least one modality;
- providing as input to a deep learning model (NN), for each container, a record of at least the first and second images of at least one portion of the container, in at least one modality and along two different observation directions;
- and analyzing this record with the deep learning model, for each container, to determine to which result class, from a list of classes, this portion of the container belongs.
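The claimed use phase can be read as a simple data flow: images of the same container portion, taken along at least two observation directions and in at least one modality, are bundled into one record and passed to a deep learning model that returns one class from a fixed list. The sketch below models only that data flow. The `Acquisition` type, the direction and modality labels, the class names, and the stand-in `toy_model` are all hypothetical; no real network is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[List[float]]              # grayscale image as rows of pixel values
Record = Dict[Tuple[str, str], Image]  # (direction, modality) -> image


@dataclass
class Acquisition:
    direction: str  # observation direction (hypothetical label, e.g. "front")
    modality: str   # acquisition modality (hypothetical label, e.g. "visible")
    image: Image


def build_record(acquisitions: Sequence[Acquisition]) -> Record:
    """Bundle the images of one container portion into a single model input."""
    directions = {a.direction for a in acquisitions}
    if len(directions) < 2:
        raise ValueError("at least two different observation directions are required")
    return {(a.direction, a.modality): a.image for a in acquisitions}


def classify(record: Record, model: Callable[[Record], Sequence[float]], classes: Sequence[str]) -> str:
    """Return the class whose score, as produced by `model`, is highest."""
    scores = model(record)
    if len(scores) != len(classes):
        raise ValueError("model must return one score per class")
    best = max(range(len(classes)), key=lambda k: scores[k])
    return classes[best]


if __name__ == "__main__":
    def toy_model(record: Record) -> List[float]:
        # stand-in for a trained network: mean brightness drives a "reject" score
        pixels = [p for img in record.values() for row in img for p in row]
        mean = sum(pixels) / len(pixels)
        return [1.0 - mean, mean]

    rec = build_record([
        Acquisition("front", "visible", [[0.2, 0.4]]),
        Acquisition("side", "visible", [[0.1, 0.3]]),
    ])
    print(classify(rec, toy_model, ["accept", "reject"]))  # prints "accept"
```

Keying the record by (direction, modality) keeps every view of the same portion together, which is what lets a single model decide from all viewpoints at once rather than from each image separately.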
