Systems that employ several sensors increasingly use multi-sensor fusion to process the data before making decisions. Frequently discussed applications include object recognition (in automotive and robotics), medical diagnosis, speech recognition, and security and surveillance.
Two common design approaches for multi-modal fusion are early fusion and late fusion. In early (or low-level) fusion, also known as raw data fusion, the raw data from different sensors is combined before any high-level processing or decision-making. The fused data is then used as the input to a machine-learning model.
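
As a minimal sketch of the idea (the sensor shapes, values, and stand-in detector below are hypothetical, not any production pipeline), early fusion concatenates the flattened raw readings from two sensors into one vector that a single model consumes:

```python
import numpy as np

def detector(fused: np.ndarray) -> str:
    """Stand-in for the single ML model that consumes the fused raw data."""
    return "object" if fused.mean() > 0.5 else "no object"

# Hypothetical raw readings for one frame (shapes are illustrative).
camera = np.random.rand(480, 640).ravel()   # flattened camera pixels
lidar = np.random.rand(1024, 3).ravel()     # flattened LiDAR point cloud

# Early fusion: combine the raw data from both sensors into a single
# input vector before any high-level processing, then run one model on it.
fused = np.concatenate([camera, lidar])
print(detector(fused))
```
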
Alternatively, in late (or high-level) fusion, also known as object-level fusion, each sensor's data is processed independently to make a local prediction. These individual results are then combined at a higher level to make the final fused prediction. Both early and late fusion have advantages and disadvantages and are widely used across various applications; Table 1 summarizes the differences between them.
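
A matching late-fusion sketch, again with hypothetical data, stand-in per-sensor models, and illustrative combination weights, processes each sensor independently and combines only the local confidence scores:

```python
import numpy as np

def camera_model(frame: np.ndarray) -> float:
    """Stand-in camera pipeline: returns a local detection confidence."""
    return float(frame.mean())

def radar_model(returns: np.ndarray) -> float:
    """Stand-in radar pipeline: returns a local detection confidence."""
    return float(returns.max())

frame = np.random.rand(480, 640)   # hypothetical camera frame
returns = np.random.rand(64)       # hypothetical radar returns

# Late fusion: each sensor is processed independently into a local
# prediction; only the predictions are combined (weights are illustrative).
local = np.array([camera_model(frame), radar_model(returns)])
weights = np.array([0.6, 0.4])
fused_score = float(local @ weights)
print("object detected" if fused_score > 0.5 else "no object")
```
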

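Table 1: Differences between early and late fusion

| Aspect | Early fusion | Late fusion |
|---|---|---|
| Also known as | Low-level or raw data fusion | High-level or object-level fusion |
| What is combined | Raw sensor data, before any high-level processing | Independent per-sensor predictions |
| Model structure | One model consumes the fused raw data | One model per sensor; local results aggregated into the final prediction |
| Example applications | Object recognition, medical diagnosis, multi-modal sentiment analysis | Recommendation systems, speech recognition, security and surveillance |
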
Early fusion is used for object recognition tasks in autonomous vehicles, where data from multiple sensors (e.g., cameras, LiDAR) is fused to improve detection accuracy. Medical diagnosis in healthcare and multi-modal sentiment analysis, which combines text, audio, and visual data, are additional applications of early fusion.
Late fusion is commonly used in recommendation systems, where separate models predict user preferences from different sources of information (e.g., user behavior, item features) before the results are aggregated. Speech recognition and security and surveillance systems are additional applications of late fusion.
Implementing sensor fusion
Over 20 years ago, to improve the safety of vulnerable road users (VRUs), including pedestrians and cyclists, the European Commission funded a research project called Sensors and System Architecture for Vulnerable Road Users (SAVE-U). Researchers investigated multiple sensor technologies within a sensing architecture that included both low-level and high-level data fusion. The sensing technologies included infrared (IR) vision, visible-light color vision, and four or five 24-GHz radar sensors. At that time, the researchers concluded that high-level data fusion alone could not provide the required quality and reliability of the target data.

Today, with LiDAR included as one of the potential sensing technologies, low-level fusion still appears to be a popular design approach. Low-level (early) fusion allows advanced driver-assistance systems (ADAS) to use lower-cost sensors without requiring high-performance computing, keeping the sensors' power budget down.

Luminar, Tesla, and others are examples of companies that have implemented early fusion. Luminar's LiDAR technology is standard on the Volvo EX90, and its Halo design uses early fusion. In 2021, Tesla presented an end-to-end early fusion approach. Carmaker Rivian and ADAS software developer LeddarTech are also interested in early fusion.
Some industry experts are considering a new approach called very early fusion. In these designs, the sensors are tuned to work together to reduce data volume close to the sensor, so each sensor captures the environment based on the others' capabilities. Cameras account for a large volume of sensor data, but much of that data is irrelevant. By taking LiDAR data into account, unnecessary camera data can be eliminated before processing, as sketched below.
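
As a rough illustration of that gating idea (the projected LiDAR range map, image shapes, and relevance threshold below are all hypothetical assumptions, not any vendor's implementation):

```python
import numpy as np

# Hypothetical camera image and a per-pixel LiDAR range map already
# projected into the camera frame (shapes and threshold are illustrative).
camera = np.random.rand(480, 640)
lidar_range = np.random.uniform(0.0, 100.0, size=(480, 640))

# Very early fusion: LiDAR gates the camera stream close to the sensor.
# Pixels without a nearby LiDAR return are discarded before any
# downstream processing, shrinking the data volume early.
RELEVANT_RANGE_M = 40.0
mask = lidar_range < RELEVANT_RANGE_M

relevant_pixels = camera[mask]
print(f"kept {mask.mean():.0%} of the camera data")
```

In practice, how LiDAR returns are projected into the camera frame and what counts as relevant would depend on the sensor geometry and the application.
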