SUN ET AL.: ON-ROAD VEHICLE DETECTION: A REVIEW
employing a tracking mechanism to hypothesize the location of vehicles in future frames. Tracking takes advantage of the fact that a vehicle is very unlikely to appear in only one frame. Therefore, vehicle location can be hypothesized using past history and a prediction mechanism. When tracking performance drops, common hypothesis generation techniques can be deployed to maintain performance levels.
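As an illustration of hypothesizing a vehicle's location from past history, the following is a minimal sketch that assumes a constant-velocity motion model over the last two observations. The model, the `Box` type, the `predict_search_window` function, and the `margin` parameter are all assumptions introduced here for illustration; the systems reviewed use a variety of prediction mechanisms.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned box: centre (x, y) and size (w, h), in pixels."""
    x: float
    y: float
    w: float
    h: float


def predict_search_window(history: List[Box], margin: float = 1.5) -> Box:
    """Hypothesize where a tracked vehicle will be in the next frame.

    Assumes constant image-plane velocity between consecutive frames
    (an illustrative choice). The window is enlarged by `margin` so that
    small prediction errors still leave the vehicle inside it.
    """
    if not history:
        raise ValueError("need at least one past observation")
    last = history[-1]
    if len(history) == 1:
        vx = vy = 0.0  # no motion estimate yet: search around the last position
    else:
        prev = history[-2]
        vx, vy = last.x - prev.x, last.y - prev.y
    return Box(last.x + vx, last.y + vy, last.w * margin, last.h * margin)


track = [Box(100.0, 50.0, 40.0, 30.0), Box(104.0, 51.0, 40.0, 30.0)]
window = predict_search_window(track)
```

Restricting hypothesis generation to such a window is what saves computation; if the vehicle is not found there, a full-frame hypothesis generation step can be run as a fallback.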
By examining the reported vehicle detection and tracking algorithms/systems at the structural level, many similarities can be found. Specifically, the majority of existing on-road vehicle detection and tracking systems use a detect-then-track approach (i.e., vehicles are first detected and then turned over to the tracker). This approach aims to resolve detection and tracking sequentially and separately. There are many examples in the literature following this strategy. In Ferryman et al. , vehicle detection is based on template matching  while tracking uses dynamic filtering. In that work, high-order statistics were used for detection and a Euclidean-distance-based correlation was employed for tracking. In , vehicles were tracked using multiple cues such as intensity and edge data. To increase sensor range for vehicle tracking, Clady et al. employed an additional P/T/Z camera . In , close to real-time performance was reported (i.e., 14 frames per second) by integrating detection with tracking based on deformable models. The detect-then-track approach has several drawbacks. First, false detections are passed to the tracker without a chance of rectification. Second, tracking templates built from imperfect detections jeopardize the reliability of the tracker. Most importantly, approaches of this type do not exploit temporal information during detection.
There exist several exceptions, where temporal information has been incorporated into detection. Betke et al. ,  observed that reliable detection from one or two images is very difficult and works robustly only under cooperative conditions. Therefore, they used a refined search within the tracking window to reinforce the detections (i.e., a car template was created online every 10th frame and was correlated with the object in the tracking window). Similar to , temporal tracking was used to suppress false detections in , where only two successive frames were employed. Similar observations were made by Hoffman et al.  (i.e., detection quality was improved by accumulating feature information over time).
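The refined search described above, in which a template is correlated with the contents of the tracking window, can be sketched as follows. This is a minimal illustration that assumes normalized cross-correlation as the similarity measure and an exhaustive search over positions; the exact correlation used in the cited work is not given here, and the function names `ncc` and `best_match` are hypothetical.

```python
from math import sqrt


def ncc(patch, template):
    """Normalized cross-correlation of two equally sized grayscale patches.

    Returns a value in [-1, 1]; flat patches (zero variance) score 0.0.
    """
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(window, template):
    """Slide `template` over `window`; return (best score, (row, col))."""
    th, tw = len(template), len(template[0])
    wh, ww = len(window), len(window[0])
    best_score, best_pos = -1.0, (0, 0)
    for r in range(wh - th + 1):
        for c in range(ww - tw + 1):
            patch = [row[c:c + tw] for row in window[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos


# Toy tracking window: a 2x2 "vehicle" pattern placed at row 3, column 2.
template = [[1, 2], [3, 4]]
window = [[0] * 6 for _ in range(6)]
for dr in range(2):
    for dc in range(2):
        window[3 + dr][2 + dc] = template[dr][dc]
score, position = best_match(window, template)
```

A low best score inside the tracking window can be treated as evidence against the detection, which is how such a search helps suppress false positives.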
Temporal information has not yet been fully exploited in the literature. Several efforts have been reported in  and more recently in . We envision a different strategy (i.e., detect-and-track), where detection and tracking are addressed simultaneously in a unified framework (i.e., detection results trigger tracking, and tracking reinforces detection by accumulating temporal information through a probabilistic model). Approaches following this framework would have a better chance of filtering out false detections in subsequent frames. In addition, tracking template updates would be achieved through repeated detection verifications.
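To make the idea of accumulating temporal evidence concrete, here is a minimal sketch of one possible probabilistic model: a per-track Bayesian update in log-odds form. The text does not specify a model, so the update rule, the detector rates `p_hit` and `p_false`, and the function names are all assumptions chosen for illustration.

```python
from math import exp, log


def update_log_odds(log_odds, detected, p_hit=0.9, p_false=0.2):
    """Fold one frame's detector output into a track's existence belief.

    p_hit:   assumed probability that the detector fires on a real vehicle
    p_false: assumed probability that it fires where there is no vehicle
    """
    if detected:
        return log_odds + log(p_hit / p_false)
    return log_odds + log((1.0 - p_hit) / (1.0 - p_false))


def to_probability(log_odds):
    """Convert log-odds back to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-log_odds))


def belief_after(observations, prior=0.5):
    """Belief that a track is a real vehicle after a sequence of frames."""
    log_odds = log(prior / (1.0 - prior))
    for detected in observations:
        log_odds = update_log_odds(log_odds, detected)
    return to_probability(log_odds)


confirmed = belief_after([True, True, True])      # repeatedly verified
spurious = belief_after([True, False, False])     # one-off detection
```

Under these assumed rates, a detection that is verified in consecutive frames quickly gains confidence, while one that is not re-detected decays toward rejection, which is the filtering behavior the detect-and-track strategy aims for.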
On-road vehicle detection is so challenging that none of the methods reviewed can solve it completely on its own. Different methods need to be selected based on the prevailing conditions faced by the system , . Complementary sensors and algorithms should be used to improve overall robustness and reliability. In general, surrounding vehicles can be classified into three categories according to their position relative to the host vehicle: 1) overtaking vehicles, 2) midrange/distant vehicles, and 3) close-by vehicles (see Fig. 10). In close-by regions (A1), we may see only part of the vehicle. In this case, there is no free space in the captured images, which makes the shadow/symmetry/edge-based methods inappropriate. In the overtaking regions (A2), only the side view of the vehicle is visible and its appearance changes quickly. Methods for detecting vehicles in these regions may do better to employ motion information or dramatic intensity changes , . Detecting vehicles in the midrange/distant region (A3) is relatively easy since the full view of the vehicle is available and its appearance is more stable.
Fig. 10. Detecting vehicles in different regions requires different methods. A1: Close-by regions. A2: Overtaking regions. A3: Midrange/distant regions.
Next, we provide a critique of the HG and HV methods reviewed in the previous sections. Our purpose is to emphasize their main strengths and weaknesses as well as to present potential solutions reported in the literature for enhancing their performance for deployment in real settings. The emphasis is on making these methods more reliable and robust to deal with the challenging conditions encountered in traffic scenes. Additional issues are discussed in Section 9.
Critique of Knowledge-Based HG Methods
Systems employing local symmetry, corners, or texture information for HG are most effective in relatively simple environments with little or no clutter. Employing these cues in complex environments (e.g., when driving downtown, where the background contains many buildings and different textures) would introduce many false positives. In the case of symmetry, it is also imperative to have a rough estimate of the vehicle's location in the image for fast and accurate symmetry computations. Even when both intensity and edge information are used, symmetry is quite prone to false detections caused, for example, by symmetrical background objects or partly occluded vehicles.
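The dependence of symmetry-based HG on a rough location estimate can be illustrated with a simple intensity-symmetry measure. The sketch below is not the measure used in any particular reviewed system; the scoring formula (one minus the normalized mean absolute difference of mirrored pixels), the 8-bit intensity range, and the names `symmetry_score` and `best_axis` are assumptions.

```python
def symmetry_score(image, axis_col, half_width):
    """Mirror symmetry of a grayscale image about a vertical axis.

    Compares pixels at equal distances left and right of `axis_col`.
    Returns 1.0 for perfect symmetry and 0.0 for maximal asymmetry,
    assuming 8-bit intensities (0..255).
    """
    width = len(image[0])
    if half_width < 1 or axis_col - half_width < 0 or axis_col + half_width >= width:
        raise ValueError("window does not fit inside the image")
    total, count = 0, 0
    for row in image:
        for k in range(1, half_width + 1):
            total += abs(row[axis_col - k] - row[axis_col + k])
            count += 1
    return 1.0 - total / (255.0 * count)


def best_axis(image, half_width, candidate_cols):
    """Return the candidate column with the highest symmetry score."""
    return max(candidate_cols, key=lambda c: symmetry_score(image, c, half_width))


# One image row with a mirror-symmetric pattern centred on column 4.
image = [[0, 0, 10, 50, 200, 50, 10, 90, 30]]
axis = best_axis(image, half_width=2, candidate_cols=range(2, 7))
```

The cost grows with the number of candidate axes and window sizes, which is why a rough location estimate matters. Note also that the score cannot tell a vehicle from any other symmetric structure: a symmetric building facade would score just as highly, which matches the false-detection problem described above.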
Color information has not been deployed extensively for HG due to the inherent difficulties of color-based object detection in outdoor settings. In general, the color of an object depends on illumination, reflectance properties of the object, viewing geometry, and sensor parameters. Consequently, the apparent color of an object can be quite different during different times of the day, under different weather conditions, and under different poses.
Shadow information and vehicle lights have been exploited for HG in only a limited number of studies. Under perfect weather conditions, HG using shadow information can be very successful. However, bad weather conditions (e.g., rain or snow) or bad illumination conditions make road pixels quite dark, causing this method to fail. Vehicle