SUN ET AL.: ON-ROAD VEHICLE DETECTION: A REVIEW
\[
\sum_{i=-n}^{n} \sum_{j=-n}^{n} \left[ E(i + x_0,\, j + y_0,\, t_0) - E(i + x,\, j + y,\, t) \right]^2
\]
where $(x_0, y_0)$ and $(x, y)$ are two corresponding points at times $t_0$ and $t$. The size of the search window was $n \times n$. Since finding the corresponding pair for each of the points was quite expensive, they employed a less dense grid to reduce the computational cost.
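The search-window measure above can be sketched in a few lines of NumPy. This is a minimal illustration, not the original implementation; the search radius and the assumption of an odd window size are choices made here for clarity.

```python
import numpy as np

def ssd_match(E0, E1, x0, y0, n=9, search=5):
    """Find the point (x, y) in frame E1 that best matches (x0, y0) in frame E0
    by minimizing the sum of squared differences over an n x n window (n odd).
    `search` is the radius of the search region, an illustrative assumption."""
    h = n // 2
    best, best_xy = np.inf, (x0, y0)
    patch0 = E0[y0 - h:y0 + h + 1, x0 - h:x0 + h + 1].astype(float)
    for y in range(y0 - search, y0 + search + 1):
        for x in range(x0 - search, x0 + search + 1):
            patch1 = E1[y - h:y + h + 1, x - h:x + h + 1].astype(float)
            score = np.sum((patch0 - patch1) ** 2)  # the SSD measure above
            if score < best:
                best, best_xy = score, (x, y)
    return best_xy  # (x - x0, y - y0) approximates the displacement
```

Evaluating this at every pixel yields a dense flow field, which is exactly why a coarser grid of points is attractive in practice.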
Kruger et al.  estimated optical flow from spatio-temporal derivatives of the gray-value images using a local approach. They further clustered the estimated optical flow to eliminate outliers. Assuming a calibrated camera and known ego-motion, they detected both moving and stationary objects. Generating a displacement vector for each pixel (i.e., dense optical flow) is time consuming and impractical for a real-time system. In contrast to dense optical flow, "sparse optical flow" is less time consuming because it utilizes image features, such as corners, local minima and maxima, or "Color Blobs." Although they produce only a sparse flow field, feature-based methods can provide sufficient information for HG. Moreover, in contrast to pixel-based optical flow estimation methods, where pixels are processed independently, feature-based methods utilize higher-level information; consequently, they are less sensitive to noise.
The input to the HV step is the set of hypothesized locations from the HG step. During HV, tests are performed to verify the correctness of a hypothesis. Approaches to HV can be classified mainly into two categories: 1) template-based and 2) appearance-based. Template-based methods use predefined patterns from the vehicle class and perform correlation. Appearance-based methods, on the other hand, learn the characteristics of the vehicle class from a set of training images which should capture the variability in vehicle appearance. Usually, the variability of the nonvehicle class is also modeled to improve the performance. Each training image is represented by a set of local or global features. Then, the decision boundary between the vehicle and nonvehicle classes is learned either by training a classifier (e.g., NNs, Support Vector Machines (SVMs)) or by modeling the probability distribution of the features in each class (e.g., using the Bayes rule assuming a Gaussian distribution).
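The second modeling option mentioned above, Bayes rule with a per-class Gaussian, can be sketched as follows. This is a minimal stand-in, assuming diagonal covariances and equal class priors; the feature vectors themselves would come from whatever local or global representation is used.

```python
import numpy as np

def fit_gaussian(X):
    """Mean and diagonal variance of one class's training feature vectors."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6  # small floor avoids divide-by-zero

def log_likelihood(x, mean, var):
    """Log of a diagonal-covariance Gaussian density (shared constants dropped)."""
    return -0.5 * np.sum(np.log(var) + (x - mean) ** 2 / var)

def classify(x, vehicle_model, nonvehicle_model):
    """Bayes rule with equal priors: pick the class with the higher likelihood."""
    return ("vehicle" if log_likelihood(x, *vehicle_model) >=
            log_likelihood(x, *nonvehicle_model) else "nonvehicle")
```

With both classes modeled, a hypothesis is verified simply by comparing the two class-conditional likelihoods at its feature vector.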
Template-based methods use predefined patterns of the vehicle class and perform correlation between the image and the template. Some of the templates reported in the literature represent the vehicle class “loosely,” while others are more detailed. It should be mentioned that, due to the nature of the template matching methods, most papers in the literature do not report quantitative results and demonstrate performance through examples.
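The correlation step common to these methods can be illustrated with normalized cross-correlation, a standard choice for template matching (the 0.8 acceptance threshold below is illustrative, not taken from any of the surveyed papers):

```python
import numpy as np

def ncc(patch, template):
    """Normalized cross-correlation between an image patch and a template;
    values near 1 indicate a good match, regardless of brightness offsets."""
    p = patch.astype(float) - patch.mean()
    t = template.astype(float) - template.mean()
    denom = np.sqrt((p ** 2).sum() * (t ** 2).sum())
    return (p * t).sum() / denom if denom > 0 else 0.0

def verify(image, template, x, y, threshold=0.8):
    """Accept the hypothesis at top-left corner (x, y) if the template
    correlates well with the image there. Threshold is an assumption."""
    h, w = template.shape
    patch = image[y:y + h, x:x + w]
    return patch.shape == template.shape and ncc(patch, template) >= threshold
```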
Parodi and Piccioli  proposed a hypothesis verification scheme based on the presence of license plates and rear windows, which can be considered a loose template of the vehicle class. No quantitative performance results were included in the paper. Handmann et al.  proposed a template based on the observation that the rear/frontal view of a vehicle has a "U" shape (i.e., one horizontal edge, two vertical edges, and two corners connecting the horizontal and vertical edges).
During verification, they considered a vehicle to be present in the image if they could find the “U” shape.
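One plausible way to operationalize the "U" test is to look for vertical edge runs along the left and right borders of the hypothesized region and a horizontal edge run along its bottom. The sketch below assumes binary edge maps are already available; the border width and minimum run length are illustrative assumptions, not values from the paper.

```python
import numpy as np

def u_shape_present(v_edges, h_edges, min_len=5):
    """Check a hypothesized region for the 'U' signature: two vertical edge
    runs on the left/right borders joined by a horizontal run along the
    bottom. v_edges / h_edges are binary maps of vertical and horizontal
    edge pixels cropped to the region."""
    left = v_edges[:, :2].sum() >= min_len      # left vertical edge
    right = v_edges[:, -2:].sum() >= min_len    # right vertical edge
    bottom = h_edges[-2:, :].sum() >= min_len   # bottom horizontal edge
    return bool(left and right and bottom)
```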
Ito et al.  used a very loose template to recognize vehicles. Using active sensors for HG, they checked whether or not pronounced vertical/horizontal edges and symmetry existed. Due to the simplicity of the template, they did not expect very accurate results, which was the main reason for employing active sensors for HG. Regensburger et al.  utilized a template similar to . They argued that the visual appearance of an object depends on its distance from the camera. Consequently, they used two slightly different generic object (vehicle) models, one for nearby objects and another for distant objects. This method, however, raises the question of which model to use at a given location. Instead of working with different generic models, distance-dependent subsampling was performed before the verification step in .
A template called "moving edge closure" was used in , which was fit to groups of moving points. To obtain the moving edge closure, they performed edge detection on the area covered by the detected moving points, followed by external edge connection. If the size of the moving edge closure fell within a predefined range, they declared a vehicle detected. Nighttime vehicle detection was also addressed in this work : pairs of headlights were considered as templates for vehicle detection.
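A headlight-pair template of this kind reduces, in its simplest form, to pairing bright-blob centroids that lie on roughly the same image row with a plausible horizontal separation. The sketch below assumes blob centroids have already been extracted by thresholding; all numeric thresholds are illustrative assumptions.

```python
def pair_headlights(blobs, max_dy=5, sep_range=(40, 160)):
    """Pair bright-blob centroids (x, y) into headlight candidates: the two
    blobs must lie on roughly the same image row (within max_dy pixels) and
    have a horizontal separation inside sep_range."""
    pairs = []
    for i, (x1, y1) in enumerate(blobs):
        for x2, y2 in blobs[i + 1:]:
            if abs(y1 - y2) <= max_dy and sep_range[0] <= abs(x1 - x2) <= sep_range[1]:
                pairs.append(((x1, y1), (x2, y2)))
    return pairs
```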
A rather loose template was also used in , where hypotheses were generated on the basis of road position and perspective constraints. The template contained a priori knowledge about vehicles: "A vehicle is generally symmetric, characterized by a rectangular bounding box which satisfies specific aspect ratio constraints." The model matching worked as follows: Initially, the hypothesized region was checked for the presence of two corners representing the bottom of the bounding box, similar to the "U" shape idea in . The presence of the corners was validated using perspective and size constraints. They then detected the top part of the bounding box in a specific region determined, once again, by perspective and size constraints. Once the bounding box was detected successfully, they claimed vehicle presence in that region. This template could be matched very quickly; however, it introduces some uncertainty, since there might be other objects on the road satisfying the same constraints (e.g., distant buildings).
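The aspect-ratio and symmetry constraints quoted above can be sketched as a simple bounding-box filter. The aspect range and symmetry threshold below are illustrative assumptions, not values reported in the literature:

```python
import numpy as np

def plausible_vehicle(region, aspect_range=(0.8, 1.6), min_symmetry=0.7):
    """Validate a hypothesized bounding box: its width/height ratio must fall
    in a vehicle-like range, and its gray-level content should be roughly
    mirror-symmetric about the vertical axis."""
    h, w = region.shape
    if not (aspect_range[0] <= w / h <= aspect_range[1]):
        return False
    a = region.astype(float)
    mirrored = a[:, ::-1]
    # symmetry score: 1.0 for a perfectly mirror-symmetric region
    err = np.abs(a - mirrored).mean() / (np.ptp(a) + 1e-6)
    return (1.0 - err) >= min_symmetry
```

As the text notes, a filter this loose will also pass other roughly symmetric rectangular structures, so it is best used in combination with stronger cues.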
HV using appearance models is treated as a two-class pattern classification problem: vehicle versus nonvehicle. Building a robust pattern classification system involves searching for an optimum decision boundary between the classes to be categorized. Given the huge within-class variability of the vehicle class, this is not an easy task. One feasible approach is to learn the decision boundary by training a classifier on feature sets extracted from a training set.
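For a linear classifier such as an SVM, learning that decision boundary amounts to fitting a hyperplane w·x + b separating the two classes. The sketch below is a minimal stand-in (regularized hinge loss minimized by gradient descent), not the training procedure of any particular surveyed system; all hyperparameters are illustrative.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear boundary w.x + b between vehicle (y=+1) and nonvehicle
    (y=-1) feature vectors by gradient descent on the regularized hinge loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1                      # examples violating the margin
        grad_w = lam * w - (y[mask] @ X[mask]) / n
        grad_b = -y[mask].sum() / n
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b

def predict(w, b, x):
    """Classify a feature vector by the side of the hyperplane it falls on."""
    return 1 if x @ w + b >= 0 else -1
```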
Appearance-based methods learn the characteristics of vehicle appearance from a set of training images which capture the variability in the vehicle class. Usually, the variability of the nonvehicle class is also modeled to improve performance. First, a large number of training images is collected and each training image is represented by a set of local or global features. Then, the decision boundary between the vehicle and nonvehicle classes is