IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,
learned either by training a classifier or by modeling the probability distribution of the features in each class.
Various feature extraction methods have been investigated in the context of vehicle detection. Based on the method used, the features extracted can be classified as either local or global. Global features are obtained by considering all the pixels in an image. Wu and Zhang  used standard Principal Components Analysis (PCA) for feature extraction, together with a nearest-neighbor classifier, reporting an 89 percent accuracy. However, their training database was quite small (93 vehicle images and 134 nonvehicle images), which makes it difficult to draw any useful conclusions.
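As a rough illustration of the global approach, the following sketch projects whole images onto a few principal components and labels a query by its nearest training neighbor. It is not the implementation of the cited work: the function names, the number of components, and the synthetic data standing in for vehicle and nonvehicle images are all assumptions made for this example.

```python
import numpy as np

def fit_pca(X, k):
    """Return (mean, components) for the top-k principal directions of the rows of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def project(X, mean, components):
    """Project the rows of X onto the principal directions."""
    return (X - mean) @ components.T

def nearest_neighbor_predict(train_feats, train_labels, query_feats):
    """Label each query with the label of its closest training vector."""
    diff = query_feats[:, None, :] - train_feats[None, :, :]
    d2 = (diff ** 2).sum(axis=2)  # squared Euclidean distances
    return train_labels[d2.argmin(axis=1)]

# Synthetic stand-in data (assumption): 64-pixel "images" from two well-separated classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 0.3, size=(20, 64)),    # "vehicle"
               rng.normal(-1.0, 0.3, size=(20, 64))])  # "nonvehicle"
y = np.array([1] * 20 + [0] * 20)

mean, comps = fit_pca(X, k=5)
train_feats = project(X, mean, comps)
queries = np.vstack([rng.normal(1.0, 0.3, size=(3, 64)),
                     rng.normal(-1.0, 0.3, size=(3, 64))])
pred = nearest_neighbor_predict(train_feats, y, project(queries, mean, comps))
```

Because every pixel contributes to every feature, a local change such as partial occlusion perturbs the whole feature vector.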
An inherent problem with global feature extraction approaches is that they are sensitive to local or global image variations (e.g., pose changes, illumination changes, and partial occlusion). Local feature extraction methods, on the other hand, are less sensitive to these effects. Moreover, geometric information and constraints in the configuration of different local features can be utilized either explicitly or implicitly.
Different from , in , PCA was used for feature extraction and Neural Networks (NNs) for classification. First, each subimage containing vehicle candidates was scaled to 20 × 20; then, it was subdivided into 25 4 × 4 subwindows. PCA was applied on every subwindow (i.e., “local PCA”) and the output was provided to a NN to verify the hypothesis.
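The subdivision and local PCA steps can be sketched as follows. The text does not state whether one PCA basis was learned per subwindow position or a single basis was shared by all subwindows; this sketch assumes one basis per position. The function names and the choice of three components per subwindow are likewise assumptions, and the NN stage is omitted.

```python
import numpy as np

def split_into_subwindows(image, size=4):
    """Split an image into non-overlapping size x size blocks, in row-major order."""
    h, w = image.shape
    blocks = image.reshape(h // size, size, w // size, size).swapaxes(1, 2)
    return blocks.reshape(-1, size, size)

def fit_local_pca(images, size=4, k=3):
    """Fit one PCA basis per subwindow position (an assumption of this sketch)."""
    stacks = np.stack([split_into_subwindows(im, size) for im in images])
    n, p = stacks.shape[:2]
    flat = stacks.reshape(n, p, -1)  # (images, positions, pixels per block)
    models = []
    for j in range(p):
        mean = flat[:, j].mean(axis=0)
        _, _, Vt = np.linalg.svd(flat[:, j] - mean, full_matrices=False)
        models.append((mean, Vt[:k]))
    return models

def local_pca_features(image, models, size=4):
    """Concatenate the per-subwindow projections into one feature vector."""
    blocks = split_into_subwindows(image, size).reshape(len(models), -1)
    return np.concatenate([(b - m) @ c.T for b, (m, c) in zip(blocks, models)])

# Synthetic 20 x 20 subimages (assumption) standing in for scaled vehicle candidates.
rng = np.random.default_rng(0)
images = rng.random((30, 20, 20))
models = fit_local_pca(images)
features = local_pca_features(images[0], models)  # 25 positions x 3 components
```

The resulting vector would then be the input to the classifier that verifies the hypothesis.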
Goerick et al.  and Noli et al.  used the Local Orientation Coding (LOC) method (see Section 5.1.5) to extract edge information. The histogram of LOC within the area of interest was then provided to a NN classifier, a Bayes classifier, and a combination of both for classification. For the NN, the number of nodes in the first layer was between 350 and 450, while the number of hidden nodes was between 10 and 40. They used 2,000 examples for training, and the whole system ran in real time. The performance of the NN classifier was 94.7 percent, slightly better than that of the Bayes classifier (94.4 percent) and very close to that of the combined classifier (95.7 percent).
Kalinke et al.  designed two models for vehicle detection: one for sedans and the other for trucks. Two different model generation methods were used. The first one was designed manually, while the second one was based on a statistical algorithm using about 50 typical trucks and sedans. Classification was performed using NNs. The input to the NNs was the Hausdorff distances between the hypothesized vehicles and the models, both represented in terms of the LOC. The NN classified every input into three classes: sedans, trucks, or background. Similar to , Handmann et al.  utilized the histogram of LOC, together with a NN, for vehicle detection. The Hausdorff distance was used for the classification of trucks and cars, as in . No quantitative performance was reported in  or .
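For reference, the Hausdorff distance between two point sets can be computed as in the minimal sketch below. The cited systems compared LOC representations; here each shape is simply assumed to be a list of 2D feature-point coordinates, and the function names are chosen for this example.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between point sets a and b."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Toy point sets (assumption): a hypothesized vehicle and a model template.
hypothesis = [(0, 0), (1, 0)]
model = [(0, 0), (4, 0)]
distance = hausdorff(hypothesis, model)
```

A small distance means every point of one set lies close to some point of the other, which is why the distance to each model is an informative input for classification.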
A statistical model of vehicle appearance was investigated by Schneiderman and Kanade . A view-based approach employing multiple detectors was used to cope with viewpoint variations. The statistics of both object and “nonobject” appearance were represented using the product of two histograms, with each histogram representing the joint statistics of a subset of Haar wavelet features and their position on the object. A three-level wavelet transform was used to capture the space, frequency, and orientation information. This three-level decomposition produced 10 subbands, and 17 subsets of quantized wavelet coefficients were used. Bootstrapping was used to gather the statistics of the nonvehicle class. The best performance reported in  was 92 percent. A different statistical model was investigated by Weber et al. . They represented each vehicle image as a constellation of local features and used the Expectation-Maximization (EM) algorithm to learn the parameters of the probability distribution of the constellations. They used 200 images for training and reported an 87 percent accuracy.
An overcomplete dictionary of Haar wavelet features was utilized in  for vehicle detection. They argued that this representation provided a richer model and spatial resolution and that it was more suitable for capturing complex patterns. The overcomplete Haar wavelet features were derived from a set of redundant functions, where the shift between wavelets at level $n$ was $\frac{1}{4} 2^n$ instead of $2^n$. They referred to it as a quadruple-density dictionary. A total of 1,032 positive training patterns and 5,166 negative training patterns were used for training, and the ROC curve showed that the false positive rate was close to 1 percent when the detection rate approached 100 percent.
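The effect of the denser sampling can be illustrated by counting wavelet positions along one image axis. The sketch assumes that the quantity being reduced is the shift between neighboring wavelets, i.e., $\frac{1}{4} 2^n$ rather than $2^n$, and that a wavelet at level $n$ has a support of $2^n$ pixels; the function name and the 64-pixel window are hypothetical.

```python
def wavelet_offsets(length, level, density=1):
    """Left edges of wavelets of support 2**level along an axis of `length` pixels.

    density=1 uses the standard shift 2**level;
    density=4 uses the quadruple-density shift 2**level / 4.
    """
    support = 2 ** level
    shift = max(1, support // density)
    return list(range(0, length - support + 1, shift))

standard = wavelet_offsets(64, level=4)              # shift of 16 pixels
quadruple = wavelet_offsets(64, level=4, density=4)  # shift of 4 pixels
```

Under these assumptions, a 64-pixel axis holds 4 standard positions but 13 quadruple-density positions at level 4, so a two-dimensional dictionary grows from 16 to 169 positions per orientation.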
Sun et al. ,  went one step further by arguing that the actual values of the wavelet coefficients are not very important for vehicle detection. In fact, coefficient magnitudes indicate local oriented intensity differences, information that could be very different even for the same vehicle under different lighting conditions. Following this observation, they proposed using quantized coefficients to improve detection performance. The quantized wavelet features yielded a detection rate of 93.94 percent compared to 91.49 percent using the original wavelet features.
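The quantization scheme itself is not described in this section. One plausible form, sketched below purely as an assumption, keeps only the sign of the strongest coefficients and sets the remaining ones to zero; the function name and the `keep` parameter are hypothetical.

```python
def quantize_coefficients(coeffs, keep):
    """Map the `keep` largest-magnitude coefficients to +1 or -1 by sign, the rest to 0."""
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    strong = set(ranked[:keep])
    quantized = []
    for i, c in enumerate(coeffs):
        if i in strong and c != 0:
            quantized.append(1 if c > 0 else -1)
        else:
            quantized.append(0)
    return quantized

coeffs = [0.9, -0.1, -2.5, 0.05, 1.2]
brighter = [3 * c for c in coeffs]  # same pattern under a global contrast change
q_original = quantize_coefficients(coeffs, keep=3)
q_brighter = quantize_coefficients(brighter, keep=3)
```

Both inputs map to the same quantized vector, which illustrates why discarding magnitudes can make the features less sensitive to lighting.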
Using Gabor filters for vehicle feature extraction was investigated in . Gabor filters provide a mechanism for obtaining orientation- and scale-tunable edge and line detectors. Vehicles contain strong edges and lines at different orientations and scales; thus, features of this type are very effective for vehicle detection. The hypothesized vehicle subimages were subdivided into nine overlapping subwindows. Gabor filters were then applied on each subwindow separately. The magnitudes of the responses of the Gabor filters were collected from each subwindow and represented by three moments: the mean, the standard deviation, and the skewness. Classification was performed using Support Vector Machines (SVMs), yielding an accuracy of 94.81 percent.
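A minimal sketch of this feature pipeline follows. The subwindow layout (a 3 × 3 grid of half-size windows with 50 percent overlap), the filter parameters, the FFT-based circular convolution, and all function names are assumptions made for this example; the SVM stage is omitted.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel: Gaussian envelope times an oriented complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * xr / wavelength)

def filter_magnitude(window, kernel):
    """Magnitude of the filter response (circular convolution via the FFT)."""
    K = np.fft.fft2(kernel, s=window.shape)
    return np.abs(np.fft.ifft2(np.fft.fft2(window) * K))

def overlapping_subwindows(image, grid=3):
    """grid x grid half-size windows whose neighbors overlap by 50 percent."""
    h, w = image.shape
    wh, ww = h // 2, w // 2
    ys = [i * (h - wh) // (grid - 1) for i in range(grid)]
    xs = [j * (w - ww) // (grid - 1) for j in range(grid)]
    return [image[y0:y0 + wh, x0:x0 + ww] for y0 in ys for x0 in xs]

def moments(values):
    """Mean, standard deviation, and skewness of an array of values."""
    mu, sd = values.mean(), values.std()
    skew = 0.0 if sd == 0 else ((values - mu) ** 3).mean() / sd ** 3
    return mu, sd, skew

def gabor_features(image, kernels):
    """Three moments per filter per subwindow, concatenated into one vector."""
    feats = []
    for win in overlapping_subwindows(image):
        for k in kernels:
            feats.extend(moments(filter_magnitude(win, k)))
    return np.array(feats)

# Synthetic 32 x 32 subimage and four orientations at one scale (assumptions).
rng = np.random.default_rng(0)
image = rng.random((32, 32))
kernels = [gabor_kernel(9, 4.0, t, 2.0) for t in np.arange(4) * np.pi / 4]
features = gabor_features(image, kernels)  # 9 windows x 4 filters x 3 moments
```

Summarizing each response by a few moments keeps the feature vector short while still reflecting how edge energy is distributed within each subwindow.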
A “vocabulary” of information-rich vehicle parts was constructed automatically in  by applying the Forstner interest operator, together with a clustering method, to a set of representative images. Each image was represented in terms of parts from this vocabulary to form a feature vector, which was used to train a classifier to verify hypotheses. Some successful detections were reported under a high degree of clutter and occlusion, and an overall accuracy of 90.5 percent was achieved. Following the same idea (i.e., detection using components), Leung  investigated a different vehicle detection method. Instead of using the Forstner interest operator, differences of Gaussians were applied to images in scale space, and maxima and minima were selected as the key-points. At each of the key-points, the Scale Invariant Feature Transform (SIFT)  was utilized to form a feature vector, which was used to train an SVM classifier. Leung tested his algorithm on the UIUC data , showing slightly better performance.
INTEGRATING DETECTION WITH TRACKING
Vehicle detection can be improved considerably, both in terms of accuracy and time, by taking advantage of the temporal continuity present in the data. This can be achieved by