In this work, we develop a traffic-mining-based path-finding method that first mines speed and driving models from his- toric traffic data, and when a query is posed to the system (which contains the start point, the end point, and departure time, or some other information, e.g., car pool), it computes the fastest route, based on additional conditions, such as weather forecast, or road construction/closure information. In contrast with the current route planning systems, our method is based on traffic data mining and has the follow- ing unique features:

•

The suggested route is composed of as many frequent path segments as possible, i.e., it matches historic driving behavior, and thus should account for subtle and difficult to model conditions.

•

The suggested route takes into consideration the speed conditions expected to be encountered during the trip, given multiple factors that may influence the speed, such as relevant locations, time, type of vehicles, etc.. The key technical contributions of the paper can be sum-

marized as follows:

•

Road hierarchy-based partitioning. We use the nat- ural hierarchy present in road networks to partition the network into semantically meaningful areas. We con- struct high level areas by dividing the graph using the largest possible roads: Each area at this level is enclosed by large highways and will probably contain a large num- ber of nodes and edges. We recursively subdivide areas by progressively decreasing the road-scale. An efficient algorithm is developed that automatically partitions an arbitrary road network and constructs a natural hierar- chy of areas. These areas are essential to the algorithm as they will be used to guide the driving pattern mining and adaptive fastest path pre-computation.

•

Speed rule mining. We take a traffic database that records observed road speeds under a variety of condi- tions (e.g., weather, time of day, and accidents) and in- duce a set of concise rules of the form “if conditions c for edge e then speed factor = f”, where speed factor is the speedup or slowdown for an edge with respect to the edge’s base-speed. Speed factors are clustered to increase rule support. The concise set of rules is mined through a decision tree induction algorithm.

•

Driving pattern mining. We mine frequently trav- eled edges or edge-sequences in order to obtain important driving hints that are hidden in the data and otherwise difficult to model. We propose a novel area-level mining that computes frequent path-segments at the area level with a support relative to the traffic in the area, e.g., lower support thresholds for edges in rural areas than those in big cities. Such adaptive support will avoid gen- erating too many or too few driving patterns.

•

Adaptive pre-computation. Fastest path algorithms usually pre-compute a subset of fastest paths in order to speedup path computation at runtime. The problem is that pre-computation schemes assume unique fastest paths, and when we have variable edge speeds, fastest paths can be valid only for a given set of conditions. We develop an area-level pre-computation strategy that pre- computes high benefit paths at the area level, i.e., the fastest paths do not change too much. This strategy al- lows us to speed up query processing without exploding the space requirements.

795

•

Road upgrading. We observe that people usually drive between end points by going through the largest possible roads available unless there are smaller roads that are faster than the large ones. Based on this idea, we develop an algorithm that for each area computes the set of small roads that should be upgraded, i.e., considered in routing because they have a significant advantage over large roads enclosing the area.

•

Adaptive fastest path algorithm. Finally, we pro- pose an efficient routing algorithm that uses the road hierarchy and pre-computed areas to limit the search space. This improves trip duration by using upgraded roads whenever beneficial, and finds routes that take into consideration both speed and driving patterns. The rest of the paper is organized as follows. Section 2

presents a formal definition of the problem. Section 3 de- scribes the structure of the database of traffic observations. Section 4 introduces the idea of road network partitioning and gives an efficient algorithm for automated road network segmentation. Section 5 proposes a method to mine speed and driving patterns. Section 6 presents the ideas of path pre-computation and road upgrades. Section 7 develops the fastest path algorithm. Section 8 reports on experimental results. We discuss related work in Section 9 and conclude our study in Section 10.

2.

# PROBLEM DEFINITION

In this section we first define a set of key concepts: road network, driving patterns, speed patterns, and forecast func- tions, and then present the problem statement.

Definition 2.1. A road network is a directed graph G(V, E), where V is a set of vertices representing road intersections and terminal points, and E is a set of edges representing road segments each connecting two vertices.

Figure 1 presents the road network for the town of San Joaquin in California. Larger roads are shown in bold. The graph contains 24,123 edges and 18,496 nodes, and was ob- tained from TIGER line files provided by the US census ^{1}.

# Definition 2.2. A speed pattern is a tuple of the form

edege id, t start, t end, (d

_{1}, d_{2}, . . . , d_{k}) : m, where edge id

is an edge, (t start, t end) is a time interval, each d_{i }is a value for speed factor D_{i}, and m is an aggregate function computed on edge speed.

Speed patterns describe road speeds under a variety of conditions. Table 2 presents an example of speed patterns for a particular edge (road segment) in a road network. In the example we list edge speeds for three conditions: time- of-day, D_{1 }= weather, and D_{2 }= vehicle-type. In reality, it is possible to consider many more factors, such as road con- struction and accidents at nearby edges. The speed table can be constructed by integrating information from multi- ple sources, e.g., we can obtain information on road speed, based on time, location, weather data, and road construc- tion information. Currently, it is possible to obtain speed conditions for roads at certain important metropolitan areas like San Francisco and Chicago, and there is a trend for an increased availability of such information, evidenced by the

^{1}http://www.census.gov/geo/www/tiger/tgrcd108/ tgr108cd.html