included for lack of space), and in every case the hierarchical partitioning algorithm found balanced partitions, with may areas in dense regions of the graph, and less but larger areas at more sparse regions of the graph.

4.2

# Area partitioning algorithm

In this section we develop an efficient algorithm that can automatically generate a semantically meaningful partition of the road network by using road hierarchy information. The algorithm uses a flood filling technique to identify strongly connected components delimited by high level edges. We take as input the road network G and the edge class k used for partitioning. Fist we assign to each node an empty set of areas. We then choose a node n with class(n) > k (i.e., connected to edges less important than the ones used for partitioning), and add area a to this node’s area set; at this point we move to all neighbors of the node that are reach- able through the edges of the classes greater than k, and add a to their area sets. This process repeats until no further nodes can be reached. We are basically walking in every possible direction from the node until we reach large roads used for partitioning and we stop. At this point we increase our area number, move to the next node with an empty area set, and repeat the process until all nodes are assigned to at least one area. One nice feature of the algorithm is that it automatically identifies interior nodes (those with a single area in their area set), and border nodes (those with multiple areas in their area set). Analysis. Algorithm 1 examines each interior node O(1) times, and border nodes O(|a|), where |a| is the number of areas. So the order of the overall algorithm is O(n × |a|). In general, |a| n. So the algorithm can be considered linear in the number of nodes. In our experiments we partitioned real road network graphs with more than a million nodes in just a few seconds.

## Algorithm 1 Area partitioning

Input: G(V, E), k edge class used to partition G. O u t p u t : P a r t i t i o n P ( k ) = V k 1 , V k 2 , . Method: . . , V k n

1: area = 0; 2: Areas[n_{i}] = φ for all node n_{i}; 3: for Each node n_{i }do

4: 5: 6: 7: 8: 9:

10: 11: 12: 13:

if Area[n_{i}] = φ and class(n_{i}) > k then q.push(n_{i}); while not q.empty() do n ← q.pop(); n.mark = true; Area[n] = Area[n] ^{ }{area}; push into q all neighbors of n reachable through edges of class greater than k, s.t., n.mark = false; end while area = area + 1; q.clear(); end if

14: end for 1 5 : f o r e a c h a r e a i c o n s t r u c t V k i a s t h e s e t o f n o d e s n s u c that i ∈ Area[n] h

5.

# TRAFFIC MINING

An important contribution of our work is to take into consideration the factors that affect driving speed as well as driving behavior in the adaptive fastest path computation

798

algorithm. We think that route planning software has to account for all such factors in order to provide routes that are not only fast and relevant given the conditions encoun- tered by the driver, but also well supported, in the sense that many drivers in the past have opted for such a route under similar circumstances.

5.1

# Speed pattern mining

In any road network multiple factors influence the speed at which we can travel through different roads. Weather, time of day, vehicle class, and road construction are just a few among the many dimensions that can influence road speed. In this section we will develop a method that mines, from a large database, a set of concise rules that identify the most relevant factors influencing road speed.

As we mentioned before, the traffic database augmented with extra factors, contains a collection of traffic tuples of the form edge id, time, (d_{1}, . . . , d_{k}) : speed (if we have ve- hicle level traffic data, we compute the average speed for the edge under every condition for all cars). We can see the problem of speed pattern mining, as a classification problem where we would like to predict edge speed based on time and feature values d_{1}, . . . , d_{k}. We can derive rules such as “if area = a_{1 }and weather = icy and time = rush hour then speed = 1/4 × base speed”. Looking at this rule we notice a few things. First, feature values reside at different levels of abstraction, i.e., each factor has an associated concept hierarchy and the rules can use values at any level. Second, speed has been expressed as a relative value with respect to a base speed, such transformation allows us to build more general rules; in the above example “1/4 × base speed” in- dicates that regardless of the initial speed of the road, it slows down to a quarter of its initial value. Third, in this case the class label which is speed, or more concretely, speed factor, is a continuous attribute that requires some form of discretization in order to perform rule induction.

There are several methods to perform rule induction, in this paper we chose decision tree induction [25] as it pro- vides rule predicates of the desired form, which in general have good accuracy and generality, and the method can be applied to very large datasets efficiently [12]. Before run- ning a decision tree algorithm we run a preprocessing step to discretize speed factors, which will be treated as our class label. Speed factors can be discretized through clustering [17], with each traffic tuple assigned to a cluster centroid. This is beneficial for route planning applications, as it allows us to derive speed rules that are supported by a large num- ber of observations and are thus statistically significant. For the time dimension, we can use its concept hierarchy to reg- ister values at a higher level of abstraction than that of the raw data, e.g., from seconds to minutes. We can then treat time at the minute level as a continuous attribute which can be handled by decision tree algorithms through binary splits [24], or multi-interval discretization methods [8].

5.2

# Driving pattern mining

One of the most common ways that drivers use to de- termine good driving routes in an unfamiliar area is to ask local people for tips, we may find for example that route R_{1 }is very good in the summer but that in winter the road become unsafe, or that route R_{2 }although fast, goes through a high crime area and should be avoided at night. This is valuable information that has been largely ignored by route