Modeling service time

Service time is defined as the sum of seek time, latency time, and transfer time.

• Seek time – Seek time is the time it takes the disk arm to move and position the disk head on the correct track, that is, to move from track X (serving the previous I/O) to track Y (serving the next I/O). Optimizer uses a gig-to-gig database to model seek time for different addresses and different disk drives.

• Latency time – Latency time is the delay caused by disk rotation and is a function of the disk rotational speed. Optimizer assumes one-third of a spin for each I/O, provided that the internal disk optimization is on.

• Transfer time – Transfer time is the time it takes to transfer the data to or from the disk. Transfer time is a function of the data transfer rate, disk bandwidth, and data layout. Optimizer uses the Zone Bit Recording (ZBR) database to model the transfer time of data. The ZBR database includes information about the bandwidth of each zone of the disk.
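As a rough illustration of how the three components add up, the following Python sketch computes a per-I/O service time. It is a minimal sketch under stated assumptions, not Optimizer's implementation: the function names, the units, and the idea of passing in a single seek time and a single zone bandwidth are all hypothetical, whereas Optimizer looks these values up in its seek and ZBR databases. Only the one-third-of-a-spin latency rule comes from the text above.

```python
def latency_ms(rpm: float, spin_fraction: float = 1.0 / 3.0) -> float:
    """Rotational delay in milliseconds.

    One full revolution takes 60,000 / rpm milliseconds; the default
    fraction of one-third of a spin follows the assumption in the text.
    """
    ms_per_revolution = 60_000.0 / rpm
    return spin_fraction * ms_per_revolution


def transfer_ms(io_size_kb: float, zone_bandwidth_mb_s: float) -> float:
    """Time in milliseconds to move io_size_kb at the given zone bandwidth.

    The bandwidth is a caller-supplied value here (hypothetical); a real
    model would take it from per-zone data for the disk.
    """
    size_mb = io_size_kb / 1024.0
    return size_mb / zone_bandwidth_mb_s * 1000.0


def service_time_ms(seek_ms: float, rpm: float,
                    io_size_kb: float, zone_bandwidth_mb_s: float) -> float:
    """Service time = seek time + latency time + transfer time."""
    return (seek_ms
            + latency_ms(rpm)
            + transfer_ms(io_size_kb, zone_bandwidth_mb_s))
```

For example, at 15,000 rpm one revolution takes 4 ms, so the modeled latency is about 1.33 ms per I/O regardless of the I/O size.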

Accurate seek and latency times are impractical to obtain because they require a complete trace of the I/O sequence, so Optimizer uses a mathematical model to calculate these metrics. The Optimizer model makes the following assumptions to compensate for the missing data:

• Random Distribution – Optimizer assumes that the arrival rate and sequence of I/Os at the back end are randomly distributed.

• Independent I/Os – Optimizer does not assume any relationship between two consecutive I/Os.

These two assumptions lead to the following result. Let $n$ be the number of hypers on a disk, let $A_i$ be the activity done by hyper $i$ at time $T$, and let $SA = \sum_{i=1}^{n} A_i$ be the total activity done by the disk at that time. The probability of serving the next I/O from hyper $i$ is then:

$$P_i = \frac{A_i}{SA}$$

regardless of what the previous and next I/Os are. Based on vendor specification, let’s assume that $S_{ij}$ is the time it takes to seek from hyper $i$ to hyper $j$; then we can model the seek time (ST) based on Wong’s formula:

$$ST = \sum_{i=1}^{n} \sum_{j=1}^{n} P_i \, P_j \, S_{ij}$$

This result is, in a sense, a worst-case scenario. The seek time that Optimizer calculates is usually higher than the actual disk seek time because real-life workloads usually include sequences of I/Os and a high degree of locality of reference.
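The expected-seek-time calculation described above can be sketched in Python as follows. This is an illustrative sketch only: it assumes that the probability of an I/O landing on a hyper is that hyper's share of the disk's total activity, and that consecutive I/Os are independent, so each ordered pair of hypers is weighted by the product of the two probabilities. The function name, the list-based inputs, and the sample seek times are hypothetical and do not come from any vendor specification.

```python
def modeled_seek_time(activity, seek):
    """Expected seek time for one disk under the independence assumption.

    activity -- per-hyper activity counts for the disk at one timestamp
    seek     -- seek[i][j] is the time to seek from hyper i to hyper j
                (hypothetical values standing in for vendor data)
    """
    total = sum(activity)
    if total == 0:
        return 0.0  # an idle disk contributes no seek time
    n = len(activity)
    # Probability that the next I/O is served from each hyper.
    p = [a / total for a in activity]
    # Weight every (previous hyper, next hyper) pair by its probability.
    return sum(p[i] * p[j] * seek[i][j]
               for i in range(n)
               for j in range(n))


# Two hypers: the first receives 75 percent of the activity.
example = modeled_seek_time([30, 10], [[0.5, 4.0],
                                       [4.0, 0.5]])
```

With these sample inputs the probabilities are 0.75 and 0.25, and the modeled seek time works out to 1.8125 ms. Because every I/O is treated as independent of the previous one, a workload with long sequential runs would in practice seek less than this estimate suggests.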

Finding the best swap

The Optimizer analysis consists of three high-level phases:

• Calculate service time: Model and sum the total service time of each disk for every timestamp that was marked for inclusion in the analysis.

• Sort disks by activity: Sort all disks by their modeled total service time.

• Find best swap: Starting at the busiest disk, check all potential swaps. The analysis process models what-if scenarios using virtual swaps to estimate the impact of a swap on the service time of the affected disks. The philosophy of Optimizer is to check as many swaps as possible in order to guarantee that one of the best swaps is indeed selected.
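The three phases can be sketched as a small exhaustive search. The Python below is a simplified illustration, not Optimizer's algorithm. It makes several assumptions: each hyper is represented by a single already-modeled service time, a disk's service time is the plain sum over its hypers (the real model recomputes seek time after a swap), timestamps are collapsed into one value, and a swap is scored by how much it lowers the larger of the two affected disks' service times. All names are hypothetical.

```python
from itertools import product


def disk_time(hypers):
    """Stand-in service-time model: sum of per-hyper times (assumption)."""
    return sum(hypers)


def find_best_swap(disks):
    """Search virtual swaps between the busiest disk and every other disk.

    disks -- dict mapping a disk name to a list of per-hyper service times
    Returns (gain, busy_disk, hyper_index, other_disk, other_hyper_index),
    or None when no swap lowers the busier disk's service time.
    """
    # Phases 1 and 2: model each disk's service time and sort, busiest first.
    ranked = sorted(disks, key=lambda d: disk_time(disks[d]), reverse=True)
    busiest = ranked[0]
    best = None
    # Phase 3: try every hyper pair as a what-if (virtual) swap.
    for other in ranked[1:]:
        before = max(disk_time(disks[busiest]), disk_time(disks[other]))
        pairs = product(range(len(disks[busiest])), range(len(disks[other])))
        for i, j in pairs:
            a, b = list(disks[busiest]), list(disks[other])
            a[i], b[j] = b[j], a[i]  # swap on copies; real layout untouched
            gain = before - max(disk_time(a), disk_time(b))
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, busiest, i, other, j)
    return best


sample = {"d1": [9.0, 6.0], "d2": [2.0, 1.0], "d3": [5.0, 4.0]}
result = find_best_swap(sample)
```

In this sample, d1 is the busiest disk, and exchanging its first hyper with the first hyper of d2 lowers the larger of the two service times from 15.0 to 10.0, so the function returns that swap with a gain of 5.0. Working on copies of the hyper lists is what makes the swaps virtual: nothing is moved until a candidate is chosen.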

The biggest challenge is to define how good a swap is and whether it is good enough to be a candidate for execution. A few years ago, when disks were divided into two or four hypers, each swap had the potential to improve performance by 10, 20, or even 50 percent. This, however, is no longer the case. Currently,
