X hits on this document

PDF document

google-cluster-auctions.pdf - page 1 / 8





1 / 8

Using a Market Economy to Provision Compute Resources Across Planet-wide Clusters

Murray Stokely, Jim Winget, Ed Keyes, Carrie Grimes Google, Inc. Mountain View, CA {mstokely,winget,edkeyes,cgrimes}@google.com

Benjamin Yolken Department of Management Science and Engineering Stanford University Stanford, CA yolken@stanford.edu

Abstract—We present a practical, market-based solution to the resource provisioning problem in a set of heterogeneous resource clusters. We focus on provisioning rather than imme- diate scheduling decisions to allow users to change long-term job specifications based on market feedback. Users enter bids to purchase quotas, or bundles of resources for long-term use. These requests are mapped into a simulated clock auction which determines uniform, fair resource prices that balance supply and demand. The reserve prices for resources sold by the operator in this auction are set based on current utilization, thus guiding the users as they set their bids towards under-utilized resources. By running these auctions at regular time intervals, prices fluctuate like those in a real-world economy and provide motivation for users to engineer systems that can best take advantage of available resources.

These ideas were implemented in an experimental resource market at Google. Our preliminary results demonstrate an efficient transition of users from more congested resource pools to less congested resources. The disparate engineering costs for users to reconfigure their jobs to run on less expensive resource pools was evidenced by the large price premiums some users were willing to pay for more expensive resources. The final resource allocations illustrated how this framework can lead to significant, beneficial changes in user behavior, reducing the excessive shortages and surpluses of more traditional allocation methods.


The last decade has seen the rise of IT services supported by large, heterogeneous, distributed computing systems. These systems, which serve as the backbone for “grid,” “cloud,” and “utility” computing environments, provide the necessary speed, redundancy, and cost-effectiveness to enable the pow- erful Internet applications offered by Google and others. In addition, such systems can be used internally, within the enterprise, to support essential business functions. In the future, as costs per unit continue to go down, we will see exciting new functionality and applications supported by these environments.

Despite their size, however, the resources in these large- scale, distributed systems are necessarily finite. Not every external client or internal engineering team can get everything that it demands. The system operator must place hard limits on the CPU, disk, memory, etc. that each job or job class can use. Otherwise, some tasks will be starved and user experience will suffer. These allocation limits are then mapped into the

low-level scheduling algorithms (e.g., time sharing) used to actually assign jobs to units of physical hardware.

Traditionally, such limits / quotas have been set manually according to pre-defined policies. The operator either grants each user an equal share of the system or, more likely, decides that certain jobs / users are “more important” than others, giving the former higher quotas or the ability to preempt lower-ranked tasks. This approach works well for small, ho- mogeneous systems, but scales poorly as the systems and their supported user bases get larger and more heterogeneous. In the latter case, the information requirements on the operator can become extremely onerous; even with complex optimization procedures, the final allocation is often inefficient from a system-wide perspective. These inefficiencies are manifested though uneven utilization, significant shortages and surpluses in certain resource pools, and general user unhappiness.

One solution is to instead allocate resources through a market-based system. Indeed, this is the manner in which nearly all commodities (e.g., oil, food, real estate, etc.) are priced and allocated in the “real-world” economy. People use money to purchase goods and services in such a way as to maximize their individual happiness. Participants with excess resources, on the other hand, sell them to receive money. The prices for these goods and services, in turn, dynamically fluctuate so as to match supply and demand. If the latter forces are not well-understood, then one can use auctions to gauge these and set prices appropriately. In this way, goods are allocated efficiently without quotas or extensive centralized control.

Much literature has proposed applying these market princi- ples to the large-scale computing systems under study. Many of these models use auction-based pricing for the scheduling of individual jobs across multiple administrative domains [1], [2], [3]. Within this environment, these authors propose replacing standard scheduling algorithms with “economically inspired” ones. Round robin, for instance, might be replaced by a simulated auction in which the priority of each job is mapped into a “bid” [4], [5].

While the previous models are potentially useful for low- level scheduling, our focus here is on the higher-level pro- visioning of aggregate resources (e.g. quotas) across users within a single administrative domain that spans sites around

Document info
Document views29
Page views29
Page last viewedFri Jan 20 07:59:09 UTC 2017