IEEE INTERNET COMPUTING
Global Deployment of Data Centers

Service      Hits per day   Number of machines                             Hardware       Operating system   Relative read/write ratio
Online       ~100 million   ~500, at two data centers                      Sparc and x86  Solaris            High
Content      ~7 million     ~500, at four data centers plus client sites   x86            Open-source x86    Medium
ReadMostly   ~100 million   >2,000, at four data centers                   x86            Open-source x86    Very high (and users update very little data)

Table 2. Characteristics of the large-scale Internet services examined.

tive complexity, the two sites work in redundant pairs — the service points some clients to one pair of sites, and other clients to another pair. A client’s primary server site propagates updates to its secondary server site nightly.

Whenever an Online client connects, a server at the company’s headquarters provides the client with an up-to-date list of the best servers to contact. The servers might be at the company’s headquarters or at its collocation facility.

ReadMostly uses its switch vendor’s proprietary global load-balancing mechanism to direct users. This mechanism rewrites DNS responses based on sites’ load and health information, which it collects from cooperating switches at those sites.

Figure 2. Architecture of a Content site. Stateless metadata servers provide file metadata and route requests to the appropriate data storage server. These servers, which use commodity hardware and run custom software, are accessed via a custom protocol over UDP. The Internet connects each cluster to its twin backup site. [Figure elements: paired client service proxies; load-balancing switch; paired backup site; metadata servers (14 total); data storage servers (100 total).]

Single-Site Architecture
A single site’s architecture consists of three tiers: load balancing, front-end servers, and back-end servers. Figures 1, 2, and 3 depict the single-site architectures of Online, Content, and ReadMostly, respectively.
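The three-tier request path can be illustrated with a small sketch: a load balancer routes each request to the least loaded stateless front end, which in turn consults a back-end store. This is a toy model under stated assumptions — the class names, the connection-count load metric, and the sample data are illustrative, not taken from the services surveyed.

```python
# Toy sketch of a three-tier site: load balancer -> least-loaded
# front-end server -> back-end store. All names and data are
# illustrative assumptions, not the surveyed services' code.

class BackEnd:
    """Back-end tier: persistent storage."""
    def __init__(self):
        self.store = {"alice": "inbox: 2 messages"}

    def read(self, key):
        return self.store.get(key)

class FrontEnd:
    """Front-end tier: stateless request handling."""
    def __init__(self, name, backend):
        self.name = name
        self.backend = backend
        self.active = 0  # current connection count, used as the load metric

    def handle(self, key):
        self.active += 1
        try:
            return self.backend.read(key)
        finally:
            self.active -= 1

class LoadBalancer:
    """Load-balancing tier: route each request to the least loaded front end."""
    def __init__(self, frontends):
        self.frontends = frontends

    def route(self, key):
        target = min(self.frontends, key=lambda fe: fe.active)
        return target.handle(key)

backend = BackEnd()
lb = LoadBalancer([FrontEnd(f"fe{i}", backend) for i in range(3)])
print(lb.route("alice"))  # -> inbox: 2 messages
```

In a real deployment the load-balancing tier is a hardware switch and the tiers communicate over the network; the sketch only shows the routing decision and the separation of stateless front ends from stateful back ends.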

Load balancing. To balance load, one or more network switches distribute incoming requests to front-end servers based on the servers’ loads. Although many modern switches offer Layer-7 switching functionality — meaning they can route requests based on the contents of a user’s request — none of the services we surveyed use this feature. Instead, they generally use simple round-robin DNS or Layer-4 load distribution to direct clients to the least loaded front-end server. In round-robin DNS, a service advertises multiple IP addresses and continuously reprioritizes the addresses to spread load among the corresponding machines. In Layer-4 load distribution, clients connect to a single IP address and the cluster’s switch routes the connection to a front-end server.
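The round-robin DNS technique described above can be sketched in a few lines: the authoritative server keeps rotating its list of advertised addresses, so successive queries see different orderings and clients that take the first address spread across the machines. The address list here is a made-up example, and real DNS servers implement this inside the name-server software rather than as application code.

```python
from collections import deque

# Minimal sketch of round-robin DNS (an illustration, not the surveyed
# services' code): the service advertises several IP addresses and
# reprioritizes the list after each query, so clients that take the
# first answer are spread across the corresponding machines.

class RoundRobinDNS:
    def __init__(self, addresses):
        self.addresses = deque(addresses)

    def resolve(self):
        answer = list(self.addresses)   # current priority order
        self.addresses.rotate(-1)       # reprioritize for the next query
        return answer

dns = RoundRobinDNS(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(dns.resolve()[0])  # -> 10.0.0.1
print(dns.resolve()[0])  # -> 10.0.0.2
print(dns.resolve()[0])  # -> 10.0.0.3
```

Note the contrast with Layer-4 distribution: there the rotation happens inside the cluster’s switch per TCP connection, while round-robin DNS spreads load earlier, at name-resolution time, and cannot react to a server’s current load.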

Content and ReadMostly use Layer-4 request distribution at each site, as does Online for stateless parts of its service (the Web proxy cache and content portals, for example). For the stateful parts of its service (such as e-mail), Online also uses Layer-4 request distribution, but adds a level of stateful front-end load balancing on top. In particular, Online maps a user to one of several clusters (called service groups) based on the user’s identity (determined when the user logs in to the

IEEE INTERNET COMPUTING • SEPTEMBER–OCTOBER 2002 • http://computer.org/internet/
