IEEE INTERNET COMPUTING - page 3 / 9

Large-Scale Internet Services

[Figure 1 diagram. Labeled components: clients; Internet; load-balancing switch; Web proxy cache servers (400 total); stateless servers for stateful services such as e-mail, news, and favorites (six service groups of eight servers each); stateless servers for stateless services such as content portals (50 total); news article storage; filesystem-based storage on NetApp filers (six total, one per service group); database (storage of customer records, crypto keys, billing information, and so on); paired site.]

Figure 1. Architecture of an Online site. Depending on the feature selected, the client software routes the user request to a Web proxy cache server, one of 50 stateless servers, or one of the eight servers from the user’s service group. Network Appliance servers store persistent state, which cluster nodes access via Network File System (NFS) over the User Datagram Protocol (UDP). A leased network connection links the cluster to a second site at a collocation facility.
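The routing decision in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration, not the service's actual client logic: the pool sizes come from the figure, but the feature names, the modulo mapping from user to service group, and the random choice within a pool are all assumptions made for the example.

```python
import random

# Pool sizes taken from Figure 1.
NUM_PROXY_CACHES = 400
NUM_STATELESS_SERVERS = 50
NUM_SERVICE_GROUPS = 6
SERVERS_PER_GROUP = 8

# Hypothetical feature names; the figure lists e-mail, news, and favorites as
# stateful services and content portals as stateless ones.
STATEFUL_FEATURES = {"email", "news", "favorites"}
STATELESS_FEATURES = {"content_portal"}
PROXIED_FEATURES = {"web_browsing"}


def service_group_for(user_id: int) -> int:
    """Assumed static mapping of a user to one of the six service groups."""
    return user_id % NUM_SERVICE_GROUPS


def route_request(feature: str, user_id: int, rng: random.Random) -> str:
    """Return a label for the server pool member that should handle the request."""
    if feature in PROXIED_FEATURES:
        # Any Web proxy cache server can handle the request.
        return f"proxy-cache-{rng.randrange(NUM_PROXY_CACHES)}"
    if feature in STATELESS_FEATURES:
        # Any of the 50 stateless servers can handle the request.
        return f"stateless-{rng.randrange(NUM_STATELESS_SERVERS)}"
    if feature in STATEFUL_FEATURES:
        # Only the eight servers in the user's own service group qualify.
        group = service_group_for(user_id)
        return f"group-{group}-server-{rng.randrange(SERVERS_PER_GROUP)}"
    raise ValueError(f"unknown feature: {feature}")
```

Under these assumptions, a request for a stateful feature always lands in the same service group for a given user, while the other two request types can go to any member of their pool.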

services, which we will call:

  • Online: an online service/Internet portal

  • Content: a global content-hosting service

  • ReadMostly: a high-traffic Internet service with a very high read-to-write ratio

Table 2 highlights some of the services’ key characteristics. To keep the services’ identities confidential, we have abstracted some of the information to make it more difficult to identify them.

Architecturally, these services

  • reside in geographically distributed collocation facilities,

  • consist largely of commodity hardware but custom software,

  • achieve improved performance and availability through multiple levels of redundancy and load balancing, and

  • contain a load-balancing tier, a stateless front-end tier (which stores no persistent state except operating system code), and a stateful back-end tier (which stores persistent data).

As Table 2 shows, the primary differences between these services are load and read/write ratio.
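The three-tier structure described above can be illustrated with a small sketch. It assumes a round-robin load balancer and an in-memory dictionary standing in for the back end; the class names are hypothetical, and the article does not say which balancing policy or storage interface any of the services uses.

```python
import itertools


class BackEnd:
    """Stateful tier: the only place persistent data lives in this sketch."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class FrontEnd:
    """Stateless tier: keeps no data of its own, so any instance can serve any request."""

    def __init__(self, name, backend):
        self.name = name
        self.backend = backend

    def handle(self, op, key, value=None):
        if op == "put":
            self.backend.put(key, value)
            return self.name, value
        if op == "get":
            return self.name, self.backend.get(key)
        raise ValueError(f"unknown operation: {op}")


class LoadBalancer:
    """Load-balancing tier: round-robin over front ends (an assumed policy)."""

    def __init__(self, front_ends):
        self._cycle = itertools.cycle(front_ends)

    def dispatch(self, op, key, value=None):
        return next(self._cycle).handle(op, key, value)
```

Because the front ends hold no persistent state, a write handled by one front end is visible to a read handled by another, which is what lets the load balancer treat them as interchangeable.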

Geographic Server Distribution

At the highest level, many services distribute their servers geographically. Online distributes its servers between its headquarters and a nearby collocation facility; ReadMostly uses a pair of facilities on the United States East Coast and another pair on the West Coast; and Content uses four facilities: one each in Asia, Europe, and the East and West Coasts of the U.S. All three services use this geographic distribution for availability, and in all but Content the redundant data centers share in handling user requests to improve performance.

When using distributed data centers to share load, services can employ several mechanisms to direct user queries to the most appropriate site. The choice of site generally takes into account each site’s load and availability.
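One simple way to combine load and availability when picking a site is sketched below. This is only an assumed policy (route to the least-loaded site that is currently available); the `Site` record and its `load` metric are hypothetical, and the article does not specify how any of the services weighs these factors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    name: str
    available: bool
    load: float  # hypothetical metric: fraction of capacity in use, 0.0 to 1.0


def choose_site(sites: list[Site]) -> Optional[Site]:
    """Pick the least-loaded available site, or None if every site is down."""
    candidates = [s for s in sites if s.available]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.load)
```

For example, given an available site at load 0.7, an unavailable site at load 0.1, and an available site at load 0.4, this policy selects the third: the unavailable site is excluded before load is compared.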

Content pushes this functionality to the client, which runs custom software pointed to one primary and one backup site. To reduce administra-

IEEE INTERNET COMPUTING

http://computer.org/internet/

SEPTEMBER • OCTOBER 2002

