Large-Scale Internet Services
[Figure 1 component labels: Web proxy cache servers (400 total); stateless servers for stateful services such as e-mail, news, and favorites (six service groups of eight servers each); stateless servers for stateless services such as content portals (50 total); news article storage; filesystem-based storage on NetApp filers (six total, one per service group); database storing customer records, crypto keys, billing information, and so on.]
Figure 1. Architecture of an Online site. Depending on the feature selected, the client software routes the user request to a Web proxy cache server, one of 50 stateless servers, or one of the eight servers in the user’s service group. Network Appliance servers store persistent state, which cluster nodes access via the Network File System (NFS) over the User Datagram Protocol (UDP). A leased network connection links the cluster to a second site at a collocation facility.
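The client-side routing that the Figure 1 caption describes can be sketched roughly as follows. The pool sizes come from the figure labels; everything else is an assumption made for illustration: the feature names, the use of a CRC32 hash to map a user to a service group, and the random choice of a server within a pool. The article does not say how the real client software makes these choices.

```python
import random
import zlib
from typing import NamedTuple, Optional

# Pool sizes taken from the Figure 1 labels.
NUM_PROXY_CACHES = 400
NUM_STATELESS_SERVERS = 50
NUM_SERVICE_GROUPS = 6
SERVERS_PER_GROUP = 8

# Hypothetical feature names, chosen only for this sketch.
STATEFUL_FEATURES = {"email", "news", "favorites"}
STATELESS_FEATURES = {"portal"}
WEB_FEATURE = "web"


class Target(NamedTuple):
    tier: str              # "proxy_cache", "stateless", or "service_group"
    group: Optional[int]   # service group index, only for stateful features
    server: int            # server index within the chosen pool


def service_group(user_id: str) -> int:
    """Map a user to a fixed service group (assumption: stable hash of the ID)."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_SERVICE_GROUPS


def route(user_id: str, feature: str, rng=random) -> Target:
    """Pick a tier and server for a request, based on the feature selected."""
    if feature in STATEFUL_FEATURES:
        # Stateful services go to one of the eight servers in the user's group.
        return Target("service_group", service_group(user_id),
                      rng.randrange(SERVERS_PER_GROUP))
    if feature in STATELESS_FEATURES:
        return Target("stateless", None, rng.randrange(NUM_STATELESS_SERVERS))
    if feature == WEB_FEATURE:
        return Target("proxy_cache", None, rng.randrange(NUM_PROXY_CACHES))
    raise ValueError(f"unknown feature: {feature!r}")
```

In this sketch a given user always lands in the same service group for e-mail, news, and favorites, which matches the caption's "the user's service group"; the server within the group varies from request to request.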
services, which we will call:
Online: an online service/Internet portal
Content: a global content-hosting service
ReadMostly: a high-traffic Internet service with a very high read-to-write ratio
Table 2 highlights some of the services’ key characteristics. To keep the services’ identities confidential, we have abstracted some of the information to make it more difficult to identify them.
Architecturally, these services
reside in geographically distributed collocation facilities,
consist largely of commodity hardware but custom software,
achieve improved performance and availability through multiple levels of redundancy and load balancing, and
contain a load-balancing tier, a stateless (stores no persistent state except operating system code) front-end tier, and a stateful (stores persistent data) back-end tier.
As Table 2 shows, the primary differences between these services are load and read/write ratio.
Geographic Server Distribution
At the highest level, many services distribute their servers geographically. Online distributes its servers between its headquarters and a nearby collocation facility; ReadMostly uses a pair of facilities on the United States East Coast and another pair on the West Coast; and Content uses four facilities: one each in Asia, Europe, and the East and West Coasts of the U.S. All three services use this geographic distribution for availability, and in all but Content the redundant data centers share in handling user requests to improve performance.
When using distributed data centers to share load, services can employ several mechanisms to direct user queries to the most appropriate site. The choice of site generally takes into account each site’s load and availability.
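As a minimal sketch of site selection that accounts for load and availability, the function below picks the least-loaded site among those currently available. The `Site` record, its `load` field (a fraction of capacity in use), and the least-loaded policy are all assumptions for illustration; the article does not specify how the services measure load or which selection policy they apply.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Site:
    """Hypothetical view of one data center as seen by the request director."""
    name: str
    available: bool
    load: float  # assumed metric: fraction of capacity in use, 0.0 to 1.0


def choose_site(sites: Iterable[Site]) -> Optional[Site]:
    """Return the least-loaded available site, or None if every site is down."""
    candidates = [s for s in sites if s.available]
    if not candidates:
        return None
    # min() keeps the first of several equally loaded sites.
    return min(candidates, key=lambda s: s.load)


if __name__ == "__main__":
    sites = [
        Site("east", available=True, load=0.70),
        Site("west", available=True, load=0.40),
        Site("backup", available=False, load=0.10),
    ]
    chosen = choose_site(sites)
    print(chosen.name if chosen else "no site available")
```

The unavailable site is excluded even though it reports the lowest load, so availability acts as a filter and load as the tiebreaker among the remaining sites.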
Content pushes this functionality to the client, which runs custom software pointed to one primary and one backup site. To reduce administra-
IEEE INTERNET COMPUTING
SEPTEMBER • OCTOBER 2002