Handling Unpredictable High Server Load
The World Wide Web, with its
acronym WWW, is also jokingly known as the World Wide Wait.
Some servers receive many requests at certain times of day,
and as a result serving each of these requests takes longer.
There is no universal solution that is guaranteed to work under
all circumstances: with many millions of users on the Internet,
it is theoretically possible that all of them send a request to one and
the same server at the exact same moment.
There exists no server architecture capable of handling millions of
simultaneous requests.
How many requests a server can handle depends on a number of factors:
- Network capacity: Usually the speed of the server's network
interface exceeds that of the external Internet connection.
If a hypertext node (Web page) is on average 10 Kbytes in size (including
images), then a 128 Kbps ISDN line can handle about one and a half
requests per second, and a 1 Mbps leased line about twelve.
This indicates that the most popular Web sites, serving over 1000 requests
per second, require Internet connections of 100 Mbps or more (the
arithmetic is worked out in a short sketch after this list).
- TCP/IP implementation: When a user attempts to
contact an Internet host, the connection request is placed in a queue of
requests waiting to be acknowledged. If requests arrive too quickly this
queue may overflow, causing requests to be lost. Recent operating systems
have larger queues to avoid this overflow. However, when the queue is large
and the CPU is not fast enough, all CPU power is needed to acknowledge
incoming requests, and none is left for the Web server (software) to
actually handle them (a sketch of such a listen queue follows this list).
- Server hardware performance: Even the most powerful
computer systems may not be fast enough to handle the number of incoming
requests. Special-purpose network interfaces have been developed that
distribute requests over a number of parallel machines. All the large
popular servers (like Netscape,
Alta Vista and
Playboy) use parallel servers.
Virtually all high-performance servers consisting of a single machine
are Unix-based workstations. They use one or more high-end CPUs
(Pentium Pro, Pentium II, Sparc, Alpha, etc.), a large internal memory
(often 1 Gbyte or more) and RAID disk arrays. (The disk subsystem
is almost always the limiting factor.)
- HTTP protocol: In HTTP version 1.0 every object
(file) is retrieved through a separate TCP/IP connection,
so documents with several images require many connections.
The popular browsers from Netscape
and Microsoft try to open several
TCP/IP connections in parallel.
This speeds up retrieval for the individual user, but it reduces the
number of simultaneous users that can be served.
In HTTP version 1.1 a single connection is opened for a node and all the
images it contains (the sketch after this list contrasts the two styles).
- Web server software:
The first-generation Web servers created a separate
process for every request. Creating (and destroying) a process takes a lot
of time (compared to the time spent on serving a hypertext node).
Second-generation servers create a reasonably large number of processes
at startup to avoid this overhead.
The newest generation uses lightweight processes (threads) instead of
heavyweight processes; a thread-based sketch follows this list.
It is important to configure a server in such a way
that the maximum number of processes never consumes more memory than
the server machine has.
- Scripting:
CGI scripts
are used to provide access to databases and expert systems.
For this course text all pages are generated by scripts that monitor
the user's progress and adapt the content and link structure.
CGI scripts require a separate process to be started for each request,
and thus incur overhead similar to that of the first-generation Web servers
(a minimal CGI script is sketched after this list).
Various server-side APIs make it possible to eliminate this overhead.
For this course text Fast-CGI is used.
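As a check on the network-capacity item, the following Python sketch
computes how many 10-Kbyte pages various line speeds can carry per second.
It assumes 1 Mbps = 1000 Kbps and ignores protocol overhead, so the
results are upper bounds on what the connection alone allows.

    # Back-of-the-envelope check of the "Network capacity" figures.
    # Assumes 1 Mbps = 1000 Kbps and ignores TCP/IP and HTTP overhead.

    def requests_per_second(link_kbps, page_kbytes):
        """Requests per second a link can carry if each request returns one page."""
        page_kbits = page_kbytes * 8          # 1 byte = 8 bits
        return link_kbps / page_kbits

    PAGE_KBYTES = 10                          # average node size, including images

    for name, kbps in [("128 Kbps ISDN line", 128),
                       ("1 Mbps leased line", 1000),
                       ("100 Mbps connection", 100000)]:
        print("%-20s about %.1f requests/second"
              % (name, requests_per_second(kbps, PAGE_KBYTES)))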
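The queue of not-yet-acknowledged requests mentioned under TCP/IP
implementation corresponds to the backlog argument of the listen() call.
The minimal Python sketch below only illustrates that idea; the port 8080
and the backlog of 128 are arbitrary values chosen for the example.

    # Minimal single-threaded server illustrating the listen (accept) queue.
    # Connections that arrive while the queue already holds 128 entries may
    # be refused or dropped by the operating system.

    import socket

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("", 8080))       # arbitrary port for the sketch
    server.listen(128)            # size of the queue of pending connections

    while True:
        conn, addr = server.accept()   # takes one connection out of the queue
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok")
        conn.close()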
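The difference between HTTP 1.0 and 1.1 connections can be illustrated
with Python's http.client, which speaks HTTP 1.1 by default; the host name
and paths below are placeholders. The first loop reuses one connection for
the node and its images (as in HTTP 1.1, provided the server keeps the
connection open); the second opens a fresh connection, with its own TCP
handshake, for every object (as in HTTP 1.0).

    import http.client

    HOST = "www.example.com"                 # placeholder host
    paths = ["/index.html", "/logo.gif", "/photo.jpg"]

    # HTTP/1.1 style: one connection for the node and its images.
    conn = http.client.HTTPConnection(HOST)
    for path in paths:
        conn.request("GET", path)
        conn.getresponse().read()            # must read before reusing
    conn.close()

    # HTTP/1.0 style: a separate connection per object.
    for path in paths:
        conn = http.client.HTTPConnection(HOST)
        conn.request("GET", path)
        conn.getresponse().read()
        conn.close()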
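The thread-based approach mentioned under Web server software can be
sketched with Python's standard ThreadingHTTPServer, which hands each
incoming request to its own thread; the handler and port are only
illustrative.

    from http.server import ThreadingHTTPServer, BaseHTTPRequestHandler

    class HelloHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = b"hello"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        # Each request is served in its own thread, avoiding the cost of
        # creating and destroying a heavyweight process per request.
        ThreadingHTTPServer(("", 8080), HelloHandler).serve_forever()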
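Finally, a minimal CGI script (a hypothetical example, not the one used
for this course text) shows where the scripting overhead comes from: the
server starts a new interpreter process for every request just to produce
this small page, which is exactly the cost Fast-CGI and other server-side
APIs avoid by keeping the process alive between requests.

    #!/usr/bin/env python3
    # Minimal CGI script: a new process runs this for every single request.

    import os

    print("Content-Type: text/html")
    print()
    print("<html><body>")
    print("<p>Hello, visitor from",
          os.environ.get("REMOTE_ADDR", "an unknown address"), "</p>")
    print("</body></html>")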
Besides these server-side factors, the perceived performance also depends on:
- Composite nodes:
When a node contains elements from different servers (for instance
images that reside on other servers), the user has the impression that
the server for the node is as slow as the slowest of the servers delivering
the images.
The Web inherits this problem from
Xanadu, in which (textual)
citations would not be copied but taken from their original source,
through so-called transclusion.
- Exponential back-off:
When a server does not acknowledge a request (fast enough), the client's
machine will repeat the request many times, with the time between successive
retries increasing exponentially. As a result, a faster response can
sometimes be obtained by interrupting the request and submitting it again
(a sketch of this retry pattern follows this list).
- Stalling:
When a server is handling a (long) request, it may receive an overload
of new requests. Acknowledging (but not yet handling) these requests has the
highest priority, leaving no processing time to send further bytes for the
requests that are already being handled.
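The doubling of retry intervals described under exponential back-off is
handled by TCP inside the operating system, but the pattern itself is easy
to show. The sketch below applies it at application level to a hypothetical
fetch() callable that raises TimeoutError when the server does not respond.

    import time

    def fetch_with_backoff(fetch, max_tries=5, first_delay=1.0):
        """Retry fetch(), doubling the waiting time after every failure."""
        delay = first_delay
        for attempt in range(max_tries):
            try:
                return fetch()
            except TimeoutError:
                time.sleep(delay)     # wait before the next attempt
                delay *= 2            # exponential growth of the interval
        raise TimeoutError("no response after %d attempts" % max_tries)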