Lagoon, a WWW cache
There are two ways of providing local copies of hypertext nodes from
different sites in a distributed hyperdocument:
- mirroring: a well-defined part of the hyperdocument, from a single site,
is copied to the local site. The local site thus contains a "mirror
image" of the part of the hyperdocument from another site.
Such replication is common in distributed databases.
- caching: nodes are copied to the local site only when a reader requests
them, and are then kept there (for a while) to serve later requests.
This technique is common in computer hardware and in operating systems.
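
The difference between the two approaches can be sketched in a few lines of
Python. This is only an illustration and not Lagoon's code: the `fetch`
callback, the names `mirror` and `OnDemandCache`, and the request counter are
assumptions made for the example.

```python
from typing import Callable, Dict, Iterable

# Hypothetical callback: maps a URL to the content of the node at that URL.
Fetch = Callable[[str], str]


def mirror(urls: Iterable[str], fetch: Fetch) -> Dict[str, str]:
    """Mirroring: copy a fixed, predetermined set of nodes up front."""
    return {url: fetch(url) for url in urls}


class OnDemandCache:
    """Caching: copy a node only when a reader asks for it, then keep it."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._store: Dict[str, str] = {}
        self.remote_requests = 0  # how often the remote site was contacted

    def get(self, url: str) -> str:
        if url not in self._store:
            # First request for this node: fetch it and keep the copy.
            self.remote_requests += 1
            self._store[url] = self._fetch(url)
        return self._store[url]
```

With mirroring the set of URLs has to be chosen in advance; with caching the
local copy fills up with whatever readers actually request. Expiration of
stored nodes is left out of this sketch.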
In order to use mirroring one needs an accurate way to predict which part of
the distributed hyperdocument is most worth copying.
For applications like a search algorithm, mirroring will not be very effective:
the fish-search, for instance, favors links to documents on different sites,
so the nodes it retrieves are not concentrated in one mirrored part.
Also, in order to keep a mirrored part of the hyperdocument up to date,
some cooperation from the originating site is needed to "send" new versions
of nodes and links as they are created on that site.
Lagoon [BP94]
is a cache for the World Wide Web,
which operates independently from both the client
(WWW-browser) and the server.
Some browsers, including Netscape Navigator
and Microsoft's Internet Explorer,
also offer caching. However, the disadvantage of such a "client cache"
is that it is not shared between different users.
Currently the Lagoon cache can be configured in two ways:
- For each document the expiration date (or time) has to be guessed
when it is not included in the document header. Different expiration times
can be set for documents with different names.
Pages from a Teletext server should be refreshed much more frequently than
pages from this course, for instance.
Nodes are refreshed only when a reader wishes to read them.
In case the remote site cannot be reached, Lagoon can still serve the old
version.
- For documents from different sites separate caches can be used.
Lagoon caches can thus "call" each other to avoid direct access to different
(non-caching) servers. Again, the names of the documents (the URLs) are used
to determine whether or not to redirect a request to another Lagoon server.
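
The two configuration options above can be sketched as follows. This is a
minimal Python sketch and not Lagoon's actual configuration format or code:
the rule tables, the host names, the default lifetime and the `fetch`
signature are all assumptions made for the example.

```python
import re
import time
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical rules: (URL pattern, lifetime in seconds). First match wins.
EXPIRY_RULES: List[Tuple[str, int]] = [
    (r"^http://teletext\.example/", 60),            # fast-changing pages
    (r"^http://course\.example/", 7 * 24 * 3600),   # slow-changing pages
]
DEFAULT_LIFETIME = 3600  # assumed fallback when no rule matches

# Hypothetical rules: (URL pattern, address of the cache to ask instead).
ROUTE_RULES: List[Tuple[str, str]] = [
    (r"^http://[^/]*\.remote\.example/", "cache.remote.example:8080"),
]


def guess_lifetime(url: str) -> int:
    """Guess how long a document stays fresh, based on its name (URL)."""
    for pattern, seconds in EXPIRY_RULES:
        if re.search(pattern, url):
            return seconds
    return DEFAULT_LIFETIME


def choose_upstream(url: str) -> Optional[str]:
    """Return the other cache to ask, or None to contact the server itself."""
    for pattern, peer in ROUTE_RULES:
        if re.search(pattern, url):
            return peer
    return None


# Hypothetical fetch(url, upstream) -> (content, expiry from header or None).
# It is expected to raise OSError when the remote site cannot be reached.
Fetch = Callable[[str, Optional[str]], Tuple[str, Optional[float]]]


class ExpiringCache:
    def __init__(self, fetch: Fetch, clock: Callable[[], float] = time.time):
        self._fetch = fetch
        self._clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}  # url -> (content, expires_at)

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh: no remote access needed
        try:
            # Refresh only now, because a reader asked for the node.
            content, header_expiry = self._fetch(url, choose_upstream(url))
        except OSError:
            if entry is not None:
                return entry[0]  # remote site unreachable: serve the old version
            raise
        # Prefer the expiry given in the header; otherwise guess from the URL.
        expires_at = header_expiry if header_expiry is not None else now + guess_lifetime(url)
        self._store[url] = (content, expires_at)
        return content
```

Keeping both rule tables keyed on URL patterns mirrors the text: the name of
a document decides both how long it is kept and which server is asked for it.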
The most popular WWW-cache at this moment is
Squid,
a freely available cache which
has its roots in the cache of the Harvest project.