Distribution and Concurrency
Hypertexts can be distributed for several reasons:
- The hyperdocument is too large to be stored on one machine.
- There are too many readers (and writers) to be handled by a single
  machine.
- The hyperdocument has many authors on different sites, each having
  only read access to the information on other sites, but write access
  to the part of the hyperdocument they create.
- The hyperdocument consists of parts that are mostly read by users on
  a single site; locating the information near those readers reduces
  network traffic.
Usually a combination of these reasons leads to the choice of a
distributed environment.
Since hypertext has been defined as "a database that has active
cross-references and allows the reader to 'jump' to other parts of the
database as desired", a distributed hypertext shares properties and
problems with the field of distributed databases.
This is even more true for distributed hypertext systems for
cooperative authoring than for systems like the World Wide Web, which
generally allow authoring only on the local site (and reading from all
sites).
When a distributed hypertext is used mostly for reading, there are only
a few problems:
- Reading should not be blocked while a node is being updated.
  It is generally acceptable not to notify readers that the node has
  changed. However, for texts like this course, updates are not always
  welcome. Since it takes many hours to study this course, and since it
  contains tests and an assignment which also take a lot of time, major
  updates while students are actively participating in the course are
  not advisable. For this reason, whenever a new version is introduced,
  the previous version remains available for some time (at least six
  months). Only minor changes, like corrections to typos or changes to
  external URLs, are made without notifying the students.
  A sketch of how versioning can support such non-blocking reads is
  given after this list.
- World-wide distribution, as in the case of the World Wide Web, poses
  serious (potential and real) performance problems, because a single
  node in the distributed hypertext system may occasionally (or
  frequently) receive many almost simultaneous requests. Different
  techniques exist for reducing the problems caused by unpredictably
  high load on a single server; one of them, caching, is sketched after
  this list.
- Information about "who" has links to a node is usually not available.
  This makes removing (local) nodes dangerous, because it may generate
  dangling links. HyperWave is a next-generation Web-server
  architecture which does offer possibilities for avoiding dangling
  links. More and more traditional Web-servers are being replaced by
  HyperWave servers, but there are not yet enough of them to validate
  that the HyperWave approach to link consistency will scale up to the
  size of the entire Web. The idea behind such link bookkeeping is
  illustrated after this list.
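As announced in the first list item, here is a minimal sketch of
non-blocking reads through versioning, assuming a simple in-memory
store. The class and method names are hypothetical; a real server would
keep versions on disk. The point is only that readers never take a
lock, and that older versions remain available, as with previous
releases of this course text.

```python
import threading

class VersionedNode:
    """Keeps all published versions of a node; readers never block."""

    def __init__(self, initial_content):
        self._versions = [initial_content]    # versions 0, 1, 2, ...
        self._write_lock = threading.Lock()   # serialises writers only

    def read(self, version=None):
        # Readers grab the current version list once and index into it,
        # so an update in progress can never block or corrupt a read.
        versions = self._versions
        if version is None:
            version = len(versions) - 1
        return versions[version]

    def publish(self, new_content):
        # Writers build a new list (copy-on-write), leaving readers
        # that still hold the old list unaffected.
        with self._write_lock:
            self._versions = self._versions + [new_content]
            return len(self._versions) - 1

node = VersionedNode("first release")
node.publish("second release")
print(node.read())            # second release
print(node.read(version=0))   # the first release remains readable
```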
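For the second item, caching is one widely used technique for absorbing
bursts of requests for the same node. The sketch below uses Python's
standard lru_cache; the function names are made up, and a real cache
would also need expiry and invalidation.

```python
from functools import lru_cache
import time

def origin_fetch(url):
    """Stand-in for a request to the single origin server."""
    time.sleep(0.1)           # pretend this is a network round trip
    return "contents of " + url

@lru_cache(maxsize=1024)
def cached_fetch(url):
    # After the first request for a popular node, later requests are
    # answered from the cache and never reach the origin server, which
    # smooths out bursts of almost simultaneous requests.
    return origin_fetch(url)

cached_fetch("http://example.org/node/1")   # slow: contacts the origin
cached_fetch("http://example.org/node/1")   # fast: served from cache
```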
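For the third item, the idea is to keep links in a database of their
own, so that the incoming links of every node are known. The sketch
below only illustrates that idea with hypothetical names; it is not
HyperWave's actual interface.

```python
class LinkIndex:
    """Records, for every node, which nodes link to it."""

    def __init__(self):
        self._incoming = {}   # target node -> set of source nodes

    def add_link(self, source, target):
        self._incoming.setdefault(target, set()).add(source)

    def remove_link(self, source, target):
        self._incoming.get(target, set()).discard(source)

    def remove_node(self, node):
        # Because "who links here" is recorded, deleting a node that is
        # still referenced can be refused (or the referring documents
        # repaired) instead of silently leaving dangling links.
        referrers = self._incoming.get(node, set())
        if referrers:
            raise ValueError(node + " is still referenced by "
                             + ", ".join(sorted(referrers)))
        self._incoming.pop(node, None)

index = LinkIndex()
index.add_link("a.html", "b.html")
try:
    index.remove_node("b.html")
except ValueError as refused:
    print(refused)   # b.html is still referenced by a.html
```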
Collaborative writing requires more
concurrency control features,
including locking, notification, transactions and versioning.
These issues are mostly related to concurrent authoring, and not as much
to distribution.
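As a small illustration of two of these features, locking and
notification, the sketch below lets only one author hold the write lock
on a node and tells subscribed co-authors when it changes. All names
are hypothetical, and transactions and versioning would require
considerably more machinery.

```python
import threading

class AuthoringNode:
    """A write lock plus change notification for co-authors."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._holder = None      # author currently holding the lock
        self._subscribers = []   # callbacks of interested co-authors

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def acquire(self, author):
        # Only one author may edit the node at a time; readers are
        # not affected by this write lock.
        with self._mutex:
            if self._holder is None:
                self._holder = author
                return True
            return False

    def release(self, author, change_summary):
        with self._mutex:
            if self._holder != author:
                raise RuntimeError("lock is not held by " + author)
            self._holder = None
        for notify in self._subscribers:
            notify(change_summary)   # tell co-authors the node changed

node = AuthoringNode()
node.subscribe(lambda change: print("notified:", change))
if node.acquire("alice"):
    node.release("alice", "introduction rewritten")
```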
In order to verify whether you have learned enough from this section,
you must complete a test on distribution and concurrency in hypertext.