From Text to Hypertext

Converting existing textual material into hypertext consists of the following steps:
  1. Split the text into portions, which will become nodes. This is usually done along the structural divisions of the text: sections, subsections, footnotes, examples, etc.
  2. Derive link structures from existing sources such as an index or a table of contents. Links between the text and its footnotes and examples can also be generated easily.
  3. Add objective links, such as links from references to theorems or examples to the actual theorems or examples, or from the name of a city to a map.
  4. Add subjective links, which the (human) converter considers relevant. Subjective links connect concepts that the converter sees as associated.
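Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used for any of the conversions mentioned in this text: it assumes that a section heading is a line starting with a section number such as "1.2", and the function names split_into_nodes and structural_links are invented for the example.

```python
import re

# Assumption: a heading is a line such as "1 Introduction" or "1.2 Links".
# Body lines that happen to start with a number would be misread as headings.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(\S.*)$")

def split_into_nodes(text):
    """Step 1: split the text into nodes at structural divisions.

    Returns a dict mapping a section number to its title and body lines.
    """
    nodes = {}
    current = None
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(1)
            nodes[current] = {"title": match.group(2), "body": []}
        elif current is not None:
            nodes[current]["body"].append(line)
    return nodes

def structural_links(nodes):
    """Step 2: derive table-of-contents links from parent to subsection."""
    links = []
    for number in nodes:
        if "." in number:
            parent = number.rsplit(".", 1)[0]
            if parent in nodes:
                links.append((parent, number))
    return links

sample = """1 Introduction
Hypertext is non-linear text.
1.1 Nodes
A node holds one portion of text.
1.2 Links
A link connects two nodes.
2 Conversion
Existing text can be converted."""

nodes = split_into_nodes(sample)
links = structural_links(nodes)
```

The resulting links are purely hierarchical. Objective and subjective links (steps 3 and 4) need knowledge of the content and are not covered by this sketch.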
Links can also be generated automatically between nodes that are similar, i.e. that have similar relative frequencies of meaningful words. A test by Bernstein [Bernstein-90] showed that an automated generator of hypertext links can easily achieve 95% accuracy in generating meaningful links. One problem with generated links is choosing appropriate anchors for them in the source nodes. Another is that this technique may produce a wealth (and thus a mess) of links. Usable hyperdocuments require a balance between hierarchical and cross-reference links. (See the section on browsing experiments.)
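One possible way to compare relative word frequencies is sketched below. It is an assumption of this example, not a description of the generator tested by Bernstein: the stop-word list, the cosine measure and the threshold value are all chosen for illustration only.

```python
import math
import re
from collections import Counter

# Assumption: a tiny stop-word list stands in for "non-meaningful" words.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "that"}

def relative_frequencies(text):
    """Map each meaningful word to its share of all meaningful words."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOP_WORDS]
    counts = Counter(words)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}

def cosine(p, q):
    """Cosine similarity of two frequency dicts (0.0 if either is empty)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

def similarity_links(nodes, threshold=0.3):
    """Propose a link for every pair of nodes scoring at least threshold."""
    freqs = {name: relative_frequencies(text) for name, text in nodes.items()}
    names = sorted(freqs)
    links = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = cosine(freqs[a], freqs[b])
            if score >= threshold:
                links.append((a, b, round(score, 2)))
    return links

nodes = {
    "maps": "A map shows the city and the roads around the city.",
    "cities": "The city has roads, and a map of the city helps visitors.",
    "films": "Old films were shot in black and white.",
}
links = similarity_links(nodes)
```

With these three sample nodes, only the two nodes about cities and maps are linked; the node about films shares no meaningful words with the others. Raising the threshold is one simple way to limit the number of generated links. The sketch links whole nodes and does not address the choice of anchors.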

An inherent danger in converting text to hypertext is that the information will be subtly altered by the new structure or by the new way of connecting things. This is similar to the problem of "colorizing" old black-and-white films. The color adds information that may not be what the original author intended.

Examples of texts that have been converted to hypertext are the special issue of the journal Communications of the ACM on hypertext, the Manual of Medical Therapeutics (a 500-page book), and the Oxford English Dictionary, a 570-megabyte text. The CACM issue was converted to different formats, including KMS. This proved difficult because of the fixed frame size in KMS. Using the OED as hypertext is even more difficult, as the size of entries varies from 50 characters to over 500,000 characters.

Sometimes it is necessary to convert hypertext (back) to linear text, in order to print it on paper. Hyperdocuments with a hierarchical structure can easily be converted. A typical example is the Hypertext Hands-On! book by Shneiderman and Kearsley [SK89], which is delivered as a book and a hypertext at the same time. Many documents on the World Wide Web are also essentially linear, providing links from parts to other parts, thereby simulating a non-linear structure. This course text, however, cannot be trivially converted to a linear sequence of pages (at least not to a sequence in which every node occurs only once).
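For a hierarchical hyperdocument, the conversion to a linear sequence can be sketched as a depth-first traversal that emits every node once. The code below is a minimal illustration; the function name linearize and the sample hierarchy are invented for the example and do not describe how any particular system prints its documents.

```python
def linearize(children, root):
    """Return the nodes in depth-first (preorder) order, each node once.

    children maps a node to the ordered list of nodes below it.
    """
    order = []
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            # Already placed at an earlier position; do not print it twice.
            continue
        seen.add(node)
        order.append(node)
        # Push children in reverse so that the first child is emitted first.
        stack.extend(reversed(children.get(node, [])))
    return order

hierarchy = {
    "book": ["chapter 1", "chapter 2"],
    "chapter 1": ["section 1.1", "section 1.2"],
    "chapter 2": ["section 2.1"],
}
pages = linearize(hierarchy, "book")
```

For a pure hierarchy this yields the familiar order of a printed book. If a node can be reached from several parents, the sketch places it only at the first position where it is encountered, which need not fit the other contexts that link to it. That is why a hyperdocument with many cross-reference links cannot be linearized trivially.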