E-mail {debra,wsinatma}@win.tue.nl
Many WWW servers contain information written by several authors. These authors either need an account on the server machine, and special permissions to create information in the server space, or else the Webmaster needs to put the information in that space or allow the server to point to the author's own space.
We present DReSS, a system to enable authors to deposit (and update) documents on a WWW server, using standard WWW features only. Authors do not need login permission on the server machine, ftp upload access, or even electronic mail. As the documents live in the WWW server space there is no need for the server to be able to access documents outside its space. Thus, our system will work on even the most securely shielded servers (running in a chroot environment).
DReSS consists of a set of CGI-scripts and two small auxiliary programs running on the client machine. It can be used with any (HTML-2.0-capable) WWW browser, and with any WWW server. DReSS does not use special features of a specific browser or server, and does not require any modification to the browser or server. For example, it does not use the HTTP PUT method, mostly because not every browser and server supports it. It does not use multi-part mime documents, file inclusion in forms, or server push features. It also does not use protocols (like ftp or smtp) other than HTTP.
We indicate where the current WWW architecture makes managing WWW servers with multiple authors difficult. This leads to suggestions for new browser and server features that could improve the authoring process significantly.
The World Wide Web (WWW or Web) [Berners-Lee94] is an open, distributed information system that lets groups of people share information. Most WWW servers contain documents written by several authors. Typical examples are departmental servers in research institutes and universities. They contain personal home pages, scientific papers, descriptions of departments and research programs, and often also courseware. Recently, an increasing number of Internet providers for the general public have started offering the possibility to put personal home pages on their WWW servers.
The focus of the Web is on providing read-only access to information. Support for authors has been limited mostly to graphical (sometimes incorrectly called WYSIWYG) HTML editors, such as SoftQuad's HoTMetaL, and to add-ons for word-processors, such as Microsoft's HTML-assistant. (See [Carl-Davis] for a nice overview.) The problem of moving the created documents to the WWW server has been largely ignored until recently. In [Lavenant-94] a distributed hypermedia authoring system is presented that uses special-purpose WWW browser/editor extensions, and an invented "CTRL" non-HTML tag. In [Pitkow-95] a document repository system is described, which is extended with a versioning mechanism, a hyperlink database, HTML-verification and other features. This system uses a modified WWW server in order to do authentication, and uses a proposed form-based file uploading mechanism [Nebel-95], which is not supported by most WWW browsers. In [Bentley-95] the BSCW (Basic Support for Cooperative Work) system is described. BSCW provides a document repository service with concurrency control and versioning. It requires a minor modification to the WWW server, and a helper application on the client. In [Andrews-95] the integration of WWW and Hyper-G is described. Hyper-G has been turned into a very appealing and powerful system, which can be accessed through WWW browsers and can contain HTML documents. In doing so it provides much more than a document repository system for WWW authors.
In a departmental server environment or the WWW server of an Internet provider, the main reason for using a document repository system is to avoid the need for authors to have login permission on the server machine. Features like link databases, versioning, on-the-fly generation of indexing information and others are worthwhile, but secondary to the goal of providing a document repository system which is easy to install and to use, and which can be used with different browsers (and servers) without requiring any modifications to "standard" WWW software. In this paper we introduce DReSS, a tool that turns a WWW server into a Document Repository Service Station, enabling authors to move documents to the WWW server, and to update documents on the server, without compromising the server's or client's security. To ensure that the security of the server and the authors' machines is not compromised by using DReSS, the set of requirements given below contains several constraints related to security:
This paper is organized as follows: in Section 2 we introduce the problem of employing multi-author WWW servers and show some limitations of the current Web standards that make the authoring process and repository service difficult. In Section 3 we describe the authoring process and repository service using DReSS. We show how DReSS circumvents the difficulties mentioned in Section 2. Section 4 discusses issues related to collaboration, concurrency, versioning, authorization and network security.
Authoring a document typically goes as follows: the author generates a document using a word-processor (or other kind of editor), and decides where to store the document. Later revisions are done by retrieving the document using the word-processor and saving it once the editing is done. Pitkow and Jones [Pitkow-95] distinguish between:
The creation of new documents is even more problematic. After loading the document into the WWW browser as a local file one would like to assign a valid URL to it and then send it to the server. Changing the URL of a loaded document is not possible in the current generation of browsers. Changing the URL could be done using a form, but including the document in the same form is also not yet possible in most browsers.
In order to implement editing sessions in the stateless WWW we need extra information on both the client and the server side to remember which client is engaged in which session. On the server side files generated by CGI-scripts can be used for this purpose. On the client side the only way to make the browser remember something (without showing it to the reader) is to put it in a hidden field in a form. This hidden information is automatically sent back to the server with the next request. By sending this information back and forth the server can associate each request to the appropriate session.
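The hidden-field round trip described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the DReSS code; the function names, the field names and the script path /cgi-bin/dress-action are assumptions made for this example only.

```python
from html import escape
from urllib.parse import parse_qs, urlencode


def render_form(action_url, hidden, buttons):
    """Return an HTML form whose hidden fields carry the session state."""
    lines = ['<form method="POST" action="%s">' % escape(action_url, quote=True)]
    for name, value in hidden.items():
        # Hidden fields are not displayed, but are sent back on submit.
        lines.append('<input type="hidden" name="%s" value="%s">'
                     % (escape(name, quote=True), escape(value, quote=True)))
    for label in buttons:
        lines.append('<input type="submit" name="action" value="%s">'
                     % escape(label, quote=True))
    lines.append("</form>")
    return "\n".join(lines)


def read_submission(body):
    """Decode a form-urlencoded request body into a flat dict."""
    return {name: values[0] for name, values in parse_qs(body).items()}


# Hypothetical session id and script path, for illustration only.
form = render_form("/cgi-bin/dress-action",
                   {"session": "/staff/debra/paper.html"},
                   ["COMMIT", "ABORT"])

# What a browser would send back after the author presses COMMIT:
body = urlencode({"session": "/staff/debra/paper.html", "action": "COMMIT"})
state = read_submission(body)
```

The server-side script recovers the session id from `state["session"]` on every request, which is how a stateless request gets associated with a session.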
When the editing is done the problem arises of how to send the document back to the WWW-server. A common approach is to connect the client and server machines in such a way that (a CGI-script on) the WWW-server can simply read or even serve the document from the author's home directory (or a subdirectory thereof). To enable this feature, servers translate the "~user" directory name to a "public" subdirectory of the user's home directory. Another way to achieve a similar effect is to create (symbolic) links from the server space to other (NFS-mounted) directories. Serving documents from user directories has a number of serious drawbacks:
A final range of problems with Web authoring is authorization. When we get the server to actually receive the documents from the authors it will install them in whichever directory the author requests (and is authorized to). Once this is done the documents become files owned by the server. Thus, a separate database is needed to remember the names of the authors and the access rights selected by the authors for their documents. With HTML documents it would be possible to store that information as meta information in the header, but with other files, like images and sound fragments, this is not possible. Structured files can be used to implement this session database. Using a real database system has its advantages, but makes porting the repository system more difficult.
When an author wishes to edit (or create) a document the server has to determine the identity of that author. Currently the Web provides no secure way to verify an identity. As with read access, username/password combinations can be used, but WWW browsers do not send passwords over the Internet in a secure way. Passwords are encoded but not encrypted, so they may be intercepted. (For telnet and rlogin connections the situation is even worse, but largely ignored by most Internet users.) An additional identity verification is possible using the RFC931 protocol. Some WWW servers offer the possibility to verify the identity of the sender of an HTTP request by contacting the RFC931 server on the sender's machine. However, that client machine may have a bogus RFC931 server that lies about the identity, or even no such server at all. The HTTP protocol also makes it possible for the WWW browser to send the requester's identity along with the request. Most browsers do not support this feature, but even if they did, this identity is easily faked by an intruder. Making password communication secure in the WWW goes beyond the scope of this paper. It is needed for read access to restricted WWW documents as well as for authoring and publishing. While we cannot (yet) make the communication between the authors and the WWW server secure, we must do our best to restrict the impact of a successful attack to only the documents on the WWW server.
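To make "encoded but not encrypted" concrete, the following sketch assumes that the encoding in question is the base64 encoding used by HTTP Basic authentication. The user name and password are invented for the example. Decoding requires no key, so anyone who can read the request can recover the password.

```python
import base64


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def recover_credentials(header_value):
    """What an eavesdropper can do: plain decoding, no secret needed."""
    _scheme, token = header_value.split(" ", 1)
    user, password = base64.b64decode(token).decode("utf-8").split(":", 1)
    return user, password


# Invented credentials, for illustration only.
header = basic_auth_header("debra", "not-my-login-password")
stolen = recover_credentials(header)
```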
The DReSS system supports creating new documents and updating or deleting existing ones. It can transparently restart a session after (accidentally or deliberately) exiting the WWW browser. We will concentrate on the creation and update problem. Deletion is fairly straightforward. The creation of a new document and the modification of an existing one go as follows:
The DReSS startup form, with example input is shown in figure 1. (It is normally preceded by a glossy banner identifying the WWW-server for which DReSS offers its repository service.) It lets you select whether you want to create a new document or modify or delete an existing one.
DReSS will allow only one user to create, edit or delete a document at the same time. Therefore, the URL of the document (as entered on the startup form) can be used as a session identifier. In the example the URL of the document to be created or updated is only partial: it is not necessary to specify the protocol and hostname, since only the http protocol is supported and the current (initial) version of DReSS can service only one WWW-server, which must be the one containing the startup form. If a full URL is given, only the "path" part is used for the session id.
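Reducing a full or partial URL to its "path" part can be sketched as follows. This is an illustration in Python, not the DReSS code, and the function name and example URLs are assumptions.

```python
from urllib.parse import urlsplit


def session_id(url):
    """Return only the path part of a full or partial URL.

    Protocol, host name, query string and fragment are ignored, so a
    full URL and the matching partial URL yield the same session id.
    """
    return urlsplit(url).path


# Hypothetical document URLs, for illustration only.
full = session_id("http://www.win.tue.nl/dress/intro.html")
partial = session_id("/dress/intro.html")
```

Both calls return the same string, so the two spellings of the URL refer to the same session.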
DReSS allows only the creator of a document to change the access rights. The default action for existing documents is not to change the access rights. For a new document the default is to let everyone read the document and allow only the creator to update and delete the document.
After pressing the "Create Document" or "Modify Document" button a second form is displayed, called the action form, containing the EDIT, COMMIT, VIEW and ABORT buttons. A CGI-script on the WWW server creates an object (a file) which is associated with the session (i.e. with the document's name), and which contains all the information about the document, as given in the startup form. It also checks out the document, disabling other users from editing the same document until a commit or abort is performed. The same user is still allowed to create a new session for the same document. This provides a "silent" way to resume a session that was interrupted because the WWW browser was exited. The generated action form contains the session id (document name) in a hidden field. It also contains the author's identity and encoded (but not encrypted) password, both also in hidden fields.
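The check-out rule, including the "silent" resume by the same author, can be summarized in a small sketch. It keeps the sessions in an in-memory dictionary purely for illustration, whereas DReSS stores one file per session on the server; all names are hypothetical.

```python
class DocumentLocked(Exception):
    """Raised when another author has the document checked out."""


def check_out(sessions, session_id, author):
    """Record that `author` is editing the document `session_id`.

    `sessions` maps a session id to the author holding the checkout.
    """
    holder = sessions.get(session_id)
    if holder is not None and holder != author:
        raise DocumentLocked("%s is checked out by %s" % (session_id, holder))
    # Either a new checkout, or the same author resuming a session
    # that was interrupted (for example by exiting the browser).
    sessions[session_id] = author


def release(sessions, session_id, author):
    """Commit or abort: remove the write lock if `author` holds it."""
    if sessions.get(session_id) == author:
        del sessions[session_id]
```

A second `check_out` by the same author succeeds without any message, while a `check_out` by a different author raises `DocumentLocked` until `release` has been called.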
Getting an existing document into the appropriate editor on the author's machine is done by means of a special mime type. The author has to bind this mime type to a so-called external viewer, which in this case is a small wrapper program that moves the document to the desired place (the local pathname given in the startup form) and starts the appropriate editor. If the document is to be created rather than updated, the editor is started with the local pathname, which can be an existing file or the name of a document which is still to be written. Note that binding mime types to external viewers is usually done in the user's mailcap file, but the Webmaster may decide to do the binding in a system-wide mailcap file. Different mime types can be used (invented) for calling appropriate editors for different document formats.
Because the EDIT procedure is implemented using a special mime type the WWW-browser does not alter its display after invoking the external viewer. It still displays the EDIT, COMMIT, VIEW and ABORT buttons.
The EDIT button is only present because most WWW browsers do not understand multipart mime-files. After completing the startup form, a reply containing the action form as one part, and the document as another part would eliminate the need for an EDIT button. The document would be immediately passed on to the appropriate editor.
When the author is satisfied with the contents of the document (and has saved it on the local machine, in the specified local pathname) the document needs to be transmitted to the WWW server. The browser cannot perform this task itself, since it has passed on the document to the editor, and has subsequently forgotten about it. In the future some browsers may have the ability to load a local file, assign a URL to it and then send it to the WWW server (possibly using the HTTP PUT method). For now we will assume that an auxiliary program is needed to perform this transfer. In any case the initiative for the transfer must come from the author's machine, because the server has no rights to retrieve any information from the client's machine.
Activating the transmission program cannot be done directly by the browser. Most WWW browsers do not have the ability to start an external program on the client machine by pressing some button. We circumvent this shortcoming by binding the COMMIT button to a (link to a) CGI-script on the WWW server that generates a tiny document of yet another new mime type, which the author has to bind to the auxiliary transmission program (again, in the author's mailcap file, unless the Webmaster has added the mime type to the system-wide mailcap file). This CGI-script gets the session id, author id and password from the browser, because they are contained in hidden fields of the form containing the COMMIT button. Hence the script can associate the COMMIT request to the correct session, and verify that the session indeed belongs to this author, and also that the correct password was supplied.
The transmission program on the author's machine gets the session id, local pathname, author id and password from the CGI-script and constructs an HTTP POST request, containing the session id, author id and password for verification, followed by the document itself. This POST request activates the commit CGI-script which first verifies the session id, author id and password, and then checks in the document on the server (i.e., moves it in place and enables writing by others again).
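The request built by the transmission program might look like the following sketch. The text only states that the verification data comes first and the document follows; the exact layout used here (one form-urlencoded line, a newline, then the raw document), the field names and the commit URL are assumptions for this example, not the DReSS wire format.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_commit_request(commit_url, session_id, author, password, document):
    """Build (but do not send) the POST request that commits a document.

    `document` holds the bytes of the locally saved file.
    """
    fields = urlencode({"session": session_id,
                        "author": author,
                        "password": password})
    # Assumed layout: verification fields, a newline, then the document.
    body = fields.encode("ascii") + b"\n" + document
    return Request(commit_url, data=body, method="POST",
                   headers={"Content-Type": "application/octet-stream"})


# Hypothetical values, for illustration only.
request = build_commit_request("http://www.example.org/cgi-bin/dress-commit",
                               "/docs/paper.html", "debra", "secret",
                               b"<html><body>Revised text</body></html>")
```

In a real transmission program the `document` bytes would be read from the local pathname, and the request would be sent with `urllib.request.urlopen(request)`. The initiative stays with the author's machine, as required.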
The complicated procedure invoked by pressing the COMMIT button can be easily avoided by invoking the transmission program from the wrapper that activates the editor. However, this would violate the requirement that each action must be initiated from the browser. Also, the author may wish to abort the session after editing, i.e. to not update the document on the server after all. Sending the document back to the server unmodified may not be acceptable as a no-operation, in case the server is running an automatic versioning system that records every update request. Also, the changed modification date on the server would suggest that the document was altered, while in fact it was not.
In case the session is aborted instead of committed, an abort CGI-script is called which cancels the checkout (i.e. which simply removes the write-lock). Here an HTTP UNLOCK request would have been useful if available. In the implementation of DReSS the commit and abort CGI-scripts are actually the same.
The transmission program can operate silently and invisibly in the background. As a consequence the author does not know when the commit is actually completed. This is important in a PC and modem environment where the author may wish to power down the PC or to disconnect the modem after the commit. Pressing the VIEW button activates a CGI-script on the server that returns the document if the commit is completed, and a warning message (and another VIEW button) if the transmission is still going on.
It would be nice if the document were simply shown by the browser as a result of pressing the COMMIT button. After transmitting the document the commit CGI-script could trigger the server to send the document to the browser without the browser actually asking for it. Such a process is known as the experimental server push procedure. DReSS doesn't use this because it is not standard.
Figure 3 below shows the entire communication between the author's machine and the WWW server, for the case where an existing document needs to be updated. While this communication looks (and is) complicated, the dialog between the author and the system is really very simple. (For color readers: the author's actions are written in blue.)
When DReSS is used to generate documents on a WWW server all documents are owned by the same account (determined by the server configuration file). DReSS maintains a database to remember the owner and access rights for every document. In the initial implementation the database contains a separate file for each document. By inspecting that file a CGI-script can find out whether a document is checked out (locked) or not. Because near-simultaneous attempts to start more than one session on the same document will probably not occur frequently, simple file locking on the session files is used. Updates to the session files only take a fraction of a second, causing very little delay when a CGI-script has to wait in order to find out whether a document is checked out (locked) or not. If the document is locked DReSS will issue a message telling the user which author has checked out that document.
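One simple way to realize per-document session files with file locking is sketched below. It is an illustration under stated assumptions, not the DReSS implementation: the lock is an adjacent file created atomically, the record is stored as JSON, and the names `locked`, `try_check_out` and `locked_by` are all hypothetical.

```python
import json
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def locked(path, timeout=5.0, poll=0.05):
    """Hold an exclusive lock on `path` by creating `path + '.lock'`."""
    lock_path = path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not lock " + path)
            time.sleep(poll)   # another script is updating the file
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


def try_check_out(session_file, author):
    """Return the author who holds the checkout after this attempt."""
    with locked(session_file):
        record = {}
        if os.path.exists(session_file):
            with open(session_file) as f:
                record = json.load(f)
        holder = record.get("locked_by")
        if holder is None:
            record["locked_by"] = holder = author
            with open(session_file, "w") as f:
                json.dump(record, f)
        return holder


with tempfile.TemporaryDirectory() as directory:
    session_file = os.path.join(directory, "paper.session")
    first = try_check_out(session_file, "debra")
    second = try_check_out(session_file, "wsinatma")
```

The second attempt returns the name of the first author, which is the information needed for the message telling a user who has checked out the document.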
DReSS is intended for multi-author WWW servers. Authors may not be working on the same document most of the time, and when they are, they will probably not find documents locked often, because of the hypertext nature of the Web. The basic hypertext principle is that authors write their own small documents and create links to each other's documents. The large number of small documents together form a hyperdocument. Thus, the problem of coping with multiple authors is less a problem of dealing with concurrency and locking than a problem of ensuring that the documents and their links together form a sensible hyperdocument. In order to avoid dangling links (links pointing to documents that do not yet, or no longer exist) DReSS needs to disallow the deletion of a document as long as at least one other document (on the same server) points to it. Also, it needs to give warnings when a new document is created containing dangling links, or when new dangling links are introduced by modifying a document.
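The two link checks described above (refusing a deletion while other documents still point to the document, and warning about dangling links) can be sketched as follows. The initial implementation of DReSS does not include link verification, so this is only an illustration. As a simplification it considers only relative and host-less links as links within the same server, and all names and paths are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href values of all anchor tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def local_links(path, html_text):
    """Paths on the same server that the document at `path` links to."""
    collector = LinkCollector()
    collector.feed(html_text)
    targets = set()
    for href in collector.hrefs:
        parts = urlsplit(urljoin(path, href))
        # Skip links that name a protocol or host (other servers, mailto).
        if not parts.scheme and not parts.netloc and parts.path:
            targets.add(parts.path)
    return targets


def dangling_links(path, html_text, existing):
    """Link targets of a new or modified document that do not exist."""
    return sorted(local_links(path, html_text) - set(existing) - {path})


def may_delete(path, documents):
    """Deletion is allowed only if no other document links to `path`."""
    return not any(path in local_links(other, text)
                   for other, text in documents.items() if other != path)


# A tiny hypothetical server space: path -> HTML text.
documents = {
    "/docs/a.html": '<a href="b.html">B</a> <a href="http://other.example/x">X</a>',
    "/docs/b.html": "<p>no links here</p>",
}
```

With this server space, `/docs/b.html` may not be deleted because `/docs/a.html` points to it, while `/docs/a.html` may be deleted.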
The "Intelligent Publishing Environment" of Pitkow and Jones [Pitkow-95] allows the deletion of a document when there are still links to it (within the same WWW server). It automatically removes the links from these other documents. We consider this behavior unacceptable in general. The user removing a document may not have permission to alter these other documents. Also, the modification that is needed in these other documents for them to still make sense may be much more difficult than the simple removal of a link. This modification has to be done by the authors of these documents.
Collaboration between authors working on the same document generally leads to more modifications, sometimes cancellations, than documents written by a single author. In order to be able to quickly undo changes that were committed, a versioning or source-code control system can be used. The initial implementation of DReSS does not yet support versioning. This feature will be added by employing the RCS system.
DReSS follows an unusual approach towards security. The Web only provides basic security, which means that user identities and passwords can be used, but passwords are not transmitted in encrypted form over the Internet. As a result, it is currently not possible to make DReSS secure in the sense that the documents on the WWW server are well protected and that sessions cannot be broken into by intruders. For this reason the passwords used by DReSS should never be the same as the passwords that authors use to log in to any computer.
The main security issue for DReSS is making sure that the computing environment of the author is not affected by the vulnerability of the WWW server. When a server is configured carefully, its only risk is that an intruder might alter the documents on the server. By shielding a WWW server in a chroot environment it becomes absolutely impossible for the WWW server to access, let alone alter, any file outside the WWW server space. Server protection can be further enhanced by not offering any inherently dangerous programs in that shielded environment, such as a shell and the popular Perl interpreter. The WWW server at our department (www.win.tue.nl) runs in such a shielded environment. Both the NCSA and CERN servers run on the same document tree. Even though our site is a popular target for would-be crackers, none of the security problems discovered in WWW software have been successfully exploited by crackers attacking our server. But even if they had been, only the files visible to the WWW server could have been altered. In order to make DReSS usable in such an environment, all CGI-scripts are written in C.
The author's machine is protected first of all by delegating all initiative in DReSS to the author's machine. The WWW server never tries to contact the author's machine unless the author specifically asks it to do so. This is another good reason for not using the experimental server-push feature. Apart from sending information to the author's browser, there are only two other operations that might threaten the author's machine: the two auxiliary programs that are activated upon request by the author, but through the server. A bogus or cracked server could try to trick the auxiliary programs into performing unauthorized actions. Since these programs reside on the author's machine, not on the server, the programs themselves cannot be modified by an intruder who has cracked the server. The binding between mime types and "external viewers" is also done on the author's machine, so the server cannot trick the browser into executing different programs. The auxiliary programs may, however, be tricked into undesirable behavior:
C:\TMP for instance.
ftp://ftp.win.tue.nl/pub/infosystems/www/dress/
.
The initial implementation concentrates only on the document repository aspect. Future extensions will include link verification (possibly done by means of MOMspider [Fielding-94] or by EIT's tool [McGuire-95]) and version control (probably using RCS).
BSCW [Bentley-95] best resembles DReSS. BSCW offers version control and a graphical interface to the shared workspaces, which DReSS does not. BSCW modifies the URLs provided by authors, making it difficult to predict what the URL of a document (still to be created) will be. DReSS preserves the URLs given by the authors. This property, and the possibility to upload an entire directory at once, make DReSS better suited for publishing hypertext documents consisting of many small text fragments that are linked together.