Proceedings of the 2nd Workshop on Adaptive Hypertext and Hypermedia,
HYPERTEXT'98, Pittsburgh, USA, June 20-24, 1998

Content Adaptation for Audio-based Hypertexts in Physical Environments

Elena Not, Massimo Zancanaro
IRST/ITC - Cognitive and Communication Technology Division
38050 Povo - Trento - Italy
phone: +39-461-314444
{not,zancana}@irst.itc.it


Abstract: The most important new issue that emerges when a hypermedia repository of information is used while the user is moving in a physical space is that information is presented in different situational contexts. Also, an additional perceptual dimension comes into play, providing stimuli, attention grasping and feedback. Emphasis should be put on integrating the perceptual experience with helpful information, without competing with the original exhibit items for the visitor's attention.
In this paper we shall discuss some of the critical issues about content adaptation emerging in physical hypernavigation, presenting the approach adopted in the HyperAudio project.

Keywords: content adaptation, physical hypernavigation

Introduction: navigating a physical hyperspace

New hardware technology makes it possible to use virtual repositories of information while enjoying the physical space: for example, kiosks or portable devices may give access to the portion of a virtual information space that is relevant to the object in front of the visitor.

Figure 1: A visitor exploring an augmented room in a museum.

Figure 1 suggests one possible scenario of this kind, where visitors to a museum or an exhibition may enjoy a personalized tour by interacting with the physical space, augmented with an overlapped, synchronized informational hyperspace (we will call this interaction context a physical hypernavigation). Each visitor is equipped with a palmtop computer with headphones, on which an infrared receiver is mounted. Each meaningful physical location has a small (power-autonomous) infrared emitter, sending a code that uniquely identifies it.

Exploiting the infrared signals, the system is able to identify when the visitor reaches a certain physical location and can activate a relevant portion of the information repository loaded on the palmtop. Meaningful information is selected and organized to be played as audio messages or displayed on the palmtop screen. Adaptive and dynamic hypertext technology can be exploited to tailor a presentation according to the visitor's interests, the actual context of the visit and so on.
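
The location-triggered activation described above can be pictured as a simple lookup. The following is only a minimal sketch: the function name, the integer emitter codes and the dictionary layout are assumptions made for illustration, not details of the HyperAudio implementation.

```python
from typing import Dict, List, Optional


def on_infrared_code(
    code: int,
    emitters: Dict[int, str],
    repository: Dict[str, List[str]],
) -> Optional[List[str]]:
    """Map a received emitter code to the information units for that location.

    `emitters` maps each emitter code to a location name; `repository` maps a
    location name to the ids of the information units relevant there.
    Returns None when the code is unknown (for example, a corrupted signal).
    """
    location = emitters.get(code)
    if location is None:
        return None
    return repository.get(location, [])


# Hypothetical data: one emitter mounted on one display case.
emitters = {17: "salamander_case"}
repository = {"salamander_case": ["salamander_caption", "salamander_habitat"]}
units = on_infrared_code(17, emitters, repository)
```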

We are currently investigating the features of this scenario inside the HyperAudio project [Not et al. 1997], a project IRST is developing in collaboration with the Civic Museum of Natural Sciences in Rovereto (Italy). The results gained inside HyperAudio will contribute to more advanced research efforts towards a richer interaction scenario, to be explored jointly with other partners inside the HIPS European project[*].

Navigating in a physical hyperspace is quite different from browsing an electronic book or a traditional (even adaptive) hypermedia running on a stationary console. The cognitive problems that may arise when a person is moving in a virtual information space (e.g., being lost in hyperspace or cognitive overload [Conklin1987]) are different from those of a person moving in a real space, since an additional perceptual dimension comes into play, providing stimuli, attention grasping and feedback.

In this paper we shall discuss some of the critical issues about content adaptation emerging in physical hypernavigation. We shall particularly focus our discussion on the museum environment, which introduces additional stimulating challenges for effective content adaptivity.

Content Adaptation in a physical hypernavigation

The problem of adapting content for (cultural) information presentations in physical hypernavigation shares many features with the problem of producing adaptive and dynamic hypermedia for virtual museums (e.g. [Mellish et al. 1997]) or dynamic encyclopedias (e.g. [Milosavljevic et al. 1996]), as confirmed by many psychological studies conducted with museum visitors. For example: information should be stated in terms the visitor can understand (adaptation to expertise/knowledge level) and should help the visitor connect the new information to what he already knows ([Hood1993]); information should include references to the visitor's interests ([Serrell1996]); content should be provided in a form that best stimulates learning on the part of the hearer ([Serrell1996]).

However, content adaptation in a physical environment poses some original problems which are related to the fact that the visitor is experiencing a "real" situation: moving in a real environment, looking for concrete objects to observe, and receiving perceptual orientation feedback. The peculiarity of inserting a hypertextual structure onto a physical space generates new ways of navigating information:

moving around the physical space, approaching the various cases, the visitor implicitly "clicks" on meaningful points of the hypertext (the system is able to track the visitor's position by means of sensors);
as in purely informational spaces, the visitor may explore the sub-network of the hypertext nodes related to the physical object he is standing in front of; in addition he can proceed with the exploration within the physical dimension:
after getting information about the object, the visitor may decide to move in a direction that was explicitly or implicitly suggested by the message (for example because a comparative description was heard that introduces a new interesting and related object). But the visitor may also decide to suddenly abandon the suggested tour thread. Physical hints may attract his interest more than the proposed hypertextual links: he may be distracted by interesting objects close by or he may have personal intuitions about semantic relations between objects that make him stray from the undertaken path. The system can try to cope with movements that depart from the hypertextual structure by applying techniques coming from the area of dialogue modeling: for example, tracking topic shifts or modifying its assumptions about the user's preferences.

When assembling information presentations the system should take into account the prominence of the situational context, integrating the perceptual experience with helpful information, without competing with the original exhibit items for the visitor's attention:

The system should provide information to help direct the visitor's attention to, and stimulate interest in, the objects ([Bitgood and Patterson1993], [Boisvert and Slez1994]). This means that messages offered to the user should directly refer to what the user is seeing (also exploiting appropriate linguistic forms, e.g. deictic references such as "this object" or "the object on your left") and should help the user to identify the object described (and its importance) among the others displayed.
The information should be preferably conveyed via audio messages, allowing the user to freely concentrate on the concrete objects.
The system should not overwhelm users with information ([Finn1985]), while still providing opportunities for the interested visitor to easily find new and more detailed information on a subject ([Serrell1996]). Although this issue is relevant for presentation systems in general, it becomes crucial when the user is physically moving through museum rooms and standing in front of exhibits. In fact, physical tiredness might appreciably affect the user's attitude toward long commentaries, as well as his satisfaction and learning.
The system should integrate information with directions that help the user orient himself in the physical space (e.g. how to reach an interesting object, a friend, the exit, ...) and decide where to go next.
The system should adapt its behaviour according to a user model that is dynamically updated by interpreting both the user's explicit interaction (clicking on the palmtop screen) and his movements.

It may be argued that nowadays virtual reality technology allows the realization of virtual museums in which visitors navigate in a synthesized 3-D environment. Many of the issues above are relevant to this scenario too. However, human-computer interaction in an augmented space is still substantially different, if only because of the physical fatigue associated with movement.

Two dimensions of content adaptation: the user and the situation

In traditional adaptive hypermedia (either running on a stand-alone computing station or accessed from a stationary console through the WWW) the most important adaptation factor considered is the user. The interaction context is quite constrained, with the user seated in front of the screen on which the system interface is displayed and with the following possible modalities to interact with the system: clicking on hyperlinks; typing information requests; gesturing (if suitable devices are available, e.g. a touch screen).

The most important new issue that emerges when a hypermedia repository of information is used while the user is moving in a physical space is that information is presented in different situational contexts. The main factors determining the situational context are (i) the user's position and movements (whether he is in front of an object or whether he is simply walking around a room); (ii) the structure of the surrounding physical space; (iii) whether other people are examining the same item or not; (iv) whether the user came alone or not. Even though mobile computing nowadays allows access to hypermedia while the user is moving in the physical space (as in the case of a visitor walking around a museum and browsing the museum's web pages on his palmtop computer), existing systems do not offer real dynamic adaptation with respect to the situational context.

A system effectively supporting physical hypernavigation should integrate the individual, dynamic modeling of the user (his knowledge, interests, goals, integrated with abilities, attitudes and preferences) with a general model of the environment, of the user's movements and of the user's social context.
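
As an illustration, the four situational factors listed above could be gathered into one record that the adaptation component consults alongside the user model. This is a sketch only: the class and field names are assumptions, and the structure of the surrounding space is reduced here to a plain list of nearby exhibits.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SituationalContext:
    """Snapshot of the situational factors (i)-(iv); all names are hypothetical."""

    facing_exhibit: Optional[str]       # (i) exhibit the user stands in front of, if any
    is_walking: bool                    # (i) whether the user is currently moving
    nearby_exhibits: Tuple[str, ...]    # (ii) surrounding space, simplified to a list
    others_at_exhibit: bool             # (iii) other people examining the same item
    came_alone: bool                    # (iv) whether the user came alone

    def is_standing_at(self, exhibit: str) -> bool:
        """True when the user is stationary in front of the given exhibit."""
        return self.facing_exhibit == exhibit and not self.is_walking


context = SituationalContext(
    facing_exhibit="spotted_salamander",
    is_walking=False,
    nearby_exhibits=("fire_salamander", "alpine_newt"),
    others_at_exhibit=False,
    came_alone=True,
)
```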

Importance of audio modality

In a traditional hypermedia, where the user is typically expected to browse around and read written information, though enriched with images and sounds, the system can hardly be sure that the user has really read (let alone assimilated [Mellish and O'Donnell1998]) the message. The time spent on a hypertext page is usually not significant, unless some additional feedback is exploited (e.g. mouse moves).

When the audio output modality is available, combined with a position locating system tracking the user movements in the physical space, more information can be inferred about how the user is receiving the messages. For example, if the user is standing in front of the object currently described by the system and does not explicitly stop the presentation or does not move, the system could guess a high assimilation score.
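
The inference in the example above can be written as a small rule. This is a sketch under stated assumptions: the function name and the two labels "high" and "unknown" are illustrative, and the text only commits to the "high" case.

```python
def guess_assimilation(
    in_front_of_described_object: bool,
    stopped_presentation: bool,
    moved_away: bool,
) -> str:
    """Guess how well a message was received from position and interaction cues.

    Returns "high" only in the case described in the text: the user stays in
    front of the described object and neither stops the audio nor moves.
    Every other combination is left as "unknown" in this sketch.
    """
    if in_front_of_described_object and not stopped_presentation and not moved_away:
        return "high"
    return "unknown"
```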

But even if audio output could be a plus, we must be careful not to waste the positive aspects of this resource. In fact, if message content is not properly adapted to the audio modality, there is a risk of introducing an additional disorientation effect or dissatisfaction in the user. This phenomenon is more evident here than in traditional hypermedia because the user cannot skip uninteresting or unsuitably tailored information as easily as in a hypertext.

HyperAudio Presentation Composer

In HyperAudio, information presentations are built by the system and are provided to the visitor whenever he reaches meaningful locations or when he explicitly asks for information. Each presentation is a structure containing (i) a sequence of audio files which will be played through the headphones, (ii) a set of relevant concepts worthy of further elaboration which will be depicted on the display as clickable buttons, and possibly (iii) images related to the object or concept described and a properly oriented map displaying the visitor's current position.

The audio presentations are built by concatenating canned audio files selected from the informational space. In HyperAudio, the informational space is designed so that its contents and its structure can be used in an adaptive way. The information unit is the macronode (see Figure 2). Each macronode includes a network of audio files, a list of pointers to other relevant macronodes (including the particular rhetorical relations between them), the type of text (e.g., general introduction, detailed description, ...), a pointer to the relevant semantic concept in the ontology, and possibly a link to a physical location for which the message would be pertinent.
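
The components of a macronode listed above could be represented roughly as follows. This is a minimal sketch, not the HyperAudio data format: the class, the field names and the example values are assumptions, and the audio file network is simplified to a flat list of file names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Macronode:
    """One information unit; every field name here is hypothetical."""

    node_id: str
    audio_network: List[str]            # audio files (network simplified to a list)
    message_type: str                   # type of text, e.g. "detailed description"
    concept: str                        # pointer to the concept in the ontology
    # (rhetorical relation, id of the related macronode)
    related: List[Tuple[str, str]] = field(default_factory=list)
    location: Optional[str] = None      # physical location, when pertinent

    def related_by(self, relation: str) -> List[str]:
        """Ids of the macronodes linked to this one by the given relation."""
        return [target for rel, target in self.related if rel == relation]


node = Macronode(
    node_id="salamander_description",
    audio_network=["salamander_01.wav", "salamander_02.wav"],
    message_type="detailed description",
    concept="spotted_salamander",
    related=[("elaboration-general", "salamander_habitat")],
    location="case_12",
)
```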

Figure 2: Sketch of macronode network

More than one macronode can be selected to build the actual presentation. A macronode is atomic with respect to its content but it can have some optional parts that are selected only in some particular discourse contexts. For example, the macronode audio file network in figure 3 can be instantiated in the following messages:

when the feature deictic is selected (i.e. when the visitor is in front of the object being described): What you're seeing is the Spotted Salamander <PAUSE> it gets its name from its many yellow spots on its black or bluish black body;
when the feature deictic is deselected: The Spotted Salamander gets its name from its many yellow spots on its black or bluish black body;
when the feature PRO is selected (i.e. when the message has to be joined to another one with the same topic): It gets its name from its many yellow spots on its black or bluish black body.

Figure 3: The audio files network for the Spotted Salamander macronode (in the figure, each circle in the network represents an audio file or a command to the audio player).

An audio file network encodes lexical variations of the same message (that is, whichever instantiation is chosen, the content of the text is the same). Instantiating a network amounts to making, in a simplified way, some of the choices typically made by the tactical component of an automatic natural language generator.
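
To make the instantiation step concrete, the sketch below reproduces the three Spotted Salamander messages from a single network. It is an illustration under assumptions: the network is flattened into a list of tagged segments, text strings stand in for audio files, and the rule that PRO takes precedence over deictic is a choice made here, not something stated in the paper.

```python
from typing import List, Optional, Tuple

# Each segment is (variant tag, content). A tag of None marks the shared part
# that every instantiation contains. Strings stand in for audio files, and
# "<PAUSE>" for a command to the audio player.
SALAMANDER_NETWORK: List[Tuple[Optional[str], str]] = [
    ("deictic", "What you're seeing is the Spotted Salamander"),
    ("deictic", "<PAUSE>"),
    ("deictic", "it"),
    ("plain", "The Spotted Salamander"),
    ("pro", "It"),
    (None, "gets its name from its many yellow spots on its black or bluish black body"),
]


def instantiate(
    network: List[Tuple[Optional[str], str]],
    deictic: bool = False,
    pro: bool = False,
) -> List[str]:
    """Select one path through the network according to the chosen features."""
    # Assumption: PRO wins when both features are selected.
    variant = "pro" if pro else "deictic" if deictic else "plain"
    return [content for tag, content in network if tag is None or tag == variant]


print(" ".join(instantiate(SALAMANDER_NETWORK, deictic=True)))
print(" ".join(instantiate(SALAMANDER_NETWORK)))
print(" ".join(instantiate(SALAMANDER_NETWORK, pro=True)))
```

Whichever features are selected, the shared final segment is always present, which reflects the point that every instantiation carries the same content.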

An important piece of information included in the macronode is the type of the message stored in the audio file network. Borrowing from [Serrell1996], we have defined the following message types occurring in the museum setting:

introductory labels: descriptions of exhibition goal and extent; they have to be slow enough to help the visitor get used to the audio modality and must contain an invitation to look at the screen to request further information and exploit the maps;
section labels: descriptions of rationale behind subgrouping of objects (including area descriptions or overviews and historic/social background); they can be played also while the visitor is moving;
captions: object descriptions; they should contain visual, concrete information and make use of deictic language;
follow-up information to captions: information related to objects (for example, general descriptions, anecdotes or similar); they are meant to elaborate information in caption messages;
way-finding and orientation messages: directions for reaching a physical location; they should make use of orientation hints (a correct assumption about the user's position is important).

The type describes the purpose of the message and it is exploited in the decision process aimed at building communicatively effective presentations (as discussed below).

Another important piece of information contained in a macronode is the set of macronodes semantically related to the current one. For each related macronode the kind of connecting relation is also indicated. At present, we are using a limited set of relations to capture the ways in which follow-up information to captions adds information to a caption message: among others we consider elaboration-general for general new information, elaboration-part for part-whole descriptions (useful in natural science domains), elaboration-legend for legends, anecdotes, and so on.

How an adapted presentation is assembled

Each time the system has to describe a concept, it collects all the macronodes which refer to that concept and exploits a set of heuristics to decide which macronodes have to be discarded, which have to be played as the current audio presentation (and in what order) and which will be realized as textual anchors on the screen. Some of the heuristics are similar to those used by other adaptive systems, exploiting the user knowledge model (for example to select comparisons to already known concepts or to introduce explanations for unfamiliar terms), the user interest model (to stress the user's preferred topics), and the discourse history (both to avoid selecting macronodes already presented and to introduce references to already seen objects) [Mellish and O'Donnell1998].
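
The three-way decision described above (discard, play, or show as an anchor) can be sketched as a simple partition. The specific rules below are assumptions chosen for illustration: macronodes already heard are discarded, one caption plus the follow-ups matching the user's interests are played, and everything else becomes an on-screen anchor. The actual HyperAudio heuristics are richer than this.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Node:
    """Reduced view of a macronode; names are hypothetical."""

    node_id: str
    message_type: str           # "caption" or "follow-up" in this sketch
    topics: FrozenSet[str]


def assemble(
    nodes: Iterable[Node],
    heard: Set[str],
    interests: Set[str],
) -> Tuple[List[Node], List[Node]]:
    """Split the macronodes for one concept into (to play, to show as anchors)."""
    play: List[Node] = []
    anchors: List[Node] = []
    for node in nodes:
        if node.node_id in heard:
            continue  # discourse history: never repeat a message
        if node.message_type == "caption":
            has_caption = any(p.message_type == "caption" for p in play)
            (anchors if has_caption else play).append(node)
        elif node.topics & interests:
            play.append(node)  # interest model: stress preferred topics
        else:
            anchors.append(node)
    # Stable sort: the caption comes first, follow-ups keep their order.
    play.sort(key=lambda n: n.message_type != "caption")
    return play, anchors
```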

Other heuristics are more specific for the task of hypernavigation in physical space. For example, adaptation to the physical situation (discussed in section 2.1) can be addressed by:

(i) selecting the feature deictic whenever possible (in particular, for captions and follow-up to captions but also for section labels in order to enrich presentations with references to the physical environment);
(ii) for section label presentation: adding orientation messages to highlight interesting spots nearby (exploiting the user interest model [Sarini and Strapparava1998], and a model of the physical environment to compute distances between interesting objects).
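
Heuristic (ii) combines the interest model with distances in the physical environment. A minimal sketch follows, assuming 2-D coordinates for exhibits, interest scores between 0 and 1, and illustrative thresholds; none of these details are specified in the paper.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def nearby_interesting(
    user_pos: Point,
    exhibits: Dict[str, Point],
    interest: Dict[str, float],
    max_distance: float = 5.0,   # assumed threshold
    min_interest: float = 0.5,   # assumed threshold
) -> List[str]:
    """Exhibits worth an orientation message, closest first."""
    hits = []
    for name, pos in exhibits.items():
        distance = math.dist(user_pos, pos)
        if distance <= max_distance and interest.get(name, 0.0) >= min_interest:
            hits.append((distance, name))
    return [name for _, name in sorted(hits)]


# Hypothetical room layout and interest scores.
exhibits = {"fossils": (3.0, 4.0), "minerals": (1.0, 0.0),
            "birds": (10.0, 0.0), "insects": (0.0, 2.0)}
interest = {"fossils": 0.9, "minerals": 0.2, "birds": 0.8, "insects": 0.7}
spots = nearby_interesting((0.0, 0.0), exhibits, interest)
```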

Of course, the visual part of the presentation is exploited as well: in our current implementation, introductory labels and section labels have an associated map which is displayed on the screen and is always kept oriented consistently with the user's current orientation, and way-finding messages are played on demand.

In order to maximize the communicative efficacy of the presentations, we have implemented a set of heuristics that constrain the ordering and the type of messages that are concatenated in a single audio presentation. For example,

in front of an object, only one caption can be selected for an audio presentation, and it can be followed by one or more follow-up to captions with the same associated concept (selecting the feature PRO);
when entering a new room or exposition area, only one section label can be selected, and it can be followed by captions and follow-up to captions preferably with the same associated concept (selecting the feature PRO); if needed the presentation can end with a way-finding and orientation message;
if a way-finding and orientation message is the primary communicative goal of the presentation, it should not be followed by any other message.
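
The three example constraints above can be expressed as a check on the sequence of message types in a candidate presentation. This is a sketch under assumptions: the situation labels and the function name are illustrative, and the preference for messages about the same concept is not modelled.

```python
from typing import List

CAPTION = "caption"
FOLLOW_UP = "follow-up"
SECTION = "section label"
WAY_FINDING = "way-finding"


def is_valid_presentation(types: List[str], situation: str) -> bool:
    """Check an ordered list of message types against the example constraints."""
    if not types:
        return False
    if situation == "at_object":
        # Exactly one caption, optionally followed by follow-ups.
        return types[0] == CAPTION and all(t == FOLLOW_UP for t in types[1:])
    if situation == "entering_area":
        # One section label, then captions and follow-ups,
        # optionally closed by a way-finding message.
        if types[0] != SECTION:
            return False
        body = types[1:]
        if body and body[-1] == WAY_FINDING:
            body = body[:-1]
        return all(t in (CAPTION, FOLLOW_UP) for t in body)
    if situation == "way_finding":
        # A primary way-finding message stands alone.
        return types == [WAY_FINDING]
    return False
```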

The length of audio presentations should be carefully constrained: presentations based on captions should be short (relying on the visual anchors on the screen if the visitor needs more information), while presentations based on section labels can be longer. Way-finding and orientation messages should be very short when associated with other messages, but can be longer when they represent the primary goal of the presentation.

Conclusion

Even though we focussed our discussion on the museum setting, many of the critical issues discussed in this paper about content adaptation beyond the traditional stationary hypermedia setting do apply to any physical hypernavigation setting (e.g. being guided in tourist/cultural sites, in airports or in complex buildings).

We are currently designing a set of experiments with real users both in a laboratory setting and in the Civic Museum in Rovereto in order to test the validity of the proposed heuristics as well as the user acceptability of HyperAudio.

We wish to acknowledge the contribution of the other members of the HyperAudio team (Gregorio Convertino, Daniela Petrelli, Marcello Sarini, Oliviero Stock and Carlo Strapparava) to the discussion of the ideas presented in this paper. In particular, many thanks to Marcello Sarini and Carlo Strapparava for useful comments and suggestions on previous versions of this paper. Valuable input was also provided by the discussions with the other members of the HIPS consortium.

References

[Bitgood and Patterson1993] Stephen C. Bitgood and Donald D. Patterson. The effects of gallery changes on visitor reading and object viewing time. Environment and Behaviour, 25(6), November 1993. Special Issue on Environmental Design and Evaluation in Museums.

[Boisvert and Slez1994] Dorothy Lozowski Boisvert and Brenda Jochums Slez. The relationship between visitor characteristics and learning-associated behaviours in a science museum discovery space. Science Education, 78(2):137-148, 1994.

[Conklin1987] Jeff Conklin. Hypertext: An Introduction and Survey. Computer, Survey & Tutorial Series, 1987.

[Finn1985] David Finn. How to visit a museum. Harry N. Abrams, Inc. Publishers, 1985.

[Hood1993] Marilyn G. Hood. Comfort and caring: Two essential environment factors. Environment and Behaviour, 25(6), November 1993. Special Issue on Environmental Design and Evaluation in Museums.

[Mellish and O'Donnell1998] Chris Mellish and Mick O'Donnell. An architecture for opportunistic text generation. In Proceedings of the 9th International Workshop on Natural Language Generation, Niagara-on-the-Lake, Ontario, Canada, August 1998.

[Mellish et al. 1997] C. Mellish, J. Oberlander, M. O'Donnell, and A. Knott. Exploring a gallery with intelligent labels. In Proceedings of the Fourth International Conference on Hypermedia and Interactivity in Museums (ICHIM97), Paris, September 1997.

[Milosavljevic et al. 1996] Maria Milosavljevic, Adrian Tulloch, and Robert Dale. Text generation in a dynamic hypertext environment. In Proceedings of the 19th Australasian Computer Science Conference, Melbourne, Australia, 1996.

[Not et al. 1997] Elena Not, Daniela Petrelli, Oliviero Stock, Carlo Strapparava, and Massimo Zancanaro. Person-oriented guided visits in a physical environment. In Proceedings of the Fourth International Conference on Hypermedia and Interactivity in Museums (ICHIM97), Paris, September 1997.

[Sarini and Strapparava1998] Marcello Sarini and Carlo Strapparava. Preliminary notes for a user model in a museum exploration and information-providing adaptive system, 1998. IRST Internal report.

[Serrell1996] Beverly Serrell. Exhibit Labels. AltaMira Press, 1996.

Footnotes:

[*] The HIPS consortium includes: University of Siena (coordinating partner), CB&J (France), GMD (Germany), IRST (Italy), SIETTE-Alcatel (Italy), SINTEF (Norway), University of Dublin and University of Edinburgh.