Linking to objects from many places in navigation structures emphasises
such objects. Such objects can be central nodes that act as home or
start pages of, for example, web sites such as portals. Furthermore,
objects that have links to other objects have emphasis with respect to
objects without links to other objects. In addition, names of links can
express emphasis since they can contain an identification or annotation
of the link. These techniques are all supported by the hyperlink
construct in the HTML standard.
The classical type of technique for emphasising objects is highlighting
them in order to give emphasised objects distinct presentation
characteristics with respect to other objects. Techniques for
highlighting are setting object size, using different fonts and
colours, flashing objects, and using icons such as arrows and frames
around objects. A feature such as frame size conveys the intensity of
the emphasis, and colour possibly the type of emphasis.
Style features such as colour are applicable to individual objects and
do not inherently constrain the presentation of other objects. Style
features allow addressing individual objects for emphasising. However,
possible unintended effects of a combination of style features in
presentations should be avoided, while in addition style can affect
presentation of information and its presentation structure [16]. As
examples, background colours should not mask colours in the media items
or conflict with them, and application of many different fonts may
inhibit readability.
6. USER CONTROL
In presentations of hierarchical
structures, a meaningful sequence of a
set of concepts that are subsumed under a concept provides readers with
a means of relating the subsumed concepts to each other according to
the applied sequence criterion. If the sequence criterion is relevance
of objects, readers reading the sequence from beginning to end
encounter each of the concepts in the sequence before all other
concepts that are less relevant. A difficulty is that it is hard to
tell beforehand what makes concepts relevant for users. We base the
sequence of concepts subsumed under a concept on relevance for users,
and consider a number of criteria that are optional relevance criteria
for individual users. These criteria are the portion of the retrieval
result covered by the objects, the amount of information available
about the objects, and the relevance of the available information for
individual users. We now explain how these relevance criteria relate to
characteristics of concepts.
The first relevance criterion mentioned, being the portion of the total
number of retrieved objects in concepts, is proportional to the number
of objects in concepts. Consequently, we consider the number of objects
in concepts as a measure of the concepts' relevance.
The second relevance criterion, being the amount of information
available about the objects, is proportional to the number of
attributes of concepts. Consequently, we consider the number of
attributes of concepts as another measure of the concepts' relevance.
The third relevance criterion, being the relevance of the available
information for individual users, requires that a specification of the
relevance of attributes for users be available. Since users' goals vary,
different users consider different attributes as relevant. A way of
letting users specify relevance of attributes is by requesting an
assignment of positive numbers to attributes as relevance weights, such
that higher numbers correspond to higher relevance of attributes. Since
higher numbers of the previously mentioned relevance criteria also
correspond to higher levels of relevance, a measure of the total level
of relevance of a concept that follows from the three individual
relevance criteria can be calculated according to the following formula.

$$R_{concept} = N_{objects} \times \sum_{i=1}^{N_{attributes}} W_i$$

In this formula, Rconcept is the relevance of the concept, Nobjects is
the number of objects in the concept, Nattributes is the number of
attributes in the concept and Wi is the weight of attribute i in a set
of Nattributes attributes.
Multiplying the number of objects with the sum of the weights assigned
to the attribute types results in their having equal effect on the
resulting concept relevance, without their having to be of equal order
of magnitude. Adding the number of objects to the sum of weights
results in their having equal effect on the outcome only if they are in
the same order of magnitude. Since the order of the number of objects
in concepts generally increases with the total number of objects in the
database, this would entail a need to bring the weights into accordance
with the number of objects in concepts.
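The difference between multiplying and adding can be illustrated with a
small sketch. The two concepts, their object counts and their weights
are hypothetical values chosen for illustration only; the function names
are assumptions and not part of Topia.

```python
def concept_relevance(n_objects, weights):
    """Product form: number of objects times the sum of attribute weights."""
    return n_objects * sum(weights)


def additive_relevance(n_objects, weights):
    """Additive alternative discussed in the text, shown only for contrast."""
    return n_objects + sum(weights)


# Hypothetical concepts: (number of objects, weights of the concept's attributes).
concepts = {
    "A": (100, [1, 1]),  # many objects, low total weight
    "B": (50, [5]),      # fewer objects, high total weight
}

by_product = sorted(
    concepts, key=lambda c: concept_relevance(*concepts[c]), reverse=True
)
by_sum = sorted(
    concepts, key=lambda c: additive_relevance(*concepts[c]), reverse=True
)

print(by_product)  # ['B', 'A']: weights and object counts both matter
print(by_sum)      # ['A', 'B']: the larger object counts dominate
```

With the product, the higher weight of concept B outweighs its smaller
number of objects. With the sum, the object counts are two orders of
magnitude larger than the weights and decide the sequence on their own.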
A set of weight values containing the values zero and one only allows
users to designate attributes as either relevant or irrelevant without
further distinguishing between the relevance levels.
The formula shows the calculation of the concept relevance when all
three relevance criteria mentioned are involved. Leaving some of the
relevance criteria aside requires adjustment of the formula. Excluding
the first relevance criterion, being the number of objects in concepts,
implies that the factor Nobjects must be removed from the formula.
Excluding the second relevance criterion, being the number of
attributes in concepts, implies that an additional division by the
number of attributes in the concept must follow calculation of the
resulting Rconcept. Excluding the influence of differently valued
weights implies that the number of attributes Nattributes replaces the
summation factor.
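The adjustments described above can be sketched as one function with a
switch per relevance criterion. The function and parameter names are
assumptions made for this illustration, as is the choice to return zero
for a concept without attributes when dividing by the attribute count.

```python
def relevance_variant(n_objects, weights, use_object_count=True,
                      use_attribute_count=True, use_weight_values=True):
    """Concept relevance with individual relevance criteria switched off."""
    n_attributes = len(weights)
    # Without differently valued weights, the attribute count replaces the sum.
    total = sum(weights) if use_weight_values else n_attributes
    # Without the first criterion, the factor for the object count is dropped.
    result = n_objects * total if use_object_count else total
    if not use_attribute_count:
        # Without the second criterion, divide by the number of attributes.
        # Assumption: a concept without attributes gets relevance zero.
        result = result / n_attributes if n_attributes else 0.0
    return result


weights = [1, 2, 3]  # hypothetical weights of three attributes
print(relevance_variant(10, weights))                             # 60
print(relevance_variant(10, weights, use_object_count=False))     # 6
print(relevance_variant(10, weights, use_attribute_count=False))  # 20.0
print(relevance_variant(10, weights, use_weight_values=False))    # 30
```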
A sequence of presentation of siblings in decreasing order of concept
relevance in hierarchical conceptual presentations results in readers
encountering siblings in decreasing order of relevance. Emphasising
concepts with relevance levels that exceed a certain threshold level,
such as zero, allows users to identify the objects in presentations
with the specified relevance level at a glance.
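As a minimal sketch of this step, the following function puts siblings
in decreasing order of relevance and marks those above the threshold.
The sibling names and relevance values are hypothetical, and the
function name is an assumption for illustration.

```python
def sequence_and_emphasise(siblings, threshold=0):
    """Return (name, relevance, emphasised) tuples in decreasing relevance.

    `siblings` maps a concept name to its relevance. Python's sort is
    stable, so siblings with equal relevance keep their input order.
    """
    ordered = sorted(siblings.items(), key=lambda item: item[1], reverse=True)
    return [(name, rel, rel > threshold) for name, rel in ordered]


siblings = {"portraits": 12, "landscapes": 0, "still lifes": 30}
for name, relevance, emphasised in sequence_and_emphasise(siblings):
    print(name, relevance, "emphasised" if emphasised else "not emphasised")
```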
The Topia architecture puts siblings in hierarchical conceptual
presentations in a sequence according to relevance as explained in this
section. With their query, users specify the level of relevance of the
types of attributes that occur with the retrieved information objects.
Figure 5 shows the specification form. The form shows the weights as
well as a direction for users for applying the weights. Users specify
one of six levels of relevance for each of these attribute types, or
tick the extreme left column for specifying attribute types that should
not be included in the presentation.
Attributes in the Topia repository have a type and a value. Topia
allows user specification of the relevance of only the attribute types
that occur in the retrieval result. Conceptually, users could as well
be allowed to specify weights of attribute values. However, attribute
types typically have many attribute values, resulting in a large amount
of attribute values that occur in retrieval results. Letting users
specify relevance for all of these requires considerable efforts. RDF
encoded databases allow automated extraction of the attribute types and
values of retrieved objects.
Figure 5. User specification of relevance of attribute types
For processing, the specified levels of relevance are assigned the
integers from zero to five. Higher numbers in this range correspond to
higher relevance levels, as shown in the table. The values of weights
in the form are illustrative and not critical for a good performance of
the sequence principle. In fact, users could be allowed to specify the
weight values freely, allowing users to apply a weight distribution
different from a set of successive integers. For calculating relevance
of concepts, the Topia architecture applies the mentioned formula in
order to involve all three stated relevance criteria.
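The translation from the form to weights can be sketched as follows. The
representation of the form as a mapping from attribute type to a level
index, the `EXCLUDE` marker and the example attribute types are
assumptions for illustration; the actual form processing in Topia is not
described in this paper.

```python
EXCLUDE = None                      # assumed marker for the leftmost column
LEVEL_WEIGHTS = [0, 1, 2, 3, 4, 5]  # successive integers for the six levels


def weights_from_form(form, level_weights=LEVEL_WEIGHTS):
    """Map each attribute type to a weight, dropping excluded types.

    `form` maps an attribute type to a level index (0 to 5) or EXCLUDE.
    """
    weights = {}
    for attribute_type, level in form.items():
        if level is EXCLUDE:
            continue  # the type is left out of the presentation altogether
        if not 0 <= level < len(level_weights):
            raise ValueError(f"unknown relevance level: {level!r}")
        weights[attribute_type] = level_weights[level]
    return weights


form = {"artist": 5, "genre": 3, "place": 0, "inventory number": EXCLUDE}
print(weights_from_form(form))
# {'artist': 5, 'genre': 3, 'place': 0}

# A freely chosen weight distribution instead of successive integers.
print(weights_from_form(form, level_weights=[0, 1, 2, 4, 8, 16]))
# {'artist': 16, 'genre': 4, 'place': 0}
```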
A hierarchical list of concepts conveys the retrieval results, as
described in section 3. Since people read from top to bottom, the
presentation lists the sequenced concepts from top to bottom, so that at
each hierarchical level users encounter concepts in decreasing order of
relevance. Emphasised concepts, with a relevance exceeding the
threshold level, appear as blue links, while non-emphasised concepts
are ghosted out.
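A presentation of this kind could be generated along the following
lines. This is a sketch under stated assumptions, not Topia's generation
code: the dictionary layout of the concept tree, the placeholder link
target `#`, the CSS class name `ghosted` and the example concepts are
all hypothetical.

```python
from html import escape


def render(concepts, threshold=0, indent=0):
    """Render a concept hierarchy as nested HTML lists, as a list of lines.

    Each concept is a dict with 'name', 'relevance' and optional 'children'.
    At every level, siblings appear in decreasing order of relevance.
    """
    pad = "  " * indent
    lines = [f"{pad}<ul>"]
    ordered = sorted(concepts, key=lambda c: c["relevance"], reverse=True)
    for concept in ordered:
        name = escape(concept["name"])
        if concept["relevance"] > threshold:
            label = f'<a href="#">{name}</a>'                # emphasised
        else:
            label = f'<span class="ghosted">{name}</span>'   # ghosted out
        children = concept.get("children", [])
        if children:
            lines.append(f"{pad}  <li>{label}")
            lines.extend(render(children, threshold, indent + 2))
            lines.append(f"{pad}  </li>")
        else:
            lines.append(f"{pad}  <li>{label}</li>")
    lines.append(f"{pad}</ul>")
    return lines


tree = [
    {"name": "Genre", "relevance": 0},
    {"name": "Artist", "relevance": 8, "children": [
        {"name": "Anonymous", "relevance": 0},
        {"name": "Rembrandt", "relevance": 4},
    ]},
]
html = "\n".join(render(tree))
print(html)
```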
Section 7 shows that sequencing siblings according to the relevance of
clusters for users, as explained in this section, allows focalisation
of presentations to specific points of view.
7. DIRECTING DISCOURSE
Section 6 showed that sequence and emphasis in presentations position a
set of information objects at a point in the story space. For users to
obtain discourse that shows specific perspectives of sets of
information objects, an appropriate statement of relevance of
attributes is required. This section shows how a statement of relevance
of attributes results in discourse that gives a corresponding
perspective of a set of retrieved information objects. The discussion
focuses on
one of the relevance criteria only, being the attribute weights, since
it is the only relevance criterion that relates to the contents of
information objects.
Increasing the weights of specific attributes moves concepts with these
attributes to the front of sequences they are part of, allowing users
to encounter such concepts first. Consequently, in order to put
discourse in perspective, attributes that are characteristic of the
required perspective must have higher weights than others in order to
give the corresponding clusters high relevance. To illustrate this with
the Topia architecture, we consider a user who wants artefacts about
the theme water and specifies a query “water” in the artefact title.
Among useful perspectives for readers of the retrieval results are the
perspective of the art domain on the one hand and the perspective of
time and place on the other hand. Considering the attributes that occur
in the retrieval result at the extreme left in Figure 5, the following
weight configurations are in accordance with the two perspectives.
1. Perspective of art domain: attributes artist, genre and material
have weight value 1, other attributes have weight value 0.
2. Perspective of time and place: attributes place and year of creation
have weight value 1, other attributes have weight value 0.
Figure 6 shows a presentation in the art domain perspective resulting
from the weight configuration stated in item 1. Concepts that have
attributes of type artist, genre or material appear above other
concepts in the presentation sequence.
Figure 7 shows a presentation in the perspective of time and place
resulting from the weight configuration stated in item 2. Concepts that
have attributes of type place or year appear above other concepts in
the presentation sequence.
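The effect of the two weight configurations can be sketched with a few
hypothetical concepts. Only the attribute weights are taken into
account here, in line with the focus of this section; the concept names
and their attribute types are invented for illustration and do not come
from the Rijksmuseum data.

```python
# The two weight configurations from items 1 and 2; unlisted types weigh 0.
ART_DOMAIN = {"artist": 1, "genre": 1, "material": 1}
TIME_AND_PLACE = {"place": 1, "year of creation": 1}


def weight_sum(attribute_types, weights):
    """Sum of the weights of a concept's attribute types."""
    return sum(weights.get(t, 0) for t in attribute_types)


def sequence(concepts, weights):
    """Concept names in decreasing order of summed attribute weight."""
    return sorted(
        concepts,
        key=lambda name: weight_sum(concepts[name], weights),
        reverse=True,
    )


# Hypothetical concepts with the attribute types they carry.
concepts = {
    "oil on canvas": ["material"],
    "Amsterdam, 1650": ["place", "year of creation"],
    "landscapes by one artist": ["artist", "genre"],
}

print(sequence(concepts, ART_DOMAIN))
# ['landscapes by one artist', 'oil on canvas', 'Amsterdam, 1650']
print(sequence(concepts, TIME_AND_PLACE))
# ['Amsterdam, 1650', 'oil on canvas', 'landscapes by one artist']
```

The same set of concepts thus yields a different sequence for each
perspective, with ties keeping their original order.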
In addition to users themselves, discourse domain experts can be
involved in specifying the weight configuration of attributes for
discourse with specific perspectives. Dynamic RDF encoded databases do
not allow retrieval of an up-to-date set of attributes of information
objects before the time of retrieval. Consequently, it is not known
beforehand what attributes are available, which of the attributes
relate to the required perspective and how they should be weighted to
ensure a proper position of objects and attributes in the resulting
discourse of the required type. A classification of attributes in the
repository gives discourse domain experts a means for specifying the
relevance of classes of attributes in presentations with specific
perspectives.
Figure 6. Discourse in perspective of art domain
Figure 7. Discourse in perspective of time and place
8. FUTURE WORK
The work presented in this paper
bases automatically generated sequence
and emphasis on the relative number of objects and attributes of
concepts and on relevance of attribute types for individual users.
Another domain-independent criterion for sequences and emphasis is the
subsumption structure in concept lattices. The subsumption structure
occurring in concept lattices depends on the occurrence and
distribution of attributes among the retrieved information objects.
Sequences of concepts can be based on their number of child concepts or
parent concepts, while emphasis on concepts can be based on a high
number of parent concepts or child concepts. Analysis of concept
lattices reveals the presence of distinct structures such as central
concepts or intensively interconnected clusters of concepts, which can
be emphasised. Presenting such relevant and prominent characteristics
of concept lattices by means of discourse constructs to convey patterns
in the retrieval result will be a topic of future research.
Topia's current implementation generates concept lattices based on exact
match of attributes of information objects. Extension of the exact
match criterion with measures based on proximity of attributes can
potentially increase the number and quality of clusters. Clustering
techniques exploiting proximity of attributes have found their
application in data mining for partitioning sets of objects [5]. The
type of clustering technique determines the properties of the resulting
clusters and hence the type of coherence among objects in clusters. In
order to let users experience the objects in the resulting clusters as
semantically close, the distance measure between attributes used for
clustering should be chosen accordingly. In spite of the required tuning,
density-based numeric clustering techniques take the distribution of
numbers in the retrieved data set into account for generating clusters
of objects with relatively small numeric distance between the objects.
Such techniques can be particularly useful for clustering numeric
properties, such as the year of creation of artefacts.
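To give an impression of clustering on a numeric property, the sketch
below groups years of creation wherever consecutive values lie close
together. This gap-based grouping is a deliberately simple stand-in and
not a full density-based technique, which would also consider how many
neighbours a value has. The function name, the `max_gap` parameter and
the example years are assumptions for illustration.

```python
def group_years(years, max_gap=10):
    """Group years so that consecutive values in a group differ by at most max_gap."""
    groups = []
    for year in sorted(years):
        if groups and year - groups[-1][-1] <= max_gap:
            groups[-1].append(year)  # close enough to the previous year
        else:
            groups.append([year])    # start a new group
    return groups


years = [1642, 1655, 1648, 1880, 1885, 1760]
print(group_years(years))
# [[1642, 1648, 1655], [1760], [1880, 1885]]
```

The value of `max_gap` is the tuning parameter here: a larger value
merges groups, a smaller one splits them.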
Vector space models of information objects in an attribute space have
found common application to express similarities between information
objects for information retrieval purposes [21]. Vector space models
are a conceptual basis for clustering objects based on non-numerical
attributes and for calculating clusters' similarity to user queries.
Discourse constructs such as sequence and emphasis can express such
cluster characteristics in presentations. Future work will extend the
applied clustering techniques and focus on their presentation in
discourse constructs.
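As a minimal sketch of the vector space idea, objects and queries can be
represented as binary vectors over attribute values and compared with
the cosine measure. The attribute values and the binary encoding are
assumptions for illustration; other weighting schemes are possible.

```python
import math


def to_vector(attributes, dimensions):
    """Binary vector: 1.0 where the object has the attribute, else 0.0."""
    return [1.0 if d in attributes else 0.0 for d in dimensions]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical attribute values of two objects and of a user query.
object_a = {"genre:landscape", "material:oil"}
object_b = {"genre:landscape", "material:paper"}
query = {"genre:landscape"}

dimensions = sorted(object_a | object_b | query)
vec_a = to_vector(object_a, dimensions)
vec_b = to_vector(object_b, dimensions)
vec_q = to_vector(query, dimensions)

print(cosine(vec_a, vec_b))  # approximately 0.5: one of two attributes shared
print(cosine(vec_a, vec_q))  # approximately 0.71
```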
Another application of sequences is for conveying themes as threads
through concept lattices. Such themes can concern subsequent clusters
of attributes that have a specific identical attribute, but that do not
occur under the same concept in the concept lattice. The user statement
of relevance of attributes can be extended to a user statement of
themes to be presented as paths along subsequent clusters in
presentations. We will focus on automatic generation of such themes by
means of sequence and emphasis and possibly other discourse constructs.
RDF databases are flexible because of their support for integration and
inference rules without having to redefine the database structure.
Consequently, attributes that occur in retrieval results cannot be
determined earlier than at time of retrieval. It will be interesting to
think about development of semantic structures that let domain
discourse experts specify generation of perspectives of presentations
by means of discourse constructs, in the absence of an exact knowledge
of the attributes that occur in retrieval results.
9. SUMMARY AND CONCLUSION
This paper focuses on the automated
derivation of two discourse
constructs, being sequence and emphasis, from semantic annotations. The
results of this work are a continuation of the Topia project, which
generates discourse structures from clustering of semantic annotations.
Other approaches focus on human-authored narrative templates for
specifying sequence and emphasis. We present requirements for automated
domain-independent generation of sequence and emphasis in the four
phases of our processing chain, being analysis of semantic annotations,
clustering, discourse structure generation and hypermedia generation.
We also present an overview of the support that web standards,
including the Semantic Web standard, offer for this. Principles for
discourse generation that are independent of specific domain semantics
allow automatic generation of narrative presentations from the contents
of multiple repositories in web environments, irrespective of their
application field.
Domain-independent criteria for sequence and emphasis follow from two
sources of information. First, such criteria can be derived from
attributes of information objects. Hard-coded sequences, numerical
attributes and chains of information objects with identical relations
between subsequent objects are sequence criteria that can be derived
automatically. The occurrence of relatively large clusters of
information objects that have identical attributes is a criterion for
emphasis, as well as occasional attributes of objects with respect to
those of other objects. A second criterion for sequence and emphasis is
relevance of information objects for individual users. We present a
relevance criterion that takes both types of criteria into account. The
latter, subjective, criterion follows from a user-specified
expression of relevance of information objects, stated by assigning
relevance weights to attribute types that occur in the metadata
repository.
This paper demonstrates application of the presented relevance
criterion in the Topia architecture, in order to generate sequenced and
emphasised clusters of objects in presentations of artefacts from the
Rijksmuseum Amsterdam collection. RDF encoded annotations allow
derivation of the actual set of attributes that occur with the
retrieved objects at time of retrieval. Finally, we show that the user
statement of relevance is a basis for generating presentations that put
the retrieval result in specific perspectives.
ACKNOWLEDGMENTS
Funding for work on this paper came
from the Topia project of the
Telematica Instituut and CWI. Lynda Hardman and Frank Nack of CWI
provided many helpful comments for improvement. Stanislav Pokraev of
Telematica Instituut helped clarify the discussion of XML and RDF
technology for this work. We thank the Rijksmuseum Amsterdam for their
permission to use their Website's database and media content. We also
thank IBM for sponsoring the project.
REFERENCES
[1] Alani, H., Kim, S., Millard, D.E., Weal, M.J., Hall, W., Lewis,
P.H., and Shadbolt, N.R. Automatic Ontology-based Knowledge Extraction
from Web Documents, IEEE Intelligent Systems, 18(1) (January-February
2003), 14-21.
[2] André, E. The Generation of Multimedia Documents. In:
Dale, R., Moisl, H. and Somers, H. (eds.), A Handbook of Natural
Language Processing: Techniques and Applications for the Processing of
Language as Text, Marcel Dekker Inc., 2000, 305-327.
[3] Bal, M. Narratology: Introduction to the Theory of Narrative,
second edition. University of Toronto Press, 1997.
[4] Bateman, J., Kamps, T., Kleinz, J. and Reichenberger, K. Towards
constructive text, diagram and layout generation for information
presentation, Computational Linguistics 27(3), 2001, 409-449.
[5] Berkhin, P. Survey of clustering data mining techniques,
http://www.accrue.com/products/rp_cluster_review.pdf
[6] De Bra, P. Pros and Cons of Adaptive Hypermedia in Web-based
Education. Journal on CyberPsychology and Behavior, Vol. 3, No. 1, Mary
Ann Liebert Inc., 2000, 71-77.
[7] Decker, S., Melnik, S., van Harmelen, F., Fensel, D., Klein, M.,
Broekstra, J., Erdmann, M. and Horrocks, I. The Semantic Web: The roles
of XML and RDF, IEEE Internet Computing, 15(3), 2000, 63-74.
[8] Buckingham Shum, S., Uren, V., Li, G., Domingue, J. and Motta, E.
Visualizing Internetworked Argumentation. In: Kirschner, P.A.,
Buckingham Shum, S.J. and Carr, C.S. (eds.), Visualizing Argumentation:
Software Tools for Collaborative and Educational Sense-Making,
Springer-Verlag: London, 2003, 185-204.
[9] Ganter, B., and Wille, R. Applied Lattice Theory: Formal Concept
Analysis. Preprints
http://wwwbib.mathematik.tudarmstadt.de/Math-Net/Preprints/Listen/pp97.html,
1997.
[10] Geurts, J., Bocconi, S., van Ossenbruggen, J., and Hardman, L.
Towards Ontology-driven Discourse: From Semantic Graphs to Multimedia
Presentations, technical report INS-R0305,
http://ftp.cwi.nl/CWIreports/INS/INS-R0305.pdf, 2003.
[11] Hearst, M., Elliott, A., English, J., Sinha, R., Swearingen, K.,
Yee, K. Finding the flow in web site search. Communications of the ACM,
Vol. 45, No. 9, 2002, 42-49.
[12] Kamps, T. Diagram Design: A Constructive Theory, Springer
Verlag, 1999.
[13] Lassila, O. and Swick, R.R. (eds.), Resource Description Framework
(RDF) Model and Syntax Specification. World Wide Web Consortium (W3C)
Recommendation, February 22nd, 1999.
[14] Little, S., Geurts, J. and Hunter, J. Dynamic Generation of
Intelligent Multimedia Presentations through Semantic Inferencing. In:
Proceedings of the Sixth European Conference on Research and Advanced
Technology for Digital Libraries (ECDL 2002), Springer, September 2002,
158-189.
[15] Mann, W., Matthiessen, C., and Thompson, S. Rhetorical Structure
Theory and Text Analysis. Information Sciences Institute Research
Report, ISI/RR-89-242, 1989.
[16] Van Ossenbruggen, J. and Hardman, L. Smart Style on the Semantic
Web. In: Semantic Web Workshop, WWW2002,
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-55/ossenbruggen.pdf,
2002.
[17] Rijksmuseum Amsterdam, Rijksmuseum Amsterdam Website.
http://www.rijksmuseum.nl
[18] Rutledge, L., Alberink, M., Brussee, R., Pokraev, S., Van Dieten,
W. and Veenstra, M. Finding the Story: Broader Applicability of
Semantics and Discourse for Hypermedia Generation. ACM Hypertext, 2003.
(to appear)
[19] Rutledge, L., Davis, J., Van Ossenbruggen, J. and Hardman, L.
Inter-dimensional Hypermedia Communicative Devices for Rhetorical
Structure. In: Proceedings of the International Conference on
Multimedia Modeling 2000 (MMM00), Nagano, Japan, November 13-15, 2000,
World Scientific, 89-105.
[20] Rutledge, L., van Ossenbruggen, J., Hardman, L. and Bulterman, D.
Structural Distinctions Between Hypermedia Storage and Presentations.
In: Proceedings of ACM Multimedia, ACM Press, 1998, 145-150.
[21] Wong, S., Raghavan, V. Vector Space Model of Information
Retrieval: A Reevaluation. In: Rijsbergen, C.J. van (ed.), Research
and Development in Information Retrieval, Cambridge University Press,
Cambridge, UK, 1984, 167-186.