{vendrig, worring, smeulders}@wins.uva.nl, Intelligent Sensory Information Systems, Faculty of WINS, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam.
Considering the disadvantages of manual indexing, it is obvious that the assignment of keywords and descriptions should be automated. For full-text documents, elaborate indexing techniques are available that reduce large chunks of text to a short list of relevant words. Extracting textual descriptions from images is, however, less successful. The translation from an image object to keywords requires a great deal of domain knowledge. Furthermore, unlike words in a text, objects in an image cannot be distinguished easily.
The problem of description generation also holds for Query by Association, the information retrieval method in which user input is a choice of one of the navigation controls, e.g. in multimedia hypertext. Since the controls have to be based on the image content, the user-friendly approach of Query by Association faces the same disadvantages as keyword queries.
Context can be useful to reduce a multimedia retrieval problem to a full-text retrieval problem. In [10], the Piction system is described, which identifies human faces in newspaper photographs using visual semantics and natural language processing. Piction associates terms in the caption of an image with the photograph, or even with a part of the photograph. The indexing is performed fully automatically. Retrieval of the images is based purely on the descriptions.
Since context is not always available, and since indexing based on information encoded in the image itself would be less speculative, multimedia research focuses on indexing by image features. Image features are non-information-bearing [3] metadata derived directly from the pictorial data. Query by Pictorial Example is currently the most frequently used method to retrieve images, e.g. in [4], [5], [7].
In the next section the Query by Pictorial Example method will be explained in detail. Next, it is shown how disadvantages of the method can be avoided when combined with concepts from Query by Association. The resulting method is called Filter Image Browsing. Results of an experiment with a Filter Image Browsing implementation will be described, followed by conclusions and suggestions for general application of Filter Browsing in multimedia information retrieval.
Although the query formulation and the features for computing the similarity differ considerably, Query by Pictorial Example (QPE) uses the same principle as full-text information retrieval to satisfy a user's information need. In QPE, the system's purpose has shifted from giving deterministic answers to minimizing the user's search and evaluation efforts. A user can, for example, provide the image retrieval system with a picture of a red car, asking for all pictures that appear similar. The system computes the values of the features, defined a priori, for the example image and compares them to the values of the images in the database. The user is then confronted with the images that are most likely to fulfill her information need, according to the chosen features and similarity ranking method.
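This principle can be sketched as follows; the gray-value histogram feature and the Euclidean distance used here are illustrative choices, not necessarily those of any particular system:

```python
# Sketch of QPE: compute a feature vector for the example image and rank
# the database by distance. The histogram feature and Euclidean distance
# are illustrative assumptions.
import math

def color_histogram(pixels, bins=8):
    """Normalized gray-value histogram as a simple example feature."""
    hist = [0] * bins
    for v in pixels:                      # v in 0..255
        hist[min(v * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1
    return [h / total for h in hist]

def distance(f1, f2):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

def qpe_rank(example_feature, index):
    """Rank all database images by similarity to the example image.
    index maps image ids to precomputed feature vectors."""
    return sorted(index, key=lambda img_id: distance(example_feature, index[img_id]))
```

The user would then be shown the first 20 or so ids returned by qpe_rank.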
Apart from the selected image features and similarity ranking techniques, in QPE the example image given by users is of great importance. Based on the way the example image is fed to a system, we distinguish three types of Query by Pictorial Example:
Although Query by internal Pictorial Example allows simple user interaction, the method has two major disadvantages. First, the user has to go through the time-consuming process of finding a suitable example image in the database. Second, although the Query by internal Pictorial Example method can be applied iteratively, the use of similarity ranking can cause false convergence. False convergence occurs when the top-ranked images lie around a local optimum of the similarity, viz. the wrong one. A falsely converged set of images does not yet comply with the user's information need, but it is no longer useful for a new query, since the result of such a query would be identical.
Filter Image Browsing can be seen as an overview overlay on Query by internal
Pictorial Example, or as the addition of a dynamic zoom function for image databases
to Query by Association. The Filter Image Browsing system first presents
an overview of images that are considered representative for parts of the
image collection. The user selects an example image from the overview after
which a similarity ranking is performed. A large percentage (e.g.
25%) of the top-ranked images forms the new subset of images, which we call the
active image set in that state of the retrieval session.
Figure 1. Filter Image Browsing retrieval process.
Filter Image Browsing uses similarity ranking, while false convergence is limited by applying the overview techniques to the entire active subset of images instead of only the top-ranked images. In the following paragraphs, Filter Image Browsing is described in more detail and some of its side effects are discussed.
Filter Image Browsing is based on two functions, viz. overview and reduction. The overview function provides the user with a small yet representative set of images, so that she can choose a good example image. The reduction function makes a distinction between images that are likely to be desired and images that are not, based on the information (the selected example image) available.
In ImageRETRO, the overview function is implemented by dynamically defining
clusters of images, based on their feature values. Clusters are allowed to
overlap. For each cluster an image is chosen as a representative. The total set
of representative images is shown to the user.
The use of reduction and similarity ranking does influence the overview
function, since all images in the active set must be accessible by choosing
one of the images presented to the user. The dynamically defined
clusters must cover the entire set of images in the active image set.
Therefore, we introduce the cover constraint stating that every image in the active set
must be part of a cluster.
In ImageRETRO a brute force algorithm was constructed to enforce the cover
constraint.
The algorithm performs an imaginary reduction operation on the active set given
a seed image. The cluster is the result of the reduction operation. This is done
repeatedly (each time starting from the entire active set) until each image
in the active set is part of one of the clusters. The collection of seed images
forms the overview set. To speed up the algorithm, the seed images are chosen
such that clusters are likely to differ considerably from previous clusters.
For example, anticipating a reduction to 25% of the active image
set, the algorithm needs 12 clusters on average to guarantee the cover. In a
best-case situation, only 4 clusters would be necessary. In our opinion,
the number of images shown to the user in the overview should be limited to between 20 and 30.
Since the brute-force algorithm does not exceed this limit and can be
executed fast enough to allow interactive retrieval, it usefully
guarantees at least one search path to every image in the database.
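The algorithm above can be sketched as follows; the scalar feature index, the similarity-by-absolute-difference reduction, and all function names are our simplifications, not the paper's actual implementation:

```python
# Hedged sketch of the brute-force cover algorithm.
def reduce_set(active_set, seed, index, fraction=0.25):
    """Imaginary reduction: keep the top-ranked fraction of the active set,
    ranked by similarity to the seed image. The resulting cluster is the
    reduction's outcome."""
    ranked = sorted(active_set, key=lambda i: abs(index[i] - index[seed]))
    keep = max(1, int(len(ranked) * fraction))
    return set(ranked[:keep])

def build_overview(active_set, index, fraction=0.25):
    """Pick seed images until their clusters cover the entire active set
    (the cover constraint); the seeds form the overview set."""
    uncovered = set(active_set)
    seeds = []
    while uncovered:
        # choosing a not-yet-covered seed makes the next cluster likely
        # to differ considerably from the previous ones
        seed = next(iter(uncovered))
        cluster = reduce_set(active_set, seed, index, fraction)
        seeds.append(seed)
        uncovered -= cluster
        uncovered.discard(seed)
    return seeds
```

Each iteration restarts the imaginary reduction from the entire active set, and the loop ends exactly when every image is covered by some cluster.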
We note two side effects of Filter Image Browsing that are caused by the variation
in active image sets during a retrieval session.
First, the use of smaller image sets during the session increases
processing speed. IBM's QBIC already applies filtering techniques
solely for the purpose of speeding up computations [4].
Thus the system can use computationally expensive features once the size
of a reduced image set allows it.
Secondly, the system can use features that are appropriate for
the active image set. Features that are valuable for making a general distinction
between image clusters at the beginning of the retrieval session may lose
their value after the distinction has been made. E.g., the number of distinct
colors is a discriminating feature in a large set of images, but it is no longer
useful when all images in the reduced set have the same number of colors.
In addition, some features can be applied only when conditions are met. E.g.
color features may be useful only when there are no gray valued images present
in the image set.
Furthermore, statistics determine the discriminatory power of each feature
in every state of the retrieval process, so that the importance of each feature is
related to the current active image set.
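As an illustration, one could weight each feature by the entropy of its value distribution over the active set; a constant feature (e.g. all images having the same number of distinct colors) then gets weight zero. The entropy statistic and the function names are our assumptions, not the statistics actually used in ImageRETRO:

```python
# Illustrative sketch: feature importance as entropy over the active set.
import math
from collections import Counter

def entropy(values, bins=8):
    """Shannon entropy of a feature's value distribution over the active set."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0                      # constant feature: no discriminatory power
    counts = Counter(min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def feature_weights(active_features):
    """Map feature name -> entropy over the active image set."""
    return {name: entropy(vals) for name, vals in active_features.items()}
```

Recomputing these weights after every reduction ties each feature's importance to the current active image set.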
As domain for the image database the World Wide Web was chosen, since it
is a large and diverse domain and the acquisition of images was practically
feasible.
A robot gathered 10,000 images by following hyperlinks. The robot could easily be used to
extend the number of images in the database. For every image index values
were computed for 9 simple image features, primarily based on colors. The
ImageRETRO system and the images are available via the World Wide Web.
Other retrieval systems that indexed images from the World Wide Web based
on pictorial content include PicToSeek [5] and WebSeek [9].
Criteria
Two types of evaluation criteria determine whether Filter Image
Browsing is effective or not. One type evaluates the
performance of the system stand-alone, while the other type compares
different image retrieval systems.
The stand-alone evaluation consists of two criteria. The first criterion measures the loss of relevant images after the reductions. It tests the decrease in precision relative to the reduction of the image sets.
Let Is be the active image set in state s, and let T be the set of desired (target) images. The precision P of image set Is is the number of desired images present in Is as a percentage of the total number of desired images:

P(Is) = |Is ∩ T| / |T| × 100%
The reduction operation is considered useful when p, the fraction of desired images lost by a reduction, is significantly lower than r, the fraction of all images discarded.
The second stand-alone criterion measures the performance of the overview functionality by comparing it to an overview function that picks images at random. The overview function is considered useful when the ratio p/r is significantly lower for the cluster-based overview function than for randomly picked images.
The relative evaluation compares the ImageRETRO system to a straightforward implementation
of the Query by internal Pictorial Example method. For ImageRETRO to be successful, the number
of images found after a reasonable number of user interactions must be higher than the
corresponding figure for Query by internal Pictorial Example. The success of a system at a
stage in the retrieval session is expressed by the recall R, the number of desired images
found so far as a percentage of the total number of desired images T:

R = |Found ∩ T| / |T| × 100%

where Found is the set of images the user has found up to that stage.
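Following the paper's definitions, both measures reduce to a fraction of the desired (target) images; a minimal sketch, with hypothetical function names:

```python
# Both evaluation measures as code. P follows the paper's definition
# (desired images still present in the active set, relative to all desired
# images); R is the fraction of desired images found so far.
def precision(active_set, target_set):
    """P: desired images remaining in the active set / all desired images."""
    return len(set(active_set) & set(target_set)) / len(target_set)

def recall(found, target_set):
    """R: desired images the user has found so far / all desired images."""
    return len(set(found) & set(target_set)) / len(target_set)
```

Note that both share the same formula and differ only in which set is intersected with the target set.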
During an experiment the behavior of the user should not be random.
The purpose of the evaluation is to examine the effectiveness of the
retrieval method. Thus the serendipity of users should not be subject
to evaluation. Furthermore, the knowledge of the user should not change
during the experiment. I.e., the results of a session of finding certain
images with one method should not influence the behavior in a session of
finding the same images with another method.
Ideally, users in the experiment search for images so that they are:
Since it is hard to find such human users, we chose to use a computer
simulation to evaluate retrieval performance. The behavior of a simulated
user is subject to predefined rules and ideally consistent and unbiased. The
choices of the simulated user include reductions and undo-operations (back
navigation) based on the resulting image sets.
For the use of a simulation it is assumed that the information need is
defined in advance. Thus the assistance of the retrieval system in the definition of
an information need cannot be measured.
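Under the assumption of a predefined target set, a simulated retrieval session can be sketched as follows; the concrete selection rule (pick the overview image whose cluster retains the most target images) and the stopping rule are our simplified illustrations, not the exact rules used in the experiment:

```python
# Sketch of a rule-based simulated user with a predefined target set.
def simulate(active_set, target_set, overview_fn, reduce_fn, max_steps=6):
    """Run up to max_steps reductions, always picking the overview image
    whose cluster retains the most target images; back-navigate (stop at
    the previous set) when a reduction would lose all targets."""
    current = set(active_set)
    for _ in range(max_steps):
        seeds = overview_fn(current)
        best = max(seeds, key=lambda s: len(reduce_fn(current, s) & target_set))
        reduced = reduce_fn(current, best)
        if not reduced & target_set:
            break  # back navigation: keep the previous active set
        current = reduced
    return current
```

Because the rules are fixed, the simulated user is consistent and unbiased across methods and sessions.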
The predefined information need is called a target set. Seven target sets were selected
from the collection of 10,000 images. The sets complied with three conditions:
The target sets used in the experiment are available at
http://carol.wins.uva.nl/~vendrig/wgiw/target.html.
Results
The stand-alone criterion that is concerned with the loss of relevant images
after the reductions shows that the loss is limited to about 10% of the total
reduction. For example, in the case where 70% of the active image set is discarded
(reduction to 30%), only 7% of the desired images were lost. The results for various
reduction percentages are shown in the second column of table 1.
The second stand-alone criterion compares the cluster based overview function against
a case in which random images are shown to the user. The results are shown in
the second and third columns of table 1, respectively.
Reduction | Cluster Based Overview | Random Overview
25%       | 0.09                   | 0.17
30%       | 0.11                   | 0.19
35%       | 0.12                   | 0.21

Table 1. Evaluation of the overview function for various reduction
factors. The numbers denote the average fraction of relevant images in the active image
set lost after a reduction.
Figure 2. Comparison of Filter Image Browsing and Query by internal
Pictorial Example (QiPE).
The breakpoint after 5 interactions is caused by convergence of the active image set. After 5 iteratively applied reductions to 35% of the active set, the active image set has decreased in size from 10,000 to about 50 images. Hence, the chance that relevant images in the active set are actually shown to the user is high.
The simulations show satisfying results for all determined criteria. Although Filter Image Browsing
does cause the loss of desired images, far more irrelevant than relevant images are discarded.
Table 1 shows that far fewer relevant images are lost when a cluster-based overview function is used,
compared to an overview with randomly picked images.
Query by internal Pictorial Example is the better of the two systems within the first few
interactions. But after 6 interactions, which can be considered very reasonable, Filter Image
Browsing performs significantly better than Query by internal Pictorial Example. The initial investment
in interactive query specification pays off when the set of images converges to a small set.
Thus, the combination of Query by internal Pictorial Example from the image retrieval field and
Query by Association from hypertext research in Filter Image Browsing results in a
powerful method for browsing through image databases.
In future research, we will apply a similar approach to full-text databases,
so that users can find documents by iteratively choosing automatically
indexed keywords from the database. In cooperation with Origin/Medialab
an application using such techniques is currently under development. Later,
combination of image and full-text Filter Browsing will lead to a uniform approach
for multimedia information retrieval.
In addition, we plan to extend Filter Browsing systems with innovative interfaces,
such as Medialab's Aquabrowser [11]. The combination of user-friendly
interfaces and powerful adaptive retrieval systems should result in intuitive
access to multimedia databases.
[1] N.S. Chang and K.S. Fu. Query-by-pictorial example. IEEE Transactions on Software Engineering, SE-6(6):519-524, 1980.
[2] D.R. Cutting, D.R. Karger, J.O. Pedersen and J.W. Tukey. Scatter/Gather: A cluster-based approach to browsing large document collections. In Proceedings of SIGIR'92, 1992.
[3] W.I. Grosky. Multimedia Information Systems. IEEE Multimedia, 1(1):12-24, 1994.
[4] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele and P. Yanker. Query by Image and Video Content: The QBIC System. IEEE Computer, 28(9):23-32, 1995. (demo)
[5] T. Gevers and A.W.M. Smeulders. PicToSeek: A Content-based Image Search Engine for the World Wide Web. In Proceedings of VISUAL'97, San Diego, USA, 1997. (demo)
[6] R. Jain. Infoscopes: multimedia information systems. In Multimedia Systems and Techniques, B. Furht, ed., Kluwer Academic Publishers, Boston, 1996, pp. 217-253.
[7] A. Pentland, R.W. Picard, S. Sclaroff. Photobook: content-based manipulation of image databases. In Borko Furht, editor, Multimedia Tools and Applications, Kluwer Academic Publishers, Boston, 1996, pp. 43-80.
[8] S.W. Smoliar and H.J. Zhang. Video Indexing and Retrieval. In Multimedia Systems and Techniques, B. Furht, ed., Kluwer Academic Publishers, Boston, 1996, pp. 293-322.
[9] J. R. Smith and S.-F. Chang. Visually Searching the Web for Content. IEEE Multimedia, 4(3):12-20, 1997. (demo)
[10] R.K. Srihari. Automatic Indexing and Content-Based Retrieval of Captioned Images. IEEE Computer, 28(9):49-56, 1995.
[11] W.A. Veling. The Aqua Browser - visualisation of dynamic concept spaces. Digital Intelligence (AGSI 97), The Hague, The Netherlands, accepted for publication. (demo)
[12] J. Vendrig. Filter Image Browsing: a study to image retrieval in large pictorial databases. Msc Thesis, University of Amsterdam, The Netherlands, February 1997. (demo)
Authors
More information about the authors and their current research interests can be found at their homepages.