Task-Based Information Filtering:
Providing Information that is Right for the Job

Paul De Bra, Geert-Jan Houben, Frank Dignum
Department of Computing Science
Eindhoven University of Technology
{debra,houben,dignum}@win.tue.nl

Abstract: Many attempts have been made to provide Internet and Intranet users with tools that help them find valuable information in the many gigabytes of data they have access to. Although large search engines like Alta Vista and Excite sometimes find the appropriate documents from just a few well-chosen keywords, most of their answers are not relevant to the user.
Many company Web-servers are beginning to offer search engines right on the first page, to guide visitors to the information they are looking for. Although the overload of irrelevant information from these services is less than with the global search engines, it is still difficult to find the information a user wants.
The core of the problem with these search engines is their one-size-fits-all approach to information retrieval. We propose a different strategy: by using an agent architecture that distinguishes three types of agents (process agents, document warehouse agents and retrieval agents) we can take into account the role of the user in her organization, or the task for which she needs the information. To evaluate which documents are relevant for which tasks, we propose that cooperative retrieval agents learn to select appropriate documents based on user feedback.

1. Introduction

Every World Wide Web user has experienced the problem of finding relevant information. Neither subject-based menu systems like Yahoo, nor large search engines like Alta Vista and Excite provide a way to quickly find the documents a user is looking for. Even when one locates a valuable site it is often still difficult to find the appropriate documents on that site. Many Web-servers try to overcome this problem by providing their own miniature version of the large search engines. The information overload on a single site is less dramatic of course, but finding the right documents can still be a problem even on a single site.

The core of this problem is that all available search tools select documents based on the textual content of the document, and not on the purpose or task the document is written for. When one connects to a typical Web-server, information is usually presented based on the hierarchical structure of the company or organization. For most visitors this structure is irrelevant. A presentation based on who the users are or what the purpose of their visit is would greatly help most users. But there is still a danger that none of the offered choices matches the reason why a user contacts the site.

We lack a good mechanism to manage an organization's information in such a way that users have easy and efficient access to the information that is relevant for their tasks. This information (management) system should support three aspects of usage:

helping the users in their access to information: finding the information
helping the user community to manage and maintain the information: organizing the information warehouse
helping the user community to replenish the information: updating or adding new information

In this paper we concentrate on the first of these aspects: supporting the users in finding information. It is essential to acknowledge the relationship between

the place of an activity within a business work process, and
the need for information during the execution of the activity.

In [HD97] we have described how agents can be used to support the work processes and their activities. These agents contain knowledge about the goals of the process and the standard procedure to fulfill that goal. They also contain knowledge about which information is needed for each step in this standard procedure. Besides the knowledge about the standard procedure they contain a planning module that can be used to construct a plan to reach the goal of the process in those cases when the standard procedure cannot be followed. Here we combine these agents with agents that support the users in finding and receiving information. Thus we construct an information system that supports the enterprise-wide exchange of information.

Specifically, we propose the following cooperation between the process agents and the retrieval agents. When the process agent needs information to support the next step in a business process, it not only sends this request to the retrieval agent but also provides information about the context of the request. That is, it indicates the goal of the process and the role of the information in the activity that serves that goal. In this way the retrieval agents can build up a user profile based not only on the word usage of retrieved documents, but also on the context in which the user uses the documents. Thus our agents learn why certain documents are considered relevant by the user.
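The cooperation described above can be illustrated with a minimal Python sketch. All class and field names (TaskContext, InformationRequest, RetrievalAgent, profile) are hypothetical and chosen for illustration only; the paper does not prescribe a message format. The sketch only shows that a request carries the process goal, the activity and the role of the information, and that the retrieval agent indexes what it learns by that context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskContext:
    """Context a process agent attaches to a request (hypothetical fields)."""
    process_goal: str       # goal of the business process
    activity: str           # the step that needs the information
    information_role: str   # what the information is used for in that step


@dataclass(frozen=True)
class InformationRequest:
    query: str
    context: TaskContext


class RetrievalAgent:
    """Keeps a user profile indexed by context, not only by query words."""

    def __init__(self):
        # (process_goal, activity) -> list of (query, information_role)
        self.profile = {}

    def handle(self, request):
        key = (request.context.process_goal, request.context.activity)
        entry = (request.query, request.context.information_role)
        self.profile.setdefault(key, []).append(entry)
        return key


agent = RetrievalAgent()
request = InformationRequest(
    query="automobile repair",
    context=TaskContext(
        process_goal="handle warranty claim",
        activity="assess damage",
        information_role="repair cost estimate",
    ),
)
agent.handle(request)
```

The same query issued from a different activity would be filed under a different key, which is what would allow an agent to distinguish why a document was requested.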

2. Task-Based Information Retrieval

In an environment like the World Wide Web, but also in enterprise-wide information systems (e.g. Intranet solutions) in any medium-sized or large organization, information is available on a wide variety of topics. The information comes from many different sources and is used by very different kinds of people. Both the menu-based systems like Yahoo and the huge search engines like Alta Vista and Excite are purely subject oriented. They try to meet the challenge of providing pointers to valuable documents, based on a search pattern which often consists of just a few keywords.

Many approaches exist to improve on this kind of search technique, by using information from more than just a single user query. Golovchinsky [G97a,G97b] assigns weights to search terms based on how many queries ago the search term was used. Queries in his system are actually hypertext links, not user-typed sets of keywords. Fishnet [BL97], a tool developed at the Eindhoven University of Technology, is typical of agent-based retrieval tools: it maintains a database of representations of previously returned documents that the user accepted or rejected, in order to form a user model that represents the typical interest of the user. All these types of tools classify documents based on content.
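To make the idea of weighting terms by query age concrete, the following Python sketch assigns each term a weight that shrinks with the number of queries since it was last used. The geometric decay and the factor 0.5 are assumptions made purely for illustration; they are not taken from [G97a,G97b], and the function name recency_weights is hypothetical.

```python
def recency_weights(query_history, decay=0.5):
    """Weight each term by how many queries ago it was last used.

    query_history: list of term lists, oldest query first.
    A term in the current query has age 0 and weight 1.0; every
    query further back multiplies the weight by `decay` (assumed).
    """
    weights = {}
    for age, terms in enumerate(reversed(query_history)):
        for term in terms:
            # Keep the weight of the most recent use only.
            weights.setdefault(term, decay ** age)
    return weights


history = [["car"], ["repair", "car"], ["brakes"]]
weights = recency_weights(history)
```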

We argue that the above-mentioned tools cannot provide satisfactory query results, because whether a document is relevant cannot easily be determined from its content alone. When a user asks for "automobile repair" a search engine will return documents with hobby repair instructions for various engine problems, detailed instructions for experienced car mechanics, help information on auto-body work, addresses of repairmen and shops, etc. Whether documents are relevant to the user depends on much more than just the subject of the document:

Who is the user, what is her job, her training, her skills?
Which task is the user trying to accomplish when asking the query?
Where is the user located (and/or where is she trying to go)?
At which company (or organization) is the user?

All these aspects are related to the specific role the user is playing within the work process. A factory organization supplies its shop floor workers with the proper material (parts, tools, etc.) based on the position of the workers in the production process. (E.g. the carpenter and the designer get different pencils.) In the same way an administrative organization must also supply its workers with the proper material (information, documents, etc.) that is suited for their role in the administrative process.

In order to realize this we propose retrieval agents that use:

knowledge about (the state of) the process
feedback from the users about the relevance of the supplied documents

For the process knowledge the retrieval agents should communicate with the process supporting agents (see [HD97]). These process agents use an approach like Action Workflow [MWFF] to establish knowledge about the state of the process. The retrieval agents should learn from the process agents about the state of the process, and therefore about the (business) purpose of supplying the information.

The retrieval agents should ask for user feedback in order to learn what distinguishes relevant documents from irrelevant ones. Unlike other approaches, our proposal does not limit user feedback to a boolean "relevant/not relevant" selection; it includes feedback on:

topic: does this document deal with the requested subject?
job/level: is this document appropriate for the user's job and is the material at the right (skill) level?
task: is this document helpful for the user's task?
location: is this document useful for users in this (geographic) area?
organization: is this document useful for users in this company/organization?

Some of this feedback can be given automatically by the process agent, while other aspects must be requested from the user or learned through experience. By discriminating between documents according to these different criteria, a search agent is better able to find information that is actually helpful for the user in her job situation. In addition, agents can be tied together to form a more detailed classification of information, as described in the next section.
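The five feedback criteria can be represented as a small record, sketched below in Python. The names Feedback, merge_feedback and is_relevant are hypothetical, and two rules are assumptions made for this illustration rather than part of the proposal: an explicit user answer overrides the value supplied automatically by the process agent, and a document counts as relevant only if every criterion that has been answered is positive.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Feedback:
    """One judgement per criterion; None means 'not answered yet'."""
    topic: Optional[bool] = None
    job_level: Optional[bool] = None
    task: Optional[bool] = None
    location: Optional[bool] = None
    organization: Optional[bool] = None


def merge_feedback(automatic, from_user):
    """Combine process-agent feedback with user feedback.

    Assumption: an explicit user answer takes precedence.
    """
    merged = {}
    for name, auto_value in asdict(automatic).items():
        user_value = getattr(from_user, name)
        merged[name] = user_value if user_value is not None else auto_value
    return Feedback(**merged)


def is_relevant(feedback):
    """Assumed rule: all answered criteria must be positive."""
    known = [v for v in asdict(feedback).values() if v is not None]
    return bool(known) and all(known)


automatic = Feedback(task=True, organization=True)   # from the process agent
from_user = Feedback(topic=True, job_level=False)    # asked from the user
combined = merge_feedback(automatic, from_user)
```

In this example the document matches the topic, task and organization but is at the wrong skill level, so under the assumed rule it is not relevant, which a single boolean judgement could not explain.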

Moreover, this structured approach gives a better tool to organize the information and document management process. Any standard factory organization invests in setting up the right support mechanism to facilitate the supply of material to its workers; the average administrative organization on the other hand does not properly acknowledge the different activities that should be involved in supplying the workers with the right documents. For example, document warehouse management is not something that can be completely left to automated agents (just as there are only exceptional cases in which fully automated hardware warehouses are feasible). The use of retrieval agents that cooperate with the other agents, which are involved in the document management, gives in our opinion a solid base for an effective and efficient enterprise-wide information system.

3. A Cooperative Agent Architecture for Feedback

In order to find relevant information quickly it helps when documents contain meta-information indicating their subject, intended audience and possibly other aspects. Unfortunately it is not possible to add meta-information to external documents. So, we cannot assume that it is feasible to design and implement for every document an agent with knowledge about the document and its usage. What is feasible is that a user's retrieval agent cooperates with a number of document warehouse agents that act as information brokers, which know about the market place where documents are used (retrieved).

In the architecture that we propose here the learning retrieval agents add part of the document knowledge (meta-information) to their internal database (or user model). The internal database of an agent serves three purposes:

The agent contains knowledge about the state of the process in which the user is involved. This knowledge is obtained through cooperating with the process agents.
When a document is encountered again in a search the agent already knows whether the user finds it relevant or not under the given circumstances.
When new documents have to be evaluated it helps to have a database with classifications of similar documents.
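The three purposes listed above can be sketched as one small Python class. Everything here is a hypothetical illustration: the class name AgentDatabase, its methods, and in particular the use of word-overlap (Jaccard) similarity to find "similar documents" are assumptions, since the paper does not specify how similarity is measured.

```python
class AgentDatabase:
    """Internal database (user model) of one retrieval agent."""

    def __init__(self):
        self.process_state = None   # purpose 1: state of the process
        self.judgements = {}        # purpose 2: (doc_id, task) -> bool
        self.words = {}             # purpose 3: doc_id -> set of words

    def update_state(self, state):
        """Store the state reported by a process agent."""
        self.process_state = state

    def record(self, doc_id, text, task, relevant):
        """Remember the user's judgement of a document for a task."""
        self.judgements[(doc_id, task)] = relevant
        self.words[doc_id] = set(text.lower().split())

    def known(self, doc_id, task):
        """Return the stored judgement, or None if the pair is new."""
        return self.judgements.get((doc_id, task))

    def guess(self, text, task):
        """Classify a new document like its most similar known one.

        Similarity is Jaccard overlap of word sets (an assumption).
        Returns None when nothing comparable is known for the task.
        """
        new_words = set(text.lower().split())
        best, best_score = None, 0.0
        for (doc_id, known_task), relevant in self.judgements.items():
            if known_task != task:
                continue
            union = new_words | self.words[doc_id]
            score = len(new_words & self.words[doc_id]) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = relevant, score
        return best


db = AgentDatabase()
db.update_state("assess damage")
db.record("d1", "replace brake pads step by step", "repair", True)
db.record("d2", "addresses of repair shops", "repair", False)
```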

The best possible use of a retrieval agent's database is the help it can provide to other agents. When an agent encounters a document for the first time, other agents may already have classified that document. Although these other agents work for users with different interests, jobs and tasks, an agent can use their judgement to better evaluate a newly found document. We feel that summaries of this knowledge should be stored (learned) in special agents dedicated to the management of the document warehouse. These document warehouse agents can offer the common knowledge about the documents and their usage, and they can (on the basis of this knowledge) proactively control the contents of the document warehouse.
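A document warehouse agent that stores summaries of the retrieval agents' judgements might look like the Python sketch below. The class and method names are hypothetical, and both the summary (fraction of positive reports per document and task) and the rule for flagging documents for review (no acceptances and at least three rejections) are assumptions chosen only to make the idea concrete.

```python
from collections import defaultdict


class DocumentWarehouseAgent:
    """Collects judgements reported by retrieval agents."""

    def __init__(self):
        # (doc_id, task) -> [accepted count, rejected count]
        self._votes = defaultdict(lambda: [0, 0])

    def report(self, doc_id, task, relevant):
        """Called by a retrieval agent after user feedback."""
        self._votes[(doc_id, task)][0 if relevant else 1] += 1

    def summary(self, doc_id, task):
        """Fraction of positive reports, or None if nothing is known."""
        accepted, rejected = self._votes.get((doc_id, task), (0, 0))
        total = accepted + rejected
        return None if total == 0 else accepted / total

    def review_candidates(self, min_reports=3):
        """Documents never accepted for any task (assumed review rule)."""
        totals = defaultdict(lambda: [0, 0])
        for (doc_id, _task), (accepted, rejected) in self._votes.items():
            totals[doc_id][0] += accepted
            totals[doc_id][1] += rejected
        return sorted(
            doc_id
            for doc_id, (accepted, rejected) in totals.items()
            if accepted == 0 and rejected >= min_reports
        )


warehouse = DocumentWarehouseAgent()
warehouse.report("d1", "repair", True)
warehouse.report("d1", "repair", False)
for _ in range(3):
    warehouse.report("d2", "repair", False)
```

The list returned by review_candidates is only a suggestion for a human warehouse manager, in line with the earlier remark that document warehouse management cannot be left completely to automated agents.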

Cooperating agents are only feasible within a single organization, and most likely also only at a single site (or geographically near sites). This implies that agents do not have to make transformations between location and organizational information. (If one user has taught her agent that a document is relevant to her location, this applies to the other users' agents as well.)

Apart from reusing evaluations of documents from other retrieval agents, a retrieval agent may also ask other retrieval agents for documents about a certain topic and for a specific job and task. Only agents working for users with similar interests may offer help. Rather than simply reproducing these documents, the agents also need to compare task information. When users have different jobs requiring different information, the agent needs the documents that were rejected by the other agent.
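One literal reading of this exchange between retrieval agents is sketched below in Python. The names PeerKnowledge and candidates_from_peer are hypothetical, and reducing "similar interests" to an exact topic match is a simplifying assumption. The returned documents are candidates that the asking agent would still have to evaluate for its own user.

```python
from dataclasses import dataclass, field


@dataclass
class PeerKnowledge:
    """What another retrieval agent knows about one topic."""
    topic: str
    job: str
    task: str
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)


def candidates_from_peer(topic, job, task, peer):
    """Select which of the peer's documents are worth considering."""
    if peer.topic != topic:
        # Different interests: this peer cannot help.
        return []
    if peer.job == job and peer.task == task:
        # Same job and task: what the peer's user accepted is promising.
        return list(peer.accepted)
    # Different job or task: documents rejected there may fit here.
    return list(peer.rejected)


mechanic = PeerKnowledge(
    topic="automobile repair",
    job="car mechanic",
    task="fix engine",
    accepted=["workshop-manual"],
    rejected=["hobby-guide", "shop-addresses"],
)
for_hobbyist = candidates_from_peer("automobile repair", "hobbyist", "fix engine", mechanic)
```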

Altogether the architecture involves three types of cooperating agents:

process agents: responsible for the flow of activities within the business processes, and thus responsible for the operational decisions involved in the execution of tasks
document warehouse agents: responsible for the control of the document management process
retrieval agents: responsible for the match between activities and their purpose on the one hand, and documents from the document warehouse on the other hand

4. Conclusions and Future Work

Information retrieval can be improved by separating knowledge about document content from the tasks a document is intended to support and the geographic location or organization it is aimed at.

By distinguishing three types of agents (process agents, document warehouse agents and retrieval agents), an organization can set up a retrieval process in which the necessary knowledge is adequately distributed. When these agents cooperate, just as people cooperate in traditional factory inventory processes, retrieval can be supported much better in flexible medium-sized or large business environments.

Retrieval agents that assist users in finding information can help each other both by providing their evaluation of specific documents and by proposing documents based on the knowledge of each other's user model.

Evaluation of documents can be further enhanced by including a document's environment in that evaluation. Other documents pointing to a document, as well as pointers from the document under evaluation, may provide valuable information about the purpose of a document. Also, the navigation path taken by a user to reach the document may provide cues about the type of information the user is searching for. These additional cues are not yet incorporated in our cooperating agents architecture, but will be in the near future.

5. References

[G97a] G. Golovchinsky: Roll-Your-Own Hypertext. In Proceedings of the Flexible Hypertext Workshop, Macquarie Computing Reports C/TR97-06, pp. 49-53, 1997.
[G97b] G. Golovchinsky: What the Query Told the Link: The integration of hypertext and information retrieval. In Proceedings of the ACM Conference on Hypertext, pp. 67-74, 1997.
[BL97] P. De Bra and W. Lemmens: FishNet: Finding and Maintaining Information on the Net. (To appear) In Proceedings of the AACE WebNet Conference, Toronto, 1997.
[HD97] G.J. Houben and F. Dignum: Information for organized work. In F. Baader, M. Jeusfeld, W. Nutt (eds), Proceedings 4th Int. workshop Knowledge Representation meets databases, Athens, 1997.
[MWFF] R. Medina-Mora, T. Winograd, R. Flores and F. Flores: The Action Workflow Approach to Workflow Management Technology. In Computer-Supported Cooperative Work 92 Proceedings, pp. 281-288, 1992.
