Proceedings of the 2nd Workshop on Adaptive Systems and User Modeling on the WWW

Exploiting user models for personalizing news presentations

Liliana Ardissono, Luca Console, Ilaria Torre 
Dipartimento di Informatica - Universita' di Torino 
Corso Svizzera 185 - 10149 Torino (Italy) 
Email: liliana@di.unito.it, lconsole@di.unito.it, ilaria@babel.it
Fax: +39-011-751603; Phone: +39-011-6706711
Abstract: This paper presents a framework for the generation of adaptive hypertexts for accessing on-line news servers. News servers contain huge amounts of information, concerning different subjects. The aim of our system is to present the most appropriate set of news (and advertisements) to each user, choosing the "right" detail level for each news item. This is obtained by using knowledge representation, user modeling and flexible hypermedia techniques.

1. Introduction

Personalizing the access to news is a challenging application for research on user modeling and adaptivity. In fact, the number of Web sites containing news is rapidly growing (e.g. electronic newspapers, news servers, press agencies), and many other sites (e.g., search engines) are going in the direction of becoming news providers. The growth of these sites is an answer to the needs of companies (and individuals) for which the availability of up-to-date information is of paramount importance. However, this growth may turn into a serious problem: when the sites contain huge amounts of data the search for relevant information becomes a difficult task. The creation of sites focusing on specific subjects is only a partial answer. On the other hand, customization, i.e., the possibility of presenting the most appropriate news to each user, is a more interesting option. This would allow the site to provide the best service to each customer, yet in a single repository which is adequate for all the possible interests. Customization can also provide economic advantages for news Web sites whose incomes mainly rely on advertising. Showing the "right" banners to each user would make advertising more effective, attracting the interest of companies.
The aim of this paper is to show how user modeling and adaptive hypermedia techniques can be applied to such a task. These techniques (see [11,16]) have been widely exploited to design adaptive user interfaces in several areas, such as intelligent tutoring systems (ITS) [5,7], the generation of electronic catalogues [6,8,12], and information filtering and recommender systems [1,3,10,13,14,15]. While many systems in the first two areas use well-structured databases to provide the user with personalized information, most information filtering systems rely on large, heterogeneous data sources and use robust and shallow techniques to filter the information to be provided. Although news applications typically fall in this category (e.g., see [9]), we believe that in their design more attention should be paid to the organization of the database of news. In our work we show how the introduction of a (shallow) structure on this database can provide significant advantages for exploiting user models and for personalizing the access to the server and the presentation of news. In particular, the approach allows us not only to present the "right" set of news to each user but also to tailor the detail level for the presentation of each news item. Moreover, the advertisements added to the pages can also be tailored to the user's interests. In the paper we show how these different forms of personalization can be achieved by decomposing the user models along multiple dimensions and by exploiting techniques for the dynamic refinement (learning) of the models themselves (since a user may visit the server several times).

2. The system at a glance

The news server we designed is formed by four main modules:
  1. Databases of news and advertisements.
  2. A user modeling component, which exploits stereotypical information about users to build the initial model of a new user; then, it updates and revises the model by taking into account the user's behavior during her/his visits to the site. Finally, it stores the user model into a users database (if the user gives her/his consent).
  3. A knowledge base that relates the features in the user model to what has to be presented (which news, at which detail level and which advertisements).
  4. A module for the dynamic generation of the Web pages to be presented (in a hypertextual format).
When a new user connects to the server, (s)he is asked to fill in a form requesting a few initial data (most questions allow the user to select a value in a list of pre-defined linguistic values). The user is classified using the stereotypes, and the predictions that are generated constitute the initial user model. If the user has previously registered at the news server, her/his model is retrieved from the users database. Given the user model, the system selects the appropriate sections/news, the detail level for the presentation of the news of each section and the advertisements, and dynamically generates the pages. The user's choices during the navigation are recorded; these data may activate the rules for the dynamic revision of the user model and this may in turn change the news and advertisements that will be presented subsequently.

3. A Structured database for news and advertisements

As we noticed in the introduction, we imposed a (shallow) structure on the news database.
First of all, the news are hierarchically organized into a taxonomy of sections, which include titles such as politics (with subsections such as internal and foreign politics), sport, economics, technology, culture, entertainment (with subsections such as theater or cinema), etc.
Second, we introduced the concept of "news" as the main structured entity in our database. In our view, news are complex entities with a set of associated attributes that define the (possible) components of news: a title and subtitle; author(s)/sources; an abstract; a text (article); a set of graphics summarizing the content of the text; a set of photos and/or videos and/or audio clips; a set of commentaries, interviews, agency reports; a set of raw data and/or detailed (technical) charts/graphics, and so forth. Some of the attributes are optional and can be multi-valued (e.g., photos, video clips); moreover, the same object (again a photo or a clip) may be associated with more than one news item. Thus, each news item corresponds to a chunk of information (concerning, e.g., an event), to be conveyed to a reader.
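
As an illustration only, the attribute set described above could be sketched as the following Python record. The class name, field names, and the `photo-001` identifier are assumptions made for this sketch and are not taken from the system's actual schema; optional attributes default to `None` or an empty list, and shared media are referenced by identifier so that one object can belong to several news items.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NewsItem:
    """One news item; only title and section are mandatory in this sketch."""
    title: str
    section: str                       # position in the taxonomy, e.g. "politics/foreign"
    subtitle: Optional[str] = None
    authors: List[str] = field(default_factory=list)      # author(s)/sources
    abstract: Optional[str] = None
    text: Optional[str] = None                             # full article
    summary_graphics: List[str] = field(default_factory=list)
    media: List[str] = field(default_factory=list)         # photos, videos, audio clips
    commentaries: List[str] = field(default_factory=list)  # also interviews, agency reports
    raw_data: List[str] = field(default_factory=list)      # raw data, technical charts


# The same media object (hypothetical identifier) attached to two news items.
shared_photo = "photo-001"
item_a = NewsItem(title="Budget vote", section="politics/internal", media=[shared_photo])
item_b = NewsItem(title="Markets react", section="economics", media=[shared_photo])
```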
Finally, the database is a historical one, so we can store information concerning several days; in particular, the same news item can be present in the database on different days, possibly with different attributes.
A second database contains the commercial advertisements that can be inserted into the pages. For each advertisement, we keep track of its topic (in order to relate it to the sections of the news server) and target, i.e., the segment(s) of population to which it is directed (see the next section).
From the main points outlined above, it is clear that our approach is different from most approaches to information filtering, which do not assume any structure on the repository of documents. Indeed, these approaches operate on repositories of text files; thus, they are simpler and require little effort in the construction and maintenance of the repository. On the other hand, some effort is needed in our approach, even though it is important to notice that the structure we defined is shallow and is not very different from the one imposed by the software systems used in the editorial offices of some newspapers. In fact, these systems require that the author of a paper submit her/his work to a specific section of the newspaper; moreover, if there are photos or extra items (e.g. interviews), the author must specify the paper to which they are related, so that this will be taken into account in the layout of the pages.
The main peculiarity of our approach is that it provides handles for defining sophisticated personalization strategies. In section 5 we shall discuss how the structure imposed on the database allows us to define different detail levels for presenting news. These strategies could not be defined if news were simply organized as repositories of unstructured documents.

4. User models for the news server

The model of each user is initialized by classifying her/him according to stereotypical descriptions. The data used in the classification are those requested in the initial form: age and gender; education level and specialization field (only in case of a high education level); type and field of job; whether her/his access to the news server is for work or not; how frequently (s)he connects to the Web; and her/his hobbies or priorities. For the last, we considered a short list of activities (such as travelling, doing sport, going to the cinema, following sport, shopping, etc.) and for each of them the user must select a linguistic value specifying how much (s)he likes it (a lot, some, ...).
One problem with user modeling in our application is that it must cope with features concerning different aspects of users. For example, the selection of the (sub)sections and news depends on the user's interests and capabilities; the detail level is related to her/his expertise and receptivity; finally, the selection of the advertisements must be related to her/his life style. Thus, the stereotypes must provide a first, coarse prediction on all such aspects, which can be seen as multiple viewpoints on the description of the user. Indeed, a combinatorial number of stereotypes would be needed if they classified the user and made predictions under all these viewpoints in a single step (e.g. the stereotypes should describe classes of readers and customers - the latter are used to select the advertisements).
In order to avoid these problems, we decomposed the problem into different dimensions, dealing with each of the above aspects in an independent way. We introduced four families of stereotypes, which use partially overlapping classificatory data and make predictions on the different user features. A user is classified independently in each family and the predictions are merged (the adoption of different groups of stereotypes has been borrowed from [2]).
We have defined the four following stereotype families, taking the background knowledge for the definition of the stereotypes from the Eurisko Sinottica reports, which provide statistical information about the Italian population (habits, preferences, life styles, etc.) every year. The use of four families simplifies both the construction of the stereotypes (some of the families, e.g. the "Life Styles" one, may be re-usable in other applications) and the classification process (i.e., the initialization of the user model).


Professional_Financial_Reader :  
  profile : 
     age: 20-25: 0.1; 26-35: 0.2; 36-45: 0.3; 46-65: 0.3; >65: 0.1 
     gender: M: 0.8; F: 0.2 
     job: manager: 0.45; free-lance: 0.2; entrepreneur: 0.2; ...; student: 0.02 
     job field: financial or banking or insurance: 0.5; commerce: 0.14; civil services: 0.15; ... 
     reason of connection: work: 0.8; personal: 0.2
     hobbies: going to the cinema or watching TV: a lot: 0.1; some: 0.4; a little: 0.4; not at all: 0.1
     hobbies: following sports: a lot: 0.3; some: 0.4; a little: 0.2; not at all: 0.1
     ... 
  predictions on interests : 
     economy: high: 1; medium: 0; low: 0; null: 0
     politics: high: 0.8; medium: 0.2; ...  
     sport: high: 0; medium: 0.1; low: 0.6; null: 0.3  
     culture: high: 0; medium: 0.1; low: 0.4; null: 0.5  
     technology: high: 0.1; medium: 0.3; low: 0.4; null: 0.2  
     ... 

Adult_Superior_Committed_Style : 
  profile : 
     age: 35-55: 0.6; 56-65: 0.3; >65: 0.1 
     education level: university: 0.8; secondary school: 0.2 
     education type: economic: 0.2; law or political or sociological: 0.35; humanistic: 0.25; ... 
     job: manager: 0.3; free-lance: 0.2; entrepreneur: 0.2; ...; student: 0.02 
     priorities: (s)he likes travelling: a lot: 0.7; some: 0.3; ... 
     priorities: (s)he likes house care: not at all: 0.2; a little: 0.4; some: 0.3; a lot: 0.1
     priorities: (s)he is socially/politically committed: a lot: 0.8; some: 0.2; ... 
     ...
Figure 1 - Examples of stereotypes.


Each stereotype has two groups of slots:

Profile. The profile of the users corresponding to the stereotype is described by a set of slots (user features). A probability is associated with each linguistic value of each feature: this is the conditional probability that the user belongs to the stereotype, given the linguistic value of the feature. For example, in the stereotype "Professional financial reader", the slot "age" specifies the probability that the user is a professional financial reader, given her/his age; e.g.:
        p(Professional_Financial_Reader | age in [20,25]) = 0.1
The probability p(stereotype) that a user belongs to the class corresponding to each stereotype can be computed using the initial data provided by her/him. In particular, we assume that the features are independent and thus p(stereotype) is the product of the probabilities obtained from the slots (by matching each slot with the user's data). The independence assumption is reasonable for at least two reasons: first, all the stereotypes in the same family contain the same set of profile slots; second, we are interested in the ranking of the stereotypes belonging to each family, rather than in the actual values of their probabilities. For each family, this ranking can be obtained after normalizing the probabilities of the stereotypes in the family.
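
The classification step just described can be sketched as follows: for each stereotype in a family, multiply the slot probabilities matched by the user's data, then normalize within the family. This is a minimal illustration under stated assumptions, not the system's implementation: the function name `classify`, the dictionary layout, and the second stereotype `Casual_Reader` with all its numbers are hypothetical; only the values for "Professional_Financial_Reader" are taken from Figure 1.

```python
from math import prod
from typing import Dict

Profile = Dict[str, Dict[str, float]]  # slot -> linguistic value -> probability


def classify(family: Dict[str, Profile], user_data: Dict[str, str]) -> Dict[str, float]:
    """Return the normalized probability of each stereotype in one family."""
    # Independence assumption: the raw score is the product of the matched slots.
    raw = {
        name: prod(slots[slot].get(user_data.get(slot), 0.0) for slot in slots)
        for name, slots in family.items()
    }
    total = sum(raw.values())
    # Normalizing inside the family gives the ranking of its stereotypes.
    return {name: (p / total if total > 0 else 0.0) for name, p in raw.items()}


family = {
    "Professional_Financial_Reader": {       # values from Figure 1
        "age": {"36-45": 0.3},
        "gender": {"M": 0.8, "F": 0.2},
        "reason of connection": {"work": 0.8, "personal": 0.2},
    },
    "Casual_Reader": {                        # hypothetical stereotype and values
        "age": {"36-45": 0.2},
        "gender": {"M": 0.5, "F": 0.5},
        "reason of connection": {"work": 0.2, "personal": 0.8},
    },
}
user = {"age": "36-45", "gender": "M", "reason of connection": "work"}
ranking = classify(family, user)
```

Both stereotypes in the sketch carry the same set of profile slots, which is the condition under which comparing the products within a family is meaningful.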

Prediction. Slots that make predictions. The features in these slots are different in the various stereotype families (and are not present in the "Life styles" family). A probability is associated with each linguistic value of each feature: this is the conditional probability of the linguistic value for the feature, given that the user belongs to the stereotype.
The stereotype "Professional financial reader" belongs to the "Interests" family and thus its predictions concern the interest level in the various sections of the server. Thus, in the example we have:
        p(interest_in_economy = high | Professional_Financial_Reader) = 1
The probabilities predicted by a stereotype are computed as follows:
        p(feature_i = value_ij) = p(feature_i = value_ij | stereotype) * p(stereotype)
where p(feature_i = value_ij | stereotype) is the value associated with value_ij of feature_i in the slot and p(stereotype) is the probability that the user belongs to the stereotype (the probability is computed using the "Profile" slots).

The stereotypes in different families produce non-overlapping predictions. On the other hand, the stereotypes in each family are, in general, not exclusive so that there may be a partial match between a user and more than one stereotype. In such a case the predictions have to be merged. In order to do that, we assume that the contributions to the prediction provided by different stereotypes are independent and we then use an additive formula to combine the contributions; e.g., if we have:
      p(feature_i = value_ij) = X using a stereotype A
      p(feature_i = value_ij) = Y using a stereotype B
then the combined prediction is p(feature_i = value_ij) = X + (1-X)*Y.
Notice that again a normalization (concerning the different values of each feature) provides the final predictions.
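
A minimal sketch of this merging step is given below. Each stereotype contributes p(value | stereotype) * p(stereotype), the contributions are folded together with the additive formula X + (1-X)*Y, and the result is normalized over the values of the feature. The function name `merge_predictions` and the second stereotype's numbers are assumptions for illustration; the first distribution is the "sport" slot of Figure 1.

```python
from typing import Dict, List, Tuple

# One entry per matching stereotype: (p(stereotype), {value: p(value | stereotype)})
Contribution = Tuple[float, Dict[str, float]]


def merge_predictions(contributions: List[Contribution]) -> Dict[str, float]:
    """Merge the predictions of several stereotypes for a single feature."""
    combined: Dict[str, float] = {}
    for p_stereotype, distribution in contributions:
        for value, p_value in distribution.items():
            x = combined.get(value, 0.0)      # prediction accumulated so far
            y = p_value * p_stereotype        # contribution of this stereotype
            combined[value] = x + (1.0 - x) * y
    total = sum(combined.values())
    # Final normalization over the linguistic values of the feature.
    return {v: (p / total if total > 0 else 0.0) for v, p in combined.items()}


sport = merge_predictions([
    (0.9, {"high": 0.0, "medium": 0.1, "low": 0.6, "null": 0.3}),  # Figure 1
    (0.1, {"high": 0.5, "medium": 0.3, "low": 0.2, "null": 0.0}),  # hypothetical
])
```

Because X + (1-X)*Y is symmetric and associative, the order in which the stereotypes are folded in does not affect the result.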
The stereotypes make use only of the initial classificatory data provided by the user. Thus their predictions may be coarse. In particular, as regards the interests and expertise, the stereotypes only make predictions on general subjects, i.e., on high level sections and not on subsections. When no prediction on a subsection is available, then this prediction is initialized with the value associated with its parent section. All these predictions will be refined by the dynamic user modeling rules (section 6).

5. Selecting the information to be presented

In this section we discuss how the user model is related to what has to be presented to the user: (i) which sections/subsections and news have to be shown, at which detail level, and (ii) which advertisements. Before presenting how the selection is performed (section 5.2), we discuss how the structure we imposed on the database allows us to define different strategies for presenting a news item (section 5.1).

5.1 Defining different detail levels for presenting news

We noticed in section 3 that the possibility of defining strategies for presenting news was the main reason for imposing a structure on the news database. Indeed, different detail levels can be obtained as aggregations of the attributes of news. We have defined a partial ordering between such attributes and, on the basis of this order, we have designed the tree in Figure 2 to represent the aggregations of attributes corresponding to different detail levels in the presentation of news: the root of the tree corresponds to the minimum detail level; moving to a descendant corresponds to increasing the detail by adding the items listed in the descendant. Thus, for example, 2a corresponds to presenting: title, authors, abstract and summarizing graphics (if any); 2b is an alternative to 2a (presenting the full text paper instead of its abstract).



Figure 2 - Levels of detail in the presentation of news.


The selection of the detail level and, in case of alternatives, of the items to be presented, depends on several features of the user model: the user's level of expertise (in each specific (sub)section), her/his receptivity and interests.

5.2 The selection process

The selection of the information to be presented relies on a knowledge base of rules which exploit the user's domain expertise, interests and receptivity described in her/his model. In order to simplify the process, we adopt a modular approach, making use of three different sets of rules and of a heuristic scoring approach; the first two sets of rules ("Scoring" and "Selection" rules) operate on the (sub)sections and news; the rules in the third set on the advertisements. The sets of rules are applied in sequence.

Scoring rules
The first group of rules assigns a score to the (sub)sections in order to decide whether they can be considered for inclusion in the pages to be presented and, if they can, at which detail level they should be presented. These rules are applied for each (sub)section S and use the information about the user's interest and expertise in the topic of S (which is part of the user model). Basically, the rules exploit probability matrices that specify the probability of each detail level for S, given the user's interest and expertise in the topic of S. The probabilities have the following form:
      p(level=i for section S | interest in S=X, expertise in S=Y) = Z
specifying that Z is the probability that the user wants to read the news in S at the detail level i if her/his interest in S is X and her/his expertise in S is Y (X and Y are linguistic values of the features "interest" and "expertise"). For example,
      p(level=4 for section S | interest in S = medium, expertise in S = medium) = 0.7
Notice that the matrices include a value 0 for the detail level: this corresponds to the fact that no information has to be presented.
Since the user model contains the probability distribution for the interest and expertise in each (sub)section S (i.e., it contains the probabilities p(interest in S=X), p(expertise in S=Y), for all S and all linguistic values X and Y), the application of the rules allows the computation of a probability for each detail level of each (sub)section.
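
One plausible reading of this computation is a marginalization over the linguistic values of interest and expertise. The sketch below assumes, beyond what the text states, that interest and expertise are treated as independent, so that p(level=i) is the sum over X and Y of p(level=i | X, Y) * p(interest=X) * p(expertise=Y). The function name and all numbers except the 0.7 entry quoted above are hypothetical.

```python
from typing import Dict, Tuple

# (interest value, expertise value) -> {detail level: probability}
Matrix = Dict[Tuple[str, str], Dict[int, float]]


def score_levels(matrix: Matrix,
                 interest: Dict[str, float],
                 expertise: Dict[str, float]) -> Dict[int, float]:
    """Probability of each detail level for one (sub)section."""
    scores: Dict[int, float] = {}
    for (x, y), level_dist in matrix.items():
        # Assumed independence: joint weight of the pair (X, Y).
        weight = interest.get(x, 0.0) * expertise.get(y, 0.0)
        for level, p in level_dist.items():
            scores[level] = scores.get(level, 0.0) + weight * p
    return scores


matrix = {
    ("medium", "medium"): {4: 0.7, 3: 0.3},   # 0.7 from the text, 0.3 hypothetical
    ("low", "medium"): {0: 0.6, 1: 0.4},      # hypothetical; level 0 = nothing shown
}
scores = score_levels(
    matrix,
    interest={"low": 0.2, "medium": 0.8},     # hypothetical user model entries
    expertise={"medium": 1.0},
)
```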

Selection rules
This second group of rules uses information about the user's receptivity and the scores computed by the "scoring" rules to make a final decision about the (sub)sections to be presented and about the detail level for each (sub)section. In particular, if the user has a low receptivity, the system may reduce the number of (sub)sections and news and/or the detail level of certain sections to shorten the presentation.

Selection of the advertisements
The advertisements for each page are selected using a third set of rules. The selection depends on the (sub)section/news displayed in the page and on the classification of the user according to the "Life Styles" stereotype family. Indeed the target associated with each advertisement in the database is specified in terms of classes in the "Life Style" family. The advertisements are selected by taking into account the probability that the user belongs to each stereotype (class) in this family. Only the classes that are over a threshold are taken into account and the selection is made considering advertisements for these classes, with frequency proportional to the probabilities. Notice that in this way the pages contain advertisements for multiple targets and the fact that the user selects a specific advertisement can be used for refining the user model (as regards the "Life styles" classification).
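
The selection policy above can be sketched as a weighted random draw. The sketch assumes that an advertisement matches a page when its topic equals the page's (sub)section, that "over a threshold" means strictly greater, and that one advertisement is drawn uniformly within the chosen class; the function name, record layout, class names `Class_B` and `Class_C`, and the threshold value are all hypothetical.

```python
import random
from typing import Dict, List, Optional


def select_ad(ads: List[dict], topic: str, class_probs: Dict[str, float],
              threshold: float, rng: random.Random) -> Optional[dict]:
    """Pick one advertisement for a page about `topic`, or None if none fits."""
    # Keep only the Life Styles classes whose probability is over the threshold.
    eligible = {c: p for c, p in class_probs.items() if p > threshold}
    # Group the advertisements on this topic by eligible target class.
    by_class: Dict[str, List[dict]] = {}
    for ad in ads:
        if ad["topic"] == topic and ad["target"] in eligible:
            by_class.setdefault(ad["target"], []).append(ad)
    if not by_class:
        return None
    classes = list(by_class)
    # Choose a class with frequency proportional to its probability.
    chosen = rng.choices(classes, weights=[eligible[c] for c in classes], k=1)[0]
    return rng.choice(by_class[chosen])


ads = [
    {"id": 1, "topic": "economics", "target": "Adult_Superior_Committed_Style"},
    {"id": 2, "topic": "economics", "target": "Class_B"},
    {"id": 3, "topic": "economics", "target": "Class_C"},
]
class_probs = {"Adult_Superior_Committed_Style": 0.6, "Class_B": 0.3, "Class_C": 0.1}
ad = select_ad(ads, "economics", class_probs, threshold=0.2, rng=random.Random(0))
```

Over many page views this yields advertisements for several targets on the same section, which is what makes the user's clicks informative for refining the classification.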

6. Dynamic user modeling rules

The user model initialized by the stereotypes may be imprecise (due to the limited amount of data requested from the user) and is generic: in fact, it makes predictions on high-level sections but not on specific subsections or news. The model can be refined (or revised) after monitoring the user's behavior to see which specific news (s)he reads/selects and which ones (s)he does not read or suppresses. Several events are recorded by the system: the actions taken by the user are collected and periodically analyzed (after the user moves to a new section, and at the end of the session), in order to update the user model on the basis of the whole navigation history. In other words, it is not a single action that leads to modifying the user model but rather an analysis of the user's behavior across time.
The knowledge-base for modifying the user model is a set of rules with the following format: the antecedents are formed by logical conditions on events and the consequents specify new predictions (i.e. new probability values) over some user features. We have different groups of rules for different features in the user model (considering different sets of events); in other words we again pursue the idea of keeping the different dimensions of our user models separated.
As an example, let us consider the rules concerning the interests. The interest in a (sub)section has to be updated if in most of the cases the user selects pieces of information at a level that is more (or less) detailed than that predicted by the system. For each detail level, we have a set of rules that are activated when the level is frequently selected by the user (i.e., in at least 60% of the cases) and make predictions on the probability distribution associated with the linguistic values of the "interest" feature (the rules are then applied in the context of a specific (sub)section). The general pattern of the rules is the following:

    if     in section X the user selected links at level L in at least 60% of the cases
            and in most of the other cases the user selected links at a level higher/lower than L'
    then the user's interest for section X is M;

Each rule exploits an array M providing a probability distribution for the linguistic values of the user's interest on (sub)section X, i.e.:
        M = (p(null), p(low), p(medium), p(high)).
For instance, the following is one of the rules associated with detail level 4:

    if     in section X the user selected links at level 4 in at least 60% of the cases
            and in most of the other cases the user selected links at a level higher than 3
    then the user's interest for section X is: (p(null)=0; p(low)=0; p(medium)=0.7; p(high)=0.3)

If the user does not modify the structure of the news proposed by the system, no events are recorded and no rules are applied to update her/his model, which we suppose to be a correct one.
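
The level-4 rule shown above can be sketched as a predicate over the detail levels the user selected in one section. Two points are assumptions of this sketch rather than statements of the paper: "most of the other cases" is read as more than half of them, and the second condition is taken to hold when there are no other cases. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

# (p(null), p(low), p(medium), p(high)) for the "interest" feature
Distribution = Tuple[float, float, float, float]


def level4_rule(selected_levels: List[int]) -> Optional[Distribution]:
    """Return the array M suggested by the rule, or None if it does not fire."""
    if not selected_levels:
        return None  # no recorded events, so no rule is applied
    at_level_4 = sum(1 for level in selected_levels if level == 4)
    # Condition 1: level 4 selected in at least 60% of the cases.
    if at_level_4 * 10 < 6 * len(selected_levels):
        return None
    others = [level for level in selected_levels if level != 4]
    higher = sum(1 for level in others if level > 3)
    # Condition 2 (assumed reading): more than half of the other cases are above 3.
    if others and higher * 2 <= len(others):
        return None
    return (0.0, 0.0, 0.7, 0.3)


suggested = level4_rule([4, 4, 4, 5, 5])
```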
We then have different sets of rules that make predictions on the user's expertise, receptivity and life style. As regards the latter, the system monitors the advertisements (banners) visited by the user. If (s)he often clicks on those corresponding to a given target T, then a rule is activated making a prediction on the probability that the user belongs to the class T.
Once a rule is fired, the user's features occurring in the consequent of the rule are updated (the probability distribution of the linguistic values is updated). For each feature, the system evaluates the average between the probability values in the user model and those suggested by the rule. Thus, the changes to the user model are smooth. We made this choice because the events monitored by the system are not certain; we prefer to reduce the impact of new information with respect to the past history, avoiding abrupt changes in the user model. This is a choice and, in a sense, a conservative one; other alternatives can (and will) be explored. Clearly, if the description provided by the user model strongly differs from the user's real features, our choice causes a slow updating process.
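
The averaging update can be sketched in a few lines. The current distribution below is a hypothetical user model entry, and the suggested one is the array M of the level-4 rule from section 6; the function name is an assumption.

```python
from typing import Dict


def revise(current: Dict[str, float], suggested: Dict[str, float]) -> Dict[str, float]:
    """Average the stored distribution with the one suggested by a fired rule."""
    return {value: (p + suggested.get(value, 0.0)) / 2.0 for value, p in current.items()}


current = {"null": 0.1, "low": 0.4, "medium": 0.4, "high": 0.1}     # hypothetical
suggested = {"null": 0.0, "low": 0.0, "medium": 0.7, "high": 0.3}   # array M of a rule
revised = revise(current, suggested)
```

If both inputs are probability distributions over the same values, the average is again a distribution, and each firing of the same rule halves the remaining distance to the suggested values; this is the slow but smooth convergence discussed above.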
The effect of revising the user model is that different (sub)sections, news and advertisements may have to be presented, or that a different detail level has to be used for the news in some (sub)sections. Since changing what is presented during a consultation may confuse the user, the changes to the presentation apply only to the generation of the pages ((sub)sections and news) that the user has not yet seen during the session.

7. Hypertextual presentation

In this section we sketch the structure of the hypertext for presenting the news, focusing on the features that allow the user modeling component to capture several events during the navigation. The user must have the possibility of changing the presentation choices made by the system (adding or suppressing (sub)sections, news or detail).
The first page of the hypertext contains a list of the highest level sections that are considered of interest to the user. Each section name is a link to the page corresponding to the section. A minimize box is associated with each section and can be used to suppress it. At the bottom of the page, a menu allows the user to explore sections that were not selected by the system. The pages corresponding to sections that are divided into subsections have a similar structure.
The pages corresponding to (sub)sections containing news have a structure similar to that of mail clients. The upper part of the page contains the list of the titles of the news that the system considers relevant (a minimize box can be used to suppress each news item and a menu allows the user to select other news in the (sub)section). The lower part of the page displays the selected news item and is divided into panes, each one containing one of the different pieces of information associated with the item (attributes selected according to the appropriate detail level). Each pane has in turn a minimize box. A final pane named "to get more" has an associated maximize box that allows the user to add detail to the news item, selecting (via a menu) attributes not displayed by the system.
Finally, each page contains the appropriate banners with advertisements.

8. Conclusions

We have described the architecture of an adaptive WWW news server, focusing on the user modeling and personalization techniques adopted to customize the presentation of the news to users.
The system exploits a (shallow) structured database of news; this proved to be very useful to apply flexible hypermedia techniques for tailoring the presentation of news to the user. This is a significant difference with respect to other approaches to information filtering, which do not assume information is structured but then have more difficulties in the selection of contents, especially as regards the detail level. On the other hand, there are similarities between our approach and those used in some recommender systems (e.g., see [2,12]).
We are implementing the system in a Java-based environment, exploiting the skeleton architecture of the SETA prototype [2]. The databases of users, news and advertisements are implemented in an NT environment, using JDBC to perform SQL-like queries directly from the Java-based bulk of the system. Currently, only the components managing the user modeling task and the (personalized) selection of the attributes of news are implemented, while the portion of the system generating the HTML pages to be displayed to the user is under development.

Acknowledgements

The work described in this paper was partially supported by Telecom Italia, project "Sistemi Telematici Adattativi"; Ilaria Torre acknowledges the support of a grant from Forcom and of Babel Srl.

References

  1. Recommender systems. Communications of the ACM, 40(3), 1997.
  2. L. Ardissono, A. Goy, R. Meo, G. Petrone, L. Console, L. Lesmo, C. Simone and P. Torasso. A configurable system for the construction of adaptive virtual stores. To appear in the World Wide Web Journal, Baltzer Scientific Publishers.
  3. M. Balabanovic. Exploring versus exploiting when learning user models for text recommendation. User Modeling and User-Adapted Interaction, 8: pp. 71--102, 1998.
  4. D. Benyon. Adaptive systems: a solution to usability problems. User Modeling and User-Adapted Interaction, 3: pp. 65--87, 1993.
  5. P. Brusilovsky, E. Schwarz, and G. Weber. ELM-ART: An intelligent tutoring system on World Wide Web. In Proc. 3rd Int. Conf. on Intelligent Tutoring Systems, Montreal, 1996.
  6. R.D. Burke, K.J. Hammond, and B.C. Young. The FindMe approach to assisted browsing. IEEE Expert, pp. 32--39, 1997.
  7. L. Calvi and P. De Bra. Proficiency-adapted information browsing and filtering in hypermedia educational systems. User Modeling and User-Adapted Interaction, 7: pp. 257--277, 1997.
  8. J. Fink, A. Kobsa, and A. Nill. Adaptable and adaptive information access for all users, including disabled and the elderly. In Proc. 6th Conf. on User Modeling, pp. 171--173, 1997.
  9. T. Kamba, H. Sakagami, and Y. Koseki. ANATAGONOMY : a personalized newspaper on the World Wide Web. Int. Journal of Human-Computer Studies, 46: pp. 789--803, 1997.
  10. J. Kay. Vive la difference! individualised interaction with users. In Proc. 14th IJCAI, pp. 978--984, 1995.
  11. M.F. McTear. User modelling for adaptive computer systems: a survey of recent developments. Artificial Intelligence Review, 7: pp. 157--184, 1993.
  12. M. Milosavljevic and J. Oberlander. Dynamic hypertext catalogues: Helping users to help themselves. In Proc. the 9th ACM Conf. on Hypertext and Hypermedia, Pittsburgh, 1998.
  13. D.W. Oard. The state of the art in Text Filtering. User Modeling and User-Adapted Interaction, 7: pp. 141--178, 1997.
  14. M. Pazzani, J. Muramatsu, and D. Billsus. Syskill & Webert: Identifying interesting Web sites. In Proc. 14th AAAI, 1996.
  15. B. Raskutti, A. Beitz, and B. Ward. A feature-based approach to recommending selections based on past preferences. User Modeling and User-Adapted Interaction, 7: pp. 179--218, 1997.
  16. W. Wahlster and A. Kobsa. User Models in Dialog Systems. Springer Verlag, 1989.