Measures for IR-Effectiveness
Several aspects determine the effectiveness of Information Retrieval.
The measures used most often are precision and recall, although the
literature sometimes gives differing definitions of them.
-
Precision: The ratio of the number of relevant objects associated with
the descriptor to the total number of retrieved objects. (This
indicates how many of the retrieved objects are relevant.)
-
Recall: The ratio of the number of relevant objects associated with
the descriptor to the total number of relevant objects. (This indicates
how many of the relevant objects are retrieved.)
-
Exhaustivity: The degree to which the contents of the objects
are reflected in the index expressions. (When exhaustivity is low,
meaningful words may not be searchable through the index.)
-
Power: The ratio of a descriptor's specificity to its length.
(How long does a descriptor have to be in order to be precise?)
-
Eliminability: The ability to determine irrelevance of a descriptor
and stop the search. (How easily can one exclude part of the database from
the search without losing relevant nodes?)
-
Clarity: The ability to grasp the intended meaning of the descriptor.
-
Predictability: The ability to predict where relevant descriptors
can be found in the index. (This could also apply to predicting where
relevant nodes can be found in the hyperdocument.)
-
Collocation: The extent to which the relevant index terms are near
each other in the index. (Or the extent to which the relevant nodes are
near each other in the hyperdocument.)
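The first two measures in the list, precision and recall, can be computed directly once the retrieved and the relevant objects are known as sets. The following is a minimal sketch in Python, not a definitive implementation: the function names and the example identifiers are hypothetical, and returning 0.0 when a denominator is empty is an assumed convention that the definitions above do not prescribe.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved objects that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0  # assumed convention for an empty result set
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved, relevant):
    """Fraction of the relevant objects that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        return 0.0  # assumed convention when nothing is relevant
    return len(retrieved & relevant) / len(relevant)


# Hypothetical example: 4 objects retrieved, 5 objects relevant, 2 in common.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5", "d6", "d7"}

print(precision(retrieved, relevant))  # 2 of the 4 retrieved are relevant
print(recall(retrieved, relevant))     # 2 of the 5 relevant are retrieved
```

Both measures share the same numerator (the relevant objects that were retrieved) and differ only in the denominator, which is why they answer the two different questions given in parentheses in the definitions.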
So far no experiments have been conducted to determine the usefulness of
all these measures for hypertext-based Information Retrieval.
Precision and recall have been used most often. Some applications (in law
and in patent searches, for instance) require 100% recall, i.e. no relevant
document may be left out. Other applications, like searching on the Web,
benefit more from high precision: users do not want to find irrelevant
documents, and there are so many relevant documents that it does not matter
that not all of them are found, because the user has no time to read them
all anyway.
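The tension between the two requirements can be illustrated with a small sketch. It assumes that the system returns a ranked list and that the user reads only the first k results; neither assumption comes from the text above, and the function name, the ranking, and the relevance judgments are all hypothetical.

```python
def precision_recall_at_k(ranking, relevant, k):
    """Precision and recall when only the top k ranked objects are retrieved."""
    top_k = ranking[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    p = hits / len(top_k) if top_k else 0.0      # assumed 0.0 for empty results
    r = hits / len(relevant) if relevant else 0.0
    return p, r


# Hypothetical ranked result list and set of relevant documents.
ranking = ["d1", "d2", "d3", "d4", "d5", "d6"]
relevant = {"d1", "d2", "d5"}

for k in (2, 4, 6):
    p, r = precision_recall_at_k(ranking, relevant, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

In this made-up example, a reader who stops after two results sees only relevant documents but misses one of them, which may be acceptable for a Web search. Reaching 100% recall, as a patent search would require, means reading all six results, half of which are irrelevant.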