Inverted Files

A textfile can be considered as an array of lines of text. (Text editors almost always treat textfiles as such.) A textfile is thus a kind of index, giving for each line number the words that appear on that line. In order to find occurrences of words in a (text)file, a different index is built, which is the inverse of the file itself: for each word, a list of linenumbers is stored on which the word appears.

Instead of lines, other units can be used, such as book-entries in a library catalog, or nodes in a hyperdocument.

When looking for more than just a word (say a phrase), one may search for entries containing a word, and then select the entries among the result that contain the whole phrase. The inverted file then works like a hash function, delivering a bucket from which the desired elements have to be selected.

Inverted files are space consuming, and updating them is time consuming. Therefore, only meaningful words are stored, and updates are not always performed immediately, but are executed in batch at night.