The "Fish-Search"

The "fish-search" is a search algorithm based on the schools of fish metaphor.

There are three parameters:

The algorithm uses a list, called queue, which holds the links (URL's) still to be followed. The algorithm works as follows:
  1. Initially the queue is empty and there is a current node.
  2. The current node is parsed to check whether it contains relevant information and to extract the outgoing links.
  3. A number of these links, determined by width, are selected in the following way:
    1. In case the current node is relevant these links are marked as child of a relevant node and added to the front of the queue, meaning they will be used first. If the current node is not relevant the links are added to the queue, right after the last node that was marked as child of a relevant node.
    2. The links that are not selected are added to the end of the queue.
    3. The selection of links is not entirely random: links pointing to nodes in other sites are preferred, because following them generates a better distribution of nodes in the distributed hypertext.
    4. The number of selected links is not simply equal to the value of width. For a relevant node more links than width are considered. For a node which takes a long time to retrieve, fewer links than width are selected in order to avoid spending a lot of time retrieving nodes from remote sites with a slow or erratic connection. Also, links to nodes on sites with a connection faster than rate are preferred over links to nodes on sites with a slower connection.
    5. The links in the queue have an associated depth. Each time a link is used, and the retrieved node is judged not to be relevant, its embedded links, to be added to the queue, get a lower depth (previous depth - 1). When a link has depth 0, the URL's embedded in the retrieved node are discarded unless the retrieved node is relevant. When a relevant node is found, the embedded links get the depth specified by the depth parameter.
    6. Links (URL's actually) that already appear in the queue are not entered a second time. If the new link should appear before the old one the old one is deleted, else the new link is not added. The depth of the link becomes the maximum of the two depths.
  4. The queue is searched to find the first link to a node which is located on a different site than the current node. If this link is among the first 3 x width in the queue, it is removed from the queue and the node is retrieved. If not, the first link in the queue is removed and that node is retrieved. The transfer-rate for this retrieval is measured and the average rate for this site is updated. The newly retrieved node becomes the current node, and the algorithm returns to step 2.
  5. The algorithm stops when a specified amount of time has passed or when the queue is empty.