Inverted Indexes (Cont.)


Indexes can be constructed in a way that supports phrase or proximity searching. Such an index lets users search, for example, for “Search Kit” as a complete phrase, as opposed to searching for documents that contain the terms “search” and “kit” anywhere in their content.

In an index that supports phrase searching, each term’s linear position in a document is recorded along with a reference to the document the term appears in. Searching for a phrase then amounts to searching for a series of terms that appear at consecutive positions in the same document.

Another way to reduce index size and increase index quality is to employ a minimum term frequency (MTF) during text extraction. An indexing system that supports MTF skips over terms that appear in a document fewer than a specified number of times, on the reasoning that a document that mentions a term only once is not likely to be a useful source of information on that topic. An information retrieval system that needs to support phrase searches should not exclude words from an index based on term frequency, because a phrase cannot be matched if any one of its terms has been left out of the index.
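
The interaction between positional indexing and MTF can be illustrated with a small sketch. This is not Search Kit code: the function names (`tokenize`, `build_positional_index`, `phrase_search`), the `min_term_frequency` parameter, and the whitespace tokenizer are all assumptions made for illustration. The sketch records each term's positions per document, matches a phrase by looking for consecutive positions, and shows how an MTF of 2 removes a term the phrase depends on.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_positional_index(docs, min_term_frequency=1):
    """Build a mapping of term -> {doc_id: [positions]}.

    A term that occurs fewer than min_term_frequency times in a document
    is skipped for that document (the hypothetical MTF rule).
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        counts = Counter(tokens)
        for position, term in enumerate(tokens):
            if counts[term] >= min_term_frequency:
                index[term].setdefault(doc_id, []).append(position)
    return index


def phrase_search(index, phrase):
    """Return sorted ids of documents that contain the phrase's terms
    at consecutive positions."""
    terms = tokenize(phrase)
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    # A candidate document must contain every term of the phrase.
    candidates = set(postings[0])
    for posting in postings[1:]:
        candidates &= set(posting)
    matches = []
    for doc_id in candidates:
        position_sets = [set(posting[doc_id]) for posting in postings]
        # Term i of the phrase must sit at start + i for some start position.
        if any(
            all(start + i in position_sets[i] for i in range(len(terms)))
            for start in postings[0][doc_id]
        ):
            matches.append(doc_id)
    return sorted(matches)


docs = {
    "a": "search kit indexes text so search is fast",
    "b": "a first aid kit helps in a search",
}

full = build_positional_index(docs)                          # no MTF filtering
pruned = build_positional_index(docs, min_term_frequency=2)  # MTF of 2

full_hits = phrase_search(full, "search kit")      # ['a']
pruned_hits = phrase_search(pruned, "search kit")  # []
```

Both documents contain “search” and “kit”, but only document `a` has them in consecutive order, so only it matches against the full index. With an MTF of 2, “kit” occurs once in each document and is never indexed, so the same phrase query finds nothing.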