Signature Files


Signature files are based on the idea of an inexact filter: they provide a quick test that discards many of the nonqualifying items. All qualifying items pass the test; some nonqualifying items may also pass it accidentally (these are called false drops).

The Method
The documents are stored sequentially in the text file. Their signatures (hash-coded bit patterns) are stored in the signature file. When a query arrives, the signature file is scanned and many nonqualifying documents are discarded. The remaining documents are either checked against the text to eliminate accidental matches, or returned to the user as they are.
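The two-phase search described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the signature size F = 64, the one-bit-per-word hashing in `signature`, and the names `search`, `documents`, and `signatures` are all hypothetical choices and do not come from the text.

```python
import hashlib

F = 64  # signature size in bits (assumed value, for illustration only)

def signature(words):
    """Hash-coded bit pattern: OR together one hashed bit per word (a toy scheme)."""
    sig = 0
    for w in words:
        digest = hashlib.sha256(w.lower().encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:8], "big") % F)
    return sig

def search(query_word, documents, signatures, verify=True):
    """Scan the signature file, then optionally check survivors against the text."""
    q = signature([query_word])
    hits = []
    for doc_id, sig in enumerate(signatures):
        if (sig & q) != q:
            continue  # a query bit is missing: the document cannot qualify
        if verify and query_word.lower() not in documents[doc_id].lower().split():
            continue  # passed the filter accidentally; discard after checking the text
        hits.append(doc_id)
    return hits

# The "text file" and its companion "signature file", kept in memory here.
documents = ["signature files use hashing", "inverted files use postings lists"]
signatures = [signature(d.split()) for d in documents]

print(search("hashing", documents, signatures))
```

With `verify=False` the function returns every document that passes the filter, which may include accidental matches but never misses a qualifying document.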

Superimposed Coding
Each document is divided into “logical blocks,” that is, pieces of text that contain a constant number D of distinct, noncommon words. Each such word yields a word signature, which is a bit pattern of size F with m bits set to 1 and the rest set to 0. The word signatures are ORed together to form the block signature. Block signatures are concatenated to form the document signature.
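A minimal sketch of superimposed coding, using the symbols D, F, and m from the paragraph above, might look like this. The default values (D = 4, F = 16, m = 3), the SHA-256 based bit selection, and all function names are assumptions made for illustration; the removal of common words is omitted, and the document signature is represented as a list of block signatures rather than a single concatenated bit string.

```python
import hashlib

def word_signature(word, F=16, m=3):
    """Bit pattern of size F with exactly m bits set to 1 (requires m <= F)."""
    if m > F:
        raise ValueError("m must not exceed F")
    sig, counter = 0, 0
    while bin(sig).count("1") < m:
        # Rehash with a counter until m distinct bit positions have been chosen.
        digest = hashlib.sha256(f"{word.lower()}:{counter}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:8], "big") % F)
        counter += 1
    return sig

def block_signature(words, F=16, m=3):
    """OR the word signatures of one logical block together."""
    sig = 0
    for w in words:
        sig |= word_signature(w, F, m)
    return sig

def document_signature(words, D=4, F=16, m=3):
    """Split the words into blocks of D distinct words; return one signature per block."""
    blocks, current = [], []
    for w in words:
        w = w.lower()
        if w not in current:
            current.append(w)
        if len(current) == D:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)  # last block may hold fewer than D words
    return [block_signature(b, F, m) for b in blocks]

def may_contain(doc_sig, word, F=16, m=3):
    """True if some block has all m bits of the word's signature set."""
    q = word_signature(word, F, m)
    return any((b & q) == q for b in doc_sig)

words = "free text retrieval uses signature files".split()
doc_sig = document_signature(words)
print([format(b, "016b") for b in doc_sig])
```

A word that appears in the document always passes `may_contain`; a word that does not appear may still pass if other words in a block happen to set the same m bits, which is the accidental match the filter allows.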




