Successor Variety
 
Successor variety stemmers build on work in structural linguistics that attempted to determine word and morpheme boundaries from the distribution of phonemes in a large body of utterances.
The successor variety is defined as follows:
 - Let α be a word of length n, and let αi be the length-i prefix of α.
 - Let D be the corpus of words.
 - Dαi is the subset of D containing those terms whose first i letters match αi exactly.

The successor variety of αi, denoted Sαi, is the number of distinct letters that occupy position i+1 in the words of Dαi.
A test word of length n therefore has n successor varieties: Sα1, Sα2, ..., Sαn.
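The same definitions can be restated in set notation, writing αi as α_i and w_j for the j-th letter of a word w. The condition |w| > i reflects our reading of "position i+1": a word that ends exactly at α_i has no letter there and so contributes nothing to the count.

```latex
D_{\alpha_i} = \{\, w \in D : w_1 w_2 \cdots w_i = \alpha_i \,\}
\qquad
S_{\alpha_i} = \bigl|\, \{\, w_{i+1} : w \in D_{\alpha_i},\ |w| > i \,\} \,\bigr|,
\quad i = 1, \dots, n
```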
  
 
In less formal terms, the successor variety of a string is the number of different characters that follow it in words in some body of text. 
For example, consider a body of text consisting of the following words:
   able, axle, accident, ape, about.
To determine the successor varieties for “apple,” the following process would be used:
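 - The first letter of “apple” is ‘a.’ In the text body, ‘a’ is followed by four distinct characters: ‘b,’ ‘x,’ ‘c,’ and ‘p’ (‘b’ appears twice, in “able” and “about,” but counts once).
   Thus, the successor variety of ‘a’ is four.
 - The next successor variety for “apple” is one, since only ‘e’ follows “ap” in the text body. No word in this text body begins with “app,” so the successor varieties of “app,” “appl,” and “apple” are all zero.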
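The procedure above can be sketched in a few lines of Python. This is a minimal illustration, assuming the corpus is a plain list of lowercase words; the function names `successor_variety` and `successor_varieties` are hypothetical, not taken from any library.

```python
def successor_variety(prefix: str, corpus: list[str]) -> int:
    """Count the distinct letters that occupy position len(prefix)+1
    in the corpus words that start with `prefix`."""
    i = len(prefix)
    # Words of length exactly i have no letter at position i+1 and are skipped.
    followers = {w[i] for w in corpus if len(w) > i and w.startswith(prefix)}
    return len(followers)


def successor_varieties(word: str, corpus: list[str]) -> list[int]:
    """Return S_alpha1, ..., S_alphan, one value per prefix of `word`."""
    return [successor_variety(word[:i], corpus) for i in range(1, len(word) + 1)]


corpus = ["able", "axle", "accident", "ape", "about"]
print(successor_varieties("apple", corpus))  # [4, 1, 0, 0, 0]
```

Collecting the following letters in a set is what makes a repeated successor, such as the ‘b’ shared by “able” and “about,” count only once.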
 
 
When this process is carried out using a large body of text, the successor variety of a term's prefixes tends to decrease as more characters are added, until a segment boundary is reached.
At this point, the successor variety increases sharply.
This information is used to identify stems.
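The text does not specify exactly how the sharp increase is detected. The sketch below assumes one simple rule for illustration: cut the word after the first prefix whose successor variety is larger than that of both neighboring prefixes. The toy corpus and the helper names `successor_variety` and `segment` are hypothetical, and a real stemmer would need a far larger body of text.

```python
def successor_variety(prefix: str, corpus: list[str]) -> int:
    """Count the distinct letters that follow `prefix` in words of `corpus`."""
    i = len(prefix)
    return len({w[i] for w in corpus if len(w) > i and w.startswith(prefix)})


def segment(word: str, corpus: list[str]) -> tuple[str, str]:
    """Split `word` at the first local peak in its successor varieties.

    Assumed rule (one of several possible): cut after prefix word[:k+1]
    when sv[k] > sv[k-1] and sv[k] > sv[k+1]. Returns (word, "") if no peak.
    """
    sv = [successor_variety(word[:i], corpus) for i in range(1, len(word) + 1)]
    for k in range(1, len(sv) - 1):
        if sv[k] > sv[k - 1] and sv[k] > sv[k + 1]:
            return word[: k + 1], word[k + 1 :]
    return word, ""


# Hypothetical toy corpus, far smaller than a realistic one.
corpus = ["read", "reads", "reader", "reading", "readable",
          "red", "rope", "ripe", "run"]
print(segment("reading", corpus))  # ('read', 'ing')
```

For “reading” the successor varieties are 4, 2, 1, 4, 1, 1, 0: they fall from “r” to “rea,” jump at “read,” and the cut is made there.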
 
 
  
   
          