Sequential Pattern Mining: Example I (Cont.)


3. Transformation Phase
Each transaction in a customer sequence is replaced by the set of large itemsets it contains; transactions that contain no large itemset are dropped. The large itemsets are mapped to consecutive integers to make the mining more efficient.
Large Itemset  Mapped to
(30)           1
(40)           2
(70)           3
(40, 70)       4
(90)           5

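For example, the customer sequences are transformed as follows. For Customer 2, the transaction (10, 20) contains no large itemset and is dropped, while (40, 60, 70) is replaced by the large itemsets it contains: (40), (70), and (40, 70).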
Customer-id  Customer Sequence               Transformed DB                                After Mapping
1            ⟨ (30) (90) ⟩                   ⟨ { (30) } { (90) } ⟩                         ⟨ {1} {5} ⟩
2            ⟨ (10, 20) (30) (40, 60, 70) ⟩  ⟨ { (30) } { (40) (70) (40, 70) } ⟩           ⟨ {1} {2, 3, 4} ⟩
3            ⟨ (30, 50, 70) ⟩                ⟨ { (30) (70) } ⟩                             ⟨ {1, 3} ⟩
4            ⟨ (30) (40, 70) (90) ⟩          ⟨ { (30) } { (40) (70) (40, 70) } { (90) } ⟩  ⟨ {1} {2, 3, 4} {5} ⟩
5            ⟨ (90) ⟩                        ⟨ { (90) } ⟩                                  ⟨ {5} ⟩
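
The transformation above can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: the names `LARGE_ITEMSETS`, `transform_sequence`, and `customers` are hypothetical, and the data is copied from the tables on this slide.

```python
# Mapping from the slide: large itemset -> integer id.
LARGE_ITEMSETS = {
    frozenset({30}): 1,
    frozenset({40}): 2,
    frozenset({70}): 3,
    frozenset({40, 70}): 4,
    frozenset({90}): 5,
}


def transform_sequence(sequence, mapping=LARGE_ITEMSETS):
    """Replace each transaction by the ids of the large itemsets it contains.

    Transactions that contain no large itemset are dropped.
    """
    transformed = []
    for transaction in sequence:
        items = set(transaction)
        # Keep every large itemset that is a subset of this transaction.
        ids = {idx for itemset, idx in mapping.items() if itemset <= items}
        if ids:
            transformed.append(ids)
    return transformed


# Customer sequences from the slide (customer-id -> list of transactions).
customers = {
    1: [(30,), (90,)],
    2: [(10, 20), (30,), (40, 60, 70)],
    3: [(30, 50, 70)],
    4: [(30,), (40, 70), (90,)],
    5: [(90,)],
}

transformed_db = {cid: transform_sequence(seq) for cid, seq in customers.items()}
# transformed_db[2] is [{1}, {2, 3, 4}], matching the "After Mapping" column.
```

Representing each transformed transaction as a set of integer ids means a later containment check reduces to a membership test.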

The last two phases are detailed in Example II; the next slide explains how the sequential patterns are found.

4. Sequence Phase
Use the set of large itemsets to find the desired sequences. All frequent sequential patterns are generated from the transformed sequential database.
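
As a rough sketch of the sequence phase, the code below counts how many customers support a candidate sequence and grows frequent sequences level by level. It uses a naive candidate extension (append every frequent id to every frequent sequence) rather than the candidate generation detailed in Example II. The names `contains`, `support`, and `large_sequences` are hypothetical, and the minimum support of 2 customers is an assumption not stated in this section.

```python
# Transformed database from the "After Mapping" column, one entry per customer.
TRANSFORMED_DB = [
    [{1}, {5}],
    [{1}, {2, 3, 4}],
    [{1, 3}],
    [{1}, {2, 3, 4}, {5}],
    [{5}],
]


def contains(customer_seq, candidate):
    """True if the ids in `candidate` occur, in order, in distinct elements."""
    pos = 0
    for element in customer_seq:
        # Greedy matching: each element can satisfy at most one candidate id.
        if pos < len(candidate) and candidate[pos] in element:
            pos += 1
    return pos == len(candidate)


def support(db, candidate):
    """Number of customers whose sequence contains `candidate`."""
    return sum(contains(seq, candidate) for seq in db)


def large_sequences(db, min_support):
    """Level-wise search with naive candidate extension (a simplification)."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    ids = sorted({i for seq in db for element in seq for i in element})
    level = [(i,) for i in ids if support(db, (i,)) >= min_support]
    frequent_ids = [seq[0] for seq in level]
    result = list(level)
    while level:
        candidates = [seq + (i,) for seq in level for i in frequent_ids]
        level = [c for c in candidates if support(db, c) >= min_support]
        result.extend(level)
    return result


# Assumed threshold: a sequence must be supported by at least 2 customers.
found = large_sequences(TRANSFORMED_DB, min_support=2)
# found == [(1,), (2,), (3,), (4,), (5,), (1, 2), (1, 3), (1, 4), (1, 5)]
```

Note that (1, 3) is supported by Customers 2 and 4 only: Customer 3 has ids 1 and 3 in the same transaction, which does not count as a sequence.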

5. Maximal Phase
Find the maximal sequences among the set of large sequences. Sequential patterns that are contained in a longer sequential pattern are pruned, since we are only interested in maximal sequential patterns.
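
The pruning step can be sketched as follows. Containment has to be checked on the original itemsets, not on the integer ids: ⟨ (30) (40) ⟩ is contained in ⟨ (30) (40, 70) ⟩ even though the ids 2 and 4 differ. The list `LARGE_SEQUENCES` is an assumption: it was derived by hand from the transformed database under an assumed minimum support of 2 customers. The function names are hypothetical.

```python
# Inverse of the slide's mapping: integer id -> large itemset.
ID_TO_ITEMSET = {
    1: frozenset({30}),
    2: frozenset({40}),
    3: frozenset({70}),
    4: frozenset({40, 70}),
    5: frozenset({90}),
}

# Assumed input: large sequences for a minimum support of 2 customers.
LARGE_SEQUENCES = [(1,), (2,), (3,), (4,), (5,), (1, 2), (1, 3), (1, 4), (1, 5)]


def is_contained(small, big):
    """True if every itemset of `small` is a subset of a distinct,
    order-preserving itemset of `big`."""
    pos = 0
    for element in big:
        if pos < len(small) and small[pos] <= element:
            pos += 1
    return pos == len(small)


def maximal_sequences(sequences):
    """Keep only the sequences not contained in any other sequence."""
    return [
        s for s in sequences
        if not any(s != t and is_contained(s, t) for t in sequences)
    ]


# Decode ids back to itemsets before checking containment.
decoded = [tuple(ID_TO_ITEMSET[i] for i in seq) for seq in LARGE_SEQUENCES]
maximal = maximal_sequences(decoded)
# maximal holds the two patterns <(30) (40, 70)> and <(30) (90)>.
```

Under these assumptions, every other large sequence is contained in one of the two surviving patterns.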



