Sequential Pattern Mining: Example I


Given a transaction database with three attributes (customer-id, transaction-time, and purchased-items), the mining process is decomposed into the following five phases.

1. Sort Phase
The original transaction database is sorted with customer-id as the major key and transaction-time as the minor key. The result is a set of customer sequences, one per customer, each listing that customer's transactions in time order. The table below shows the sorted transaction data.
Customer-id   Transaction-time   Purchased-items
1             Oct 23 ’02         30
1             Oct 28 ’02         90
2             Oct 18 ’02         10, 20
2             Oct 21 ’02         30
2             Oct 27 ’02         40, 60, 70
3             Oct 15 ’02         30, 50, 70
4             Oct 08 ’02         30
4             Oct 16 ’02         40, 70
4             Oct 25 ’02         90
5             Oct 20 ’02         90
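
The sort phase can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `transactions`, `sort_phase`, and `customer_sequences` are hypothetical, the year 2002 is read from the "’02" in the table, and the rows are deliberately listed out of order so that the sort has something to do.

```python
from datetime import date
from itertools import groupby

# Rows of the example database: (customer_id, transaction_time, purchased_items).
# They are shuffled on purpose; the year 2002 is assumed from "'02".
transactions = [
    (2, date(2002, 10, 21), {30}),
    (5, date(2002, 10, 20), {90}),
    (1, date(2002, 10, 28), {90}),
    (4, date(2002, 10, 16), {40, 70}),
    (2, date(2002, 10, 18), {10, 20}),
    (3, date(2002, 10, 15), {30, 50, 70}),
    (4, date(2002, 10, 8), {30}),
    (1, date(2002, 10, 23), {30}),
    (4, date(2002, 10, 25), {90}),
    (2, date(2002, 10, 27), {40, 60, 70}),
]


def sort_phase(rows):
    """Sort by customer-id (major key) and transaction-time (minor key),
    then group each customer's itemsets into one time-ordered sequence."""
    ordered = sorted(rows, key=lambda row: (row[0], row[1]))
    return {
        customer_id: [items for _, _, items in group]
        for customer_id, group in groupby(ordered, key=lambda row: row[0])
    }


customer_sequences = sort_phase(transactions)
for customer_id, sequence in customer_sequences.items():
    print(customer_id, [sorted(itemset) for itemset in sequence])
```

Each value in `customer_sequences` is a list of itemsets in time order, for example `[{10, 20}, {30}, {40, 60, 70}]` for customer 2. The transaction times are needed only for ordering and are dropped once the sequence is built.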

2. L-Itemsets Phase
The sorted database is scanned to obtain the large (frequent) itemsets, that is, the itemsets that meet the predefined support threshold. To find them, enumerate the candidate itemsets such as {30}, {30, 50}, {50, 70}, {30, 50, 70}, etc. from the sorted transaction data above and count the support of each.

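Suppose the minimal support is 40%. With five customers, the minimal support count is 2. Support is counted per customer, not per transaction: a customer contributes at most 1 to an itemset's support, no matter how many of that customer's transactions contain it. For example, (40, 70) is large because it occurs within a single transaction of customer 2 and of customer 4, whereas (30, 50) occurs only for customer 3 and is not large. The resulting large itemsets, each mapped to an integer, are listed in the table below.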
Large Itemsets   Mapped to
(30)             1
(40)             2
(70)             3
(40, 70)         4
(90)             5
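
The support counts behind this table can be checked with a short Python sketch. It is a brute-force enumeration of every subset of every transaction, not the level-wise candidate generation a real implementation would likely use, and the names `customer_sequences`, `large_itemsets`, and `min_count` are hypothetical. The integer labels 1 to 5 are treated as an arbitrary naming and are not reproduced.

```python
from itertools import combinations

# Customer sequences from the sort phase (customer_id -> time-ordered itemsets).
customer_sequences = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}


def large_itemsets(sequences, min_count):
    """Return {itemset: support} for itemsets supported by >= min_count customers.

    An itemset is supported by a customer if it is contained in at least
    one of that customer's transactions; each customer counts once.
    """
    support = {}
    for sequence in sequences.values():
        seen = set()  # itemsets found for this customer, deduplicated
        for transaction in sequence:
            items = sorted(transaction)
            for size in range(1, len(items) + 1):
                seen.update(combinations(items, size))
        for itemset in seen:
            support[itemset] = support.get(itemset, 0) + 1
    return {s: c for s, c in support.items() if c >= min_count}


# 40% of 5 customers gives a minimal support count of 2.
result = large_itemsets(customer_sequences, min_count=2)
for itemset, count in sorted(result.items()):
    print(itemset, count)
```

Run on the example data, this yields exactly the five itemsets in the table: (30) with support 4, (70) and (90) with support 3, and (40) and (40, 70) with support 2. Enumerating all subsets is exponential in transaction size, so the sketch is suitable only for small examples like this one.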