Slide 3: Sequential pattern mining: Example I

Sequential Pattern Mining: Example I

Given a transaction database with three attributes (customer-id, transaction-time, and purchased-items), the mining process is decomposed into the following five phases.

1. Sort Phase
The original transaction database is sorted with customer-id as the major key and transaction-time as the minor key; the result is a set of customer sequences. The table below shows the sorted transaction data.

Customer-id	Transaction-time	Purchased-items
1	Oct 23 ’02	30
1	Oct 28 ’02	90
2	Oct 18 ’02	10, 20
2	Oct 21 ’02	30
2	Oct 27 ’02	40, 60, 70
3	Oct 15 ’02	30, 50, 70
4	Oct 08 ’02	30
4	Oct 16 ’02	40, 70
4	Oct 25 ’02	90
5	Oct 20 ’02	90
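The sort phase can be sketched in a few lines of Python. This is only an illustrative sketch: the tuple layout, the function name `sort_phase`, and the reading of ’02 as the year 2002 are assumptions, not part of the original slides. The rows are deliberately listed out of order so that the sort has something to do.

```python
from datetime import date
from itertools import groupby

# Rows from the example table, shuffled on purpose:
# (customer_id, transaction_time, purchased_items).
# The year 2002 is an assumed reading of the slide's "'02".
transactions = [
    (2, date(2002, 10, 27), (40, 60, 70)),
    (1, date(2002, 10, 28), (90,)),
    (4, date(2002, 10, 16), (40, 70)),
    (3, date(2002, 10, 15), (30, 50, 70)),
    (2, date(2002, 10, 18), (10, 20)),
    (5, date(2002, 10, 20), (90,)),
    (4, date(2002, 10, 8), (30,)),
    (1, date(2002, 10, 23), (30,)),
    (2, date(2002, 10, 21), (30,)),
    (4, date(2002, 10, 25), (90,)),
]


def sort_phase(rows):
    """Sort by customer-id (major key) and transaction time (minor key),
    then collect each customer's itemsets into one time-ordered sequence."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    return {
        cid: [items for _, _, items in group]
        for cid, group in groupby(ordered, key=lambda r: r[0])
    }


customer_sequences = sort_phase(transactions)
for cid, seq in customer_sequences.items():
    print(cid, seq)
```

Each customer sequence is an ordered list of itemsets; for instance, customer 2 becomes the sequence (10, 20), (30), (40, 60, 70).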

2. L-Itemsets Phase
The sorted database is scanned to obtain the large (frequent) itemsets according to the predefined support threshold. To find the large itemsets, list all itemsets, such as {30}, {30, 50}, {50, 70}, {30, 50, 70}, etc., that occur in the sorted transaction data above.
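As a rough illustration of what "listing all itemsets" means, the sketch below enumerates every non-empty subset of a single transaction. The helper name `nonempty_subsets` is hypothetical, and this brute-force enumeration is used only because the example is tiny; it grows exponentially with transaction size and is not how a production miner would generate candidates.

```python
from itertools import combinations


def nonempty_subsets(transaction):
    """Yield every non-empty itemset contained in one transaction."""
    items = sorted(transaction)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield combo


# Customer 3's only transaction in the example.
subsets = list(nonempty_subsets((30, 50, 70)))
print(subsets)
```

For the transaction (30, 50, 70) this yields seven itemsets, including {30}, {30, 50}, {50, 70}, and {30, 50, 70} from the text.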

Suppose the minimum support is 40%. With five customers, the minimum support count is 2. The resulting large itemsets are listed in the table below, followed by examples of itemsets that do and do not qualify.

Large Itemsets	Mapped to
(30)	1
(40)	2
(70)	3
(40, 70)	4
(90)	5

The itemset {30} is a large itemset because it appears in the sequences of 4 of the 5 customers (Customer IDs 1, 2, 3, and 4), so its support is 4/5 = 80% ≥ 40%.

The itemset {40, 70} is a large itemset because it appears in the sequences of 2 of the 5 customers (Customer IDs 2 and 4), so its support is 2/5 = 40% ≥ 40%.

The itemset {50} is NOT a large itemset because it appears in the sequence of only 1 of the 5 customers (Customer ID 3), so its support is 1/5 = 20% < 40%.

The itemset {60, 70} is NOT a large itemset because it appears in the sequence of only 1 of the 5 customers (Customer ID 2), so its support is 1/5 = 20% < 40%.
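The support counts above can be checked with a short Python sketch. It makes several assumptions that do not come from the slides: the name `large_itemsets`, the brute-force subset enumeration, and the sort key used to number the itemsets (chosen only so the numbering reproduces the "Mapped to" column; any consistent numbering would serve). Support is counted once per customer, no matter how many of that customer's transactions contain the itemset.

```python
from itertools import combinations
from math import ceil

# Customer sequences produced by the sort phase of the example.
customer_sequences = {
    1: [(30,), (90,)],
    2: [(10, 20), (30,), (40, 60, 70)],
    3: [(30, 50, 70)],
    4: [(30,), (40, 70), (90,)],
    5: [(90,)],
}


def large_itemsets(sequences, min_support_percent):
    """Return {itemset: support count} for itemsets meeting the threshold.

    An itemset counts once per customer if it is contained in at least
    one of that customer's transactions.
    """
    min_count = ceil(min_support_percent * len(sequences) / 100)
    support = {}
    for seq in sequences.values():
        seen = set()  # itemsets supported by this customer
        for transaction in seq:
            items = sorted(transaction)
            for size in range(1, len(items) + 1):
                seen.update(combinations(items, size))
        for itemset in seen:
            support[itemset] = support.get(itemset, 0) + 1
    return {s: c for s, c in support.items() if c >= min_count}


large = large_itemsets(customer_sequences, 40)

# Assumed ordering: by largest item, then by itemset size. It is used
# here only to reproduce the numbering shown in the table.
ordered = sorted(large, key=lambda s: (s[-1], len(s)))
mapping = {itemset: i for i, itemset in enumerate(ordered, start=1)}

for itemset in ordered:
    print(itemset, "support:", large[itemset], "mapped to:", mapping[itemset])
```

With a 40% threshold over five customers the minimum count is 2, so {50} and {60, 70}, each supported by a single customer, are filtered out.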