3. Transformation Phase
The customer sequences are replaced by those large itemsets they contain. All the large itemsets are mapped into a series of integers to make the mining more efficient. |
|
(
10, 20)
is dropped because it does not contain any large itemset.
Parentheses ‘(
’ and ‘)
’ instead of ‘{
’ and ‘}
’ are used to avoid the ambiguity of doubly using ‘{
’ and ‘}
’ as shown next.
(
40, 60, 70)
is replaced by the set of itemsets {
(
40)
, (
70)
, (
40, 70)
}
because each itemset is large/frequent.
〈
{
1}
{
2, 3, 4}
〉
.
Customer-id | Customer Sequence | Transformed DB | After Mapping |
---|---|---|---|
1 | ⟨ (30) (90) ⟩ | ⟨ { (30) } { (90) } ⟩ | ⟨ {1} {5} ⟩ |
2 | ⟨ (10, 20) (30) (40, 60, 70) ⟩ | ⟨ { (30) } { (40) (70) (40, 70) } ⟩ | ⟨ {1} {2, 3, 4} ⟩ |
3 | ⟨ (30, 50, 70) ⟩ | ⟨ { (30) (70) } ⟩ | ⟨ {1, 3} ⟩ |
4 | ⟨ (30) (40, 70) (90) ⟩ | ⟨ { (30) } { (40) (70) (40, 70) } { (90) } ⟩ | ⟨ {1} {2, 3, 4} {5} ⟩ |
5 | ⟨ (90) ⟩ | ⟨ { (90) } ⟩ | ⟨ {5} ⟩ |
I remember in the old days when people used to get mad if you read their diary. Now people put everything online and get mad when you don’t read it. |