Maximal Phase Sample Clauses
Maximal Phase. All sequences that are frequent but not maximal are discarded. This is easily accomplished by deleting, from the set discovered in Step 4, every sequence that is contained in another sequence of that set. All sequences of length 1 are also left out.
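The filtering described above can be sketched in a few lines. This is only an illustration under stated assumptions: a sequence is represented as a tuple of frozensets (one per itemset), "length" is taken to mean the number of itemsets, and containment means that every itemset of the shorter sequence is a subset of a distinct itemset of the longer one, in the same order. The names `seq`, `is_contained` and `maximal_sequences` are hypothetical and do not come from the original text.

```python
from typing import FrozenSet, Iterable, List, Tuple

Sequence = Tuple[FrozenSet[int], ...]


def seq(*itemsets: Iterable[int]) -> Sequence:
    """Hypothetical helper: build a sequence from plain iterables of items."""
    return tuple(frozenset(s) for s in itemsets)


def is_contained(small: Sequence, big: Sequence) -> bool:
    """Return True if `small` is contained in `big` (order-preserving, subset per itemset)."""
    pos = 0
    for itemset in small:
        # Greedily advance to the first itemset of `big` that covers `itemset`.
        while pos < len(big) and not itemset <= big[pos]:
            pos += 1
        if pos == len(big):
            return False
        pos += 1  # each itemset of `big` may be used at most once
    return True


def maximal_sequences(frequent: List[Sequence]) -> List[Sequence]:
    """Keep only sequences of length > 1 that are not contained in another frequent sequence."""
    unique = list(dict.fromkeys(frequent))  # drop duplicates, keep order
    return [
        s
        for s in unique
        if len(s) > 1 and not any(s != t and is_contained(s, t) for t in unique)
    ]


# Usage example with made-up data (assumption, not from the source).
frequent = [seq([1]), seq([1], [2]), seq([1], [2], [3]), seq([4], [5])]
maximal = maximal_sequences(frequent)
```

In this made-up example, the sequences with itemsets 1 and 1, 2 are dropped because both are contained in 1, 2, 3, while 4, 5 survives because no other sequence contains it.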
Figure 6.1: A common approach for frequent sequence mining

Within this approach, candidate selection from the large sequence set of size k is done as follows:
1. Join: If two sequences differ only in the last itemset, add that last itemset to the other sequence.
2. Prune: All subsets of size k of every candidate itemset of size k + 1 must be present in the large sequence set of size k. Candidates that fail this requirement are pruned. An example can be seen in Table 6.3:

(6 8) (6 8 9) (6 9) (8 9) (6 9 8)

Given the requirements we set forth, a number of steps within this approach are different for the criminal career situation. The notion of when an itemset is large differs between the two situations: the itemset either occurs completely in a single transaction, as in [2], or occurs in overlapping time frames, per requirement 3. Also, per requirement 3, since the time frame boundaries have no implicit meaning, the notion of time frames is now completely lost, as can be seen in Figure 6.2. Through this loss, Figure 6.2 shows that we have unjustly lost sequences 13 and 24 and gained sequence 46, per requirement 2. Therefore, care must be taken in Step 3 to choose a representation that denotes all possible sequences consisting of the frequent itemsets in Phase 2.

Depending on the choices made for the transformation phase, we can either keep or change the sequence phase. The options we chose for our approach are discussed in Section 6.3. The first and fifth phases can be kept, regardless of the requirements or choices made for the other steps.
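The join and prune steps of candidate selection described earlier can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that each large itemset has already been mapped to an integer, so a sequence is a tuple of integers, and it assumes that the entries of Table 6.3 can be read as three large sequences of size 2 plus the two candidates generated from them. The function names `join` and `prune` are hypothetical.

```python
from itertools import combinations, permutations
from typing import Set, Tuple

Seq = Tuple[int, ...]


def join(large_k: Set[Seq]) -> Set[Seq]:
    """Join step: for two sequences that differ only in their last element,
    append the last element of one to the other."""
    candidates: Set[Seq] = set()
    for p, q in permutations(large_k, 2):  # ordered pairs of distinct sequences
        if p[:-1] == q[:-1] and p[-1] != q[-1]:
            candidates.add(p + (q[-1],))
    return candidates


def prune(candidates: Set[Seq], large_k: Set[Seq]) -> Set[Seq]:
    """Prune step: keep a candidate of size k + 1 only if every
    order-preserving sub-sequence of size k is itself large."""
    return {
        c
        for c in candidates
        if all(sub in large_k for sub in combinations(c, len(c) - 1))
    }


# Values taken from the Table 6.3 entries, under the reading assumed above.
large_2 = {(6, 8), (6, 9), (8, 9)}
joined = join(large_2)            # {(6, 8, 9), (6, 9, 8)}
pruned = prune(joined, large_2)   # {(6, 8, 9)}
```

Under this reading, (6 9 8) is pruned because its sub-sequence (9 8) is not in the large sequence set of size 2, whereas all three sub-sequences of (6 8 9) are present.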
