This article covers frequent itemsets in a dataset (association rule mining).
Association mining searches for frequent items in a dataset. Frequent mining typically uncovers interesting associations and correlations between itemsets in transactional and relational databases.
In short, frequent mining shows which items appear together in a transaction or relation.
Need of Association Mining:
Frequent mining is the generation of association rules from a transactional dataset. If two items X and Y are frequently purchased together, it makes sense to place them together in stores or to offer a discount on one item when the other is purchased. This can increase sales.
For example, a customer who buys milk and bread is likely to also buy butter.
So the association rule is [‘milk’] ∪ [‘bread’] => [‘butter’], and the seller can suggest butter to a customer who buys milk and bread.
Now, let us discuss some important definitions.
What is Support?
Support is one of the measures of interestingness. It indicates how useful a rule is, that is, how often the rule applies in the dataset. A support of 5% means that 5% of all transactions in the database contain both sides of the rule.
Support(A -> B) = Support_count(A ∪ B) / Total number of transactions
What is Confidence?
Confidence is another measure of interestingness; it reflects the certainty of a rule. A confidence of 60% means that 60% of the customers who purchased milk and bread also bought butter.
Confidence(A -> B) = Support_count(A ∪ B) / Support_count(A)
If a rule satisfies both minimum support and minimum confidence, it is a strong rule.
Support_count(X): the number of transactions in which X appears. If X is A ∪ B, it is the number of transactions in which A and B are both present.
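The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function names (`support_count`, `support`, `confidence`, `is_strong`), the five-basket dataset, and the thresholds are all hypothetical, chosen only to keep the arithmetic easy to follow. Support is computed here as a fraction of all transactions.

```python
# Hypothetical transactions, invented for illustration only.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "bread", "butter", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def support(antecedent, consequent, transactions):
    """Fraction of all transactions that contain A and B together."""
    both = set(antecedent) | set(consequent)
    return support_count(both, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions that contain A, the fraction that also contain B."""
    base = support_count(antecedent, transactions)
    if base == 0:
        return 0.0  # A never occurs, so the rule has no evidence
    both = set(antecedent) | set(consequent)
    return support_count(both, transactions) / base

def is_strong(antecedent, consequent, transactions, min_support, min_confidence):
    """A rule is strong if it meets both minimum thresholds."""
    return (support(antecedent, consequent, transactions) >= min_support
            and confidence(antecedent, consequent, transactions) >= min_confidence)

rule = ({"milk", "bread"}, {"butter"})
print(support(*rule, transactions))     # 2 of 5 transactions
print(confidence(*rule, transactions))  # 2 of the 3 milk-and-bread transactions
print(is_strong(*rule, transactions, min_support=0.3, min_confidence=0.6))
```

With these made-up baskets, the rule {milk, bread} => {butter} has a support of 0.4 and a confidence of about 0.67, so it counts as strong under the assumed thresholds of 0.3 and 0.6.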
What is Maximal Itemset?
An itemset is maximal frequent if it is frequent and none of its supersets is frequent.
What is Closed Itemset?
An itemset is closed if none of its immediate supersets has the same support count as the itemset itself.
What is K-Itemset?
An itemset that contains K items is a K-itemset. An itemset is frequent if its support count is greater than or equal to the minimum support count.
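As a rough sketch of this definition, the frequent K-itemsets of a small dataset can be listed by brute force: generate every K-item combination and keep those whose support count reaches the minimum. The helper name `frequent_k_itemsets` and the toy baskets are assumptions made for this example; practical algorithms such as Apriori or FP-Growth avoid enumerating every combination.

```python
from itertools import combinations

def frequent_k_itemsets(transactions, k, min_support_count):
    """Return {itemset: support_count} for every frequent k-itemset."""
    items = sorted(set().union(*transactions))
    result = {}
    for candidate in combinations(items, k):
        # Count the transactions that contain every item of the candidate.
        count = sum(1 for t in transactions if set(candidate) <= t)
        if count >= min_support_count:
            result[candidate] = count
    return result

# Hypothetical transactions, invented for illustration only.
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]

pairs = frequent_k_itemsets(baskets, k=2, min_support_count=3)
print(pairs)
```

With a minimum support count of 3, the pairs (bread, butter) and (bread, milk) are frequent, while (butter, milk) appears in only two baskets and is dropped.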
Example of finding frequent itemsets:
Consider the following dataset of transactions.
TRANSACTION_ID | ITEMS |
---|---|
A | {A,C,D} |
B | {B,C,D} |
C | {A,B,C,D} |
D | {B,D} |
E | {A,B,C,D} |
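- Let's say the minimum support count is 3.
- The relation that holds is: maximal frequent => closed => frequent. Every maximal frequent itemset is also closed, and both are frequent by definition.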
1-frequent:
{A} = 3; // not closed due to {A, C}, and not maximal
{B} = 4; // not closed due to {B, D}, and not maximal
{C} = 4; // not closed due to {C, D}, and not maximal
{D} = 5; // closed, since no immediate superset has the same count; not maximal
2-frequent:
{A, B} = 2 // not frequent because support count < minimum support count so ignore
{A, C} = 3 // not closed due to {A, C, D}
{A, D} = 3 // not closed due to {A, C, D}
{B, C} = 3 // not closed due to {B, C, D}
{B, D} = 4 // closed but not maximal due to {B, C, D}
{C, D} = 4 // closed but not maximal due to {B, C, D}
3-frequent:
{A, B, C} = 2 // not frequent because support count < minimum support count, so ignore
{A, B, D} = 2 // not frequent because support count < minimum support count, so ignore
{A, C, D} = 3 // maximal frequent
{B, C, D} = 3 // maximal frequent
4-frequent:
{A, B, C, D} = 2 // not frequent, so ignore
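The listing above can be reproduced with a short brute-force script. This is a sketch under stated assumptions, not a standard implementation: the function name `classify_itemsets` is made up for illustration, and the first transaction is taken to be {A, C, D}, which is the value the support counts in the listing imply.

```python
from itertools import combinations

# Transactions from the example; the first one is assumed to be {A, C, D},
# which is what the support counts in the listing imply.
transactions = [
    {"A", "C", "D"},
    {"B", "C", "D"},
    {"A", "B", "C", "D"},
    {"B", "D"},
    {"A", "B", "C", "D"},
]

def classify_itemsets(transactions, min_support_count):
    """Map each frequent itemset to (support_count, is_closed, is_maximal)."""
    items = sorted(set().union(*transactions))

    def count(itemset):
        return sum(1 for t in transactions if set(itemset) <= t)

    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            c = count(itemset)
            if c < min_support_count:
                continue  # not frequent, so ignore it
            # Immediate supersets: add exactly one item not already present.
            superset_counts = [
                count(itemset + (extra,)) for extra in items if extra not in itemset
            ]
            # Closed: no immediate superset keeps the same support count.
            is_closed = all(sc < c for sc in superset_counts)
            # Maximal: no immediate superset is frequent (larger ones cannot be either).
            is_maximal = all(sc < min_support_count for sc in superset_counts)
            result[itemset] = (c, is_closed, is_maximal)
    return result

classified = classify_itemsets(transactions, min_support_count=3)
for itemset, (c, closed, maximal) in classified.items():
    print(itemset, c,
          "closed" if closed else "not closed",
          "maximal" if maximal else "not maximal")
```

Checking only immediate supersets is enough for maximality because support counts never increase as items are added: if no superset with one extra item is frequent, no larger superset can be frequent either.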