The Data Mining Forum This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!

In Apriori, the same item cannot appear twice in the same transaction. For example, there cannot be two "1"s in the same transaction. Besides, there is no concept of "expected outcome" in the Apriori algorithm, and you cannot calculate the support of a table; you can, however, calculate the support of an itemset.

1) The given database is correct because no item appears twice in the same transaction.

2) How to find the rule from this particular database: the itemset {a, b} appears in transactions 100, 200 and 300, so it has a support count of 3. All remaining itemsets have a support count lower than 3, so the "most frequent rule" for this database is a→b.

3) How to calculate the support of the rule a→b: the support count of {a, b} is 3, and there are 5 transactions in total, so the rule support is 3/5 = 0.6.

4) How to calculate the confidence of the rule a→b: the confidence is obtained by dividing the support count of {a, b} by the support count of {a}. Since a appears in transactions 100, 200, 300 and 400, its support count is 4, and the support count of {a, b} is 3. Therefore, the confidence of a→b is 3/4 = 0.75.
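The calculations in steps 2) to 4) can be sketched in a few lines of code. The five transactions below are a hypothetical reconstruction consistent with the numbers in the discussion ({a} in 4 transactions, {a, b} in 3); the actual database from the original question may differ.

```python
# Hypothetical five-transaction database consistent with the numbers above:
# {a, b} appears in transactions 100, 200, 300; {a} also appears in 400.
transactions = {
    100: {"a", "b", "c"},
    200: {"a", "b"},
    300: {"a", "b", "d"},
    400: {"a", "c"},
    500: {"c", "d"},
}

def support_count(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions.values() if itemset <= t)

def rule_support(antecedent, consequent):
    """Support of the rule X -> Y: supportCount(X u Y) / |transactions|."""
    return support_count(antecedent | consequent) / len(transactions)

def rule_confidence(antecedent, consequent):
    """Confidence of the rule X -> Y: supportCount(X u Y) / supportCount(X)."""
    return support_count(antecedent | consequent) / support_count(antecedent)

print(rule_support({"a"}, {"b"}))     # 3/5 = 0.6
print(rule_confidence({"a"}, {"b"}))  # 3/4 = 0.75
```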

By applying an algorithm. I cannot explain the algorithm here; it would take too much time. You can read chapter 6 of the book "Introduction to data mining" to understand it. It explains the basic algorithms and how rules are generated:

The support and expected support are measures used to evaluate the interestingness of some patterns (itemsets). You could either calculate these measures by hand or by applying an itemset mining algorithm. I do not think you should do that by hand on a large dataset such as Connect4, so you may want to use some software. The open-source SPMF software, for example, provides implementations of various itemset mining algorithms that you can run on the Connect dataset.
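As a rough sketch, running an itemset mining algorithm from the SPMF jar on a transaction file looks something like the following. The file names here are placeholders, and the exact syntax and available algorithm names should be checked against the SPMF documentation.

```shell
# Placeholder file names; check the SPMF documentation for the exact syntax.
# Runs Apriori on connect.txt with a 60% minimum support, writing to output.txt.
java -jar spmf.jar run Apriori connect.txt output.txt 60%
```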

The support is the number (or percentage) of transactions in which an itemset appears. So if you have a transaction database with five transactions, and an itemset X appears in two of them, its support is 2 transactions (or 40%).
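This definition can be sketched directly in code. The five transactions below are made up for illustration; any database in which X = {a, b} appears twice gives the same result.

```python
# Hypothetical five-transaction database in which X = {a, b} appears twice.
transactions = [
    {"a", "b"},
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "c"},
    {"c"},
]

X = {"a", "b"}
support_count = sum(1 for t in transactions if X <= t)  # 2 transactions
support_pct = support_count / len(transactions)         # 2/5 = 0.4 (40%)
print(support_count, support_pct)
```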

harsh nagalla Wrote:
-------------------------------------------------------
> A database has five transactions. Let min sup =
> 60% and min conf = 80%.
>
>
> What will be the minimum support if it is given
> as a percentage?

Re: Find all frequent itemsets using the Apriori algorithm with min_sup = 60%, including the candidate itemsets, the frequent itemsets, and the pruning (if any) after each database scan. Please list at the end all the frequent itemsets that satisfy the minimum support.

I do not know of any algorithm that uses the average support, but it would be possible to design one.

Actually, there are many possibilities for research. You can either create new measures, new optimizations, or new algorithms. Or you can combine two topics to create a new topic. For example, you can combine:

This is just an example. If you are looking for research ideas, you can always combine topics to create new problems. That is what I want to say. But some problems are too easy and not interesting, so you still need to choose something interesting and useful.

Sir,
I have a dataset consisting of 6000 observations and 10 attributes. Each attribute has approximately 3 sub-attributes. Is there any way to calculate the support and confidence for all 6000 observations, taking each observation as a rule?

In my opinion, it makes more sense to round up to 2 (take the ceiling of the number), because we do not want to accept anything below the minimum support. If you round down to 1, then you will accept patterns that do not satisfy the minimum support. This would not be good.

So this is how it is implemented in the SPMF software. In other data mining software, it may be implemented in other ways. But I think that rounding up makes the most sense.
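Converting a percentage minimum support into a support count with the ceiling can be sketched as follows. The numbers are illustrative (a five-transaction database); the point is that 1.25 is rounded up to 2, not down to 1.

```python
import math

def minsup_count(minsup_fraction, num_transactions):
    """Smallest integer support count that still satisfies min_sup,
    obtained by taking the ceiling of minsup_fraction * num_transactions."""
    return math.ceil(minsup_fraction * num_transactions)

print(minsup_count(0.60, 5))  # 0.60 * 5 = 3.0  -> 3
print(minsup_count(0.25, 5))  # 0.25 * 5 = 1.25 -> rounded up to 2, not down to 1
```

In practice an implementation may also want to guard against floating-point error before taking the ceiling (e.g. by rounding the product to a few decimal places first), so that a value such as 3.0000000000000004 is not rounded up to 4.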