Abstract:

Modern applications such as information retrieval, semantic scene classification, music categorization, and functional genomics classification increasingly require multi-label classification. Apriori, a rule mining algorithm, is widely used for rule generation. However, it is mostly applied to categorical data and seldom to numerical data. This suggests that, with proper data pre-processing, many non-obvious rules can be derived from numerical datasets as well. Because Apriori scans every record in the dataset, we use a simple k-means clustering step to partition its processing space, and rules are then generated for each cluster separately. The accuracy of the algorithm is measured using Hamming loss and is reported in the paper. This hybrid algorithm aims to uncover hidden patterns in large numerical datasets and to make reliable label prediction easier.
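
As an illustration of the evaluation measure named above, the following is a minimal sketch of Hamming loss for multi-label predictions, assuming labels are encoded as equal-length binary vectors (one row per sample). The function name `hamming_loss` and the toy data are hypothetical and are not taken from the paper's experiments.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (sample, label) positions where the prediction differs from the truth.

    Assumes each row is a sequence of 0/1 label indicators (an illustrative encoding).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must contain the same number of samples")

    total = 0  # number of (sample, label) positions compared
    wrong = 0  # number of positions where the labels disagree
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each sample must have the same number of labels")
        total += len(true_row)
        wrong += sum(t != p for t, p in zip(true_row, pred_row))

    if total == 0:
        raise ValueError("no labels to compare")
    return wrong / total


# Toy example: 2 samples, 3 labels each; 2 of the 6 positions disagree.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 1], [0, 1, 1]]
loss = hamming_loss(y_true, y_pred)
```

A lower value is better: 0 means every label of every sample was predicted correctly, and 1 means every label was wrong.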