Slide 12.1: Top 10 algorithms in data mining

Top 10 Algorithms in Data Mining

Many data mining methods have been developed, and they differ widely from one another. The top 10 algorithms in data mining are listed below:

C4.5 and Beyond: C4.5 is an algorithm that generates decision trees; it is an extension of the earlier ID3 algorithm. Because the decision trees it generates can be used for classification, C4.5 is often referred to as a statistical classifier.
The k-Means Algorithm: The k-means algorithm is a simple iterative method to partition a given dataset into a user-specified number of clusters, k.
SVM (Support Vector Machines): An SVM is a supervised machine learning algorithm used for classification and regression. In its basic form, it assigns each data point to one of two categories. It does so by finding the boundary between the two categories whose margin, that is, the distance from the boundary to the nearest data points on either side, is as large as possible.
The Apriori Algorithm: One of the most popular data mining approaches is to find frequent itemsets in a transaction dataset and derive association rules from them. Finding frequent itemsets is not trivial because the number of candidate itemsets grows combinatorially with the number of items. Once the frequent itemsets are obtained, it is straightforward to generate association rules whose confidence is greater than or equal to a user-specified minimum confidence.
The EM Algorithm: Finite mixture distributions provide a flexible, mathematically based approach to modeling and clustering data observed on random phenomena. These mixture models can be fitted by maximum likelihood via the EM (Expectation–Maximization) algorithm.
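
To make the k-means entry above more concrete, here is a minimal sketch of the iterative partitioning it describes, written in plain Python. The function name kmeans, the helper names, the random choice of initial centroids, and the iteration cap are all assumptions made for illustration; practical implementations usually use more careful initialization.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans(points, k, iterations=100, seed=0):
    """Partition points into k clusters; returns (centroids, clusters)."""
    rng = random.Random(seed)
    # Assumption: the initial centroids are k points sampled at random.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            mean(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # No centroid moved, so the partition is stable.
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

On this small example the two returned clusters separate the points near the origin from the points near (10, 10). The value of k is supplied by the caller, matching the "user-specified number of clusters" in the description.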
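
The two stages in the Apriori entry above, finding frequent itemsets and then deriving rules that meet a minimum confidence, can be sketched as follows. This is an illustrative level-wise search and not a tuned implementation; the function names, the use of an absolute support count for min_support, and the toy transactions are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search; returns {itemset: support count}.

    Assumption: min_support is an absolute number of transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets
        # are frequent, which limits the combinatorial growth.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # confidence(lhs -> rhs) = support(itemset) / support(lhs)
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(transactions, min_support=2)
rules = association_rules(frequent, min_confidence=0.9)
```

With these toy transactions, {milk, butter} appears only once and is dropped, so the three-item candidate is pruned without being counted. The only rule that reaches the 0.9 confidence threshold is butter -> bread, since every transaction containing butter also contains bread.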