Abstract
A cluster is a group of data members with similar characteristics. The process of establishing relationships in raw data, or extracting information from it, by performing operations on the data set such as clustering is known as data mining. Data collected in practical scenarios is usually random and unstructured. Consequently, there is always a need to analyze unstructured data sets to derive meaningful information. This is where unsupervised algorithms come into the picture, to process unstructured or even semi-structured data sets. K-Means Clustering is one such technique, used to give structure to unstructured data so that valuable information can be extracted. This paper discusses the implementation of the K-Means Clustering Algorithm in a distributed environment using Apache Hadoop. The key to implementing the K-Means Algorithm is the design of the Mapper and Reducer routines, which is discussed in the later part of the paper. The steps involved in executing the K-Means Algorithm are also described, based on a small-scale implementation of the K-Means Clustering Algorithm on an experimental setup, to serve as a guide for practical implementations.
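As a rough illustration of how K-Means can be split into Mapper and Reducer roles, the sketch below simulates one iteration in a single Python process. It is not the paper's implementation and does not use the Hadoop API; the function names `kmeans_map`, `kmeans_reduce`, and `kmeans_iteration` are hypothetical, and Euclidean distance is assumed as the similarity measure.

```python
from collections import defaultdict
from math import dist  # Euclidean distance (Python 3.8+)


def kmeans_map(point, centroids):
    """Mapper role (hypothetical): emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)),
                  key=lambda i: dist(point, centroids[i]))
    return nearest, point


def kmeans_reduce(points):
    """Reducer role (hypothetical): average the points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(data, centroids):
    """One map, shuffle, reduce pass, simulated without Hadoop."""
    groups = defaultdict(list)
    for point in data:
        key, value = kmeans_map(point, centroids)
        groups[key].append(value)  # stands in for the shuffle phase
    # A centroid that attracted no points keeps its previous position.
    return [kmeans_reduce(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]
```

In a full run, `kmeans_iteration` would be repeated, feeding each result back in as the new centroids, until the centroids stop moving or an iteration limit is reached.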