Abstract
Big data is a collection of data so large and complex that it becomes extremely difficult to capture, store, process, retrieve, and analyze with existing database management tools or traditional data processing techniques. Nowadays, data is constantly growing and turning into big data. It is generated from many sources, such as enterprise systems, social media, sensors, digital images, video, audio, and clickstreams, in domains including healthcare, retail, energy, and utilities. Big data is commonly characterized by three V's: volume, variety, and velocity. To process data of high volume, wide variety, and high velocity, and to provide large storage capacity, we use Hadoop, a continuously evolving framework designed to scale up from a single server to thousands of machines, each offering local computation and storage. MapReduce serves as the programming model. In this paper, we use incremental MapReduce, an extension of MapReduce, the most widely used framework for processing big data. To reduce the processing time of big data and optimize how its content is stored, we apply PageRank and k-means iteratively together with MapReduce. Incremental MapReduce 1) performs key-value pair level incremental processing, and 2) supports more sophisticated iterative computation, which is widely used in data mining applications. As a result, incremental MapReduce processes big data in less time and stores it in a more optimized form.
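To make "key-value pair level incremental processing" concrete, the following is a minimal single-process sketch, not Hadoop code and not necessarily the exact mechanism used in this work. It assumes the framework preserves, for every intermediate key, the values each input record emitted. When records are inserted, updated, or deleted, only that record's old pairs are retracted, its new pairs are added, and only the affected keys are re-reduced instead of re-running the whole job. The class `IncrementalMapReduce`, its `update` method, and the word-count `word_map`/`sum_reduce` functions are hypothetical names chosen for illustration; the PageRank and k-means applications would plug in different map and reduce functions.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple


class IncrementalMapReduce:
    """Hypothetical in-memory sketch of key-value pair level incremental processing."""

    def __init__(self,
                 map_fn: Callable[[str], Iterable[Tuple[str, int]]],
                 reduce_fn: Callable[[str, List[int]], int]):
        self.map_fn = map_fn
        self.reduce_fn = reduce_fn
        # Preserved state: intermediate key -> {record_id: values emitted by that record}
        self.state: Dict[str, Dict[str, List[int]]] = defaultdict(dict)
        # Which intermediate keys each record contributed to (needed for retraction)
        self.record_keys: Dict[str, Set[str]] = {}
        # Current reduce output per key
        self.output: Dict[str, int] = {}

    def update(self, changes: Dict[str, Optional[str]]) -> Set[str]:
        """Apply changed records (record_id -> new text, or None to delete).

        Returns the set of intermediate keys whose reduce result was recomputed.
        """
        affected: Set[str] = set()
        for rid, text in changes.items():
            # Retract the record's previously emitted key-value pairs, if any.
            for k in self.record_keys.pop(rid, set()):
                self.state[k].pop(rid, None)
                affected.add(k)
            if text is not None:
                # Re-map only this record and preserve its new pairs per key.
                groups: Dict[str, List[int]] = defaultdict(list)
                for k, v in self.map_fn(text):
                    groups[k].append(v)
                for k, vs in groups.items():
                    self.state[k][rid] = vs
                    affected.add(k)
                self.record_keys[rid] = set(groups)
        # Re-reduce only the affected keys; untouched keys keep their old results.
        for k in affected:
            values = [v for vs in self.state[k].values() for v in vs]
            if values:
                self.output[k] = self.reduce_fn(k, values)
            else:
                self.output.pop(k, None)
                self.state.pop(k, None)
        return affected


def word_map(text: str) -> Iterable[Tuple[str, int]]:
    for word in text.split():
        yield word, 1


def sum_reduce(key: str, values: List[int]) -> int:
    return sum(values)


if __name__ == "__main__":
    job = IncrementalMapReduce(word_map, sum_reduce)
    job.update({"d1": "big data big", "d2": "data mining"})
    print(job.output)  # {'big': 2, 'data': 2, 'mining': 1} (key order may vary)
    # Updating d2 recomputes only 'data', 'mining', and 'hadoop'; 'big' is untouched.
    print(sorted(job.update({"d2": "data hadoop"})))  # ['data', 'hadoop', 'mining']
```

The design trades extra storage (the preserved per-record pairs) for less recomputation: a small change to the input touches only the keys it actually affects.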