Abstract
MapReduce is a well-known programming model and associated implementation for processing and generating massive data sets. The MapReduce algorithm consists of a map function, which processes a key/value pair to produce a set of intermediate key/value pairs, and a reduce function, which combines all values associated with the same intermediate key.
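The following is a minimal, single-process sketch of this model in plain Python (not the Hadoop API), applied to a simple pattern-count task similar in spirit to the pattern matching studied here; the pattern list and input lines are illustrative assumptions, not the paper's actual workload.

```python
from collections import defaultdict

PATTERNS = ["error", "timeout"]          # hypothetical search patterns

def map_fn(line):
    # map: process one input record and emit intermediate (pattern, 1) pairs
    for pattern in PATTERNS:
        if pattern in line:
            yield pattern, 1

def reduce_fn(pattern, counts):
    # reduce: combine all values that share the same intermediate key
    return pattern, sum(counts)

def map_reduce(lines):
    intermediate = defaultdict(list)
    for line in lines:                   # map phase
        for key, value in map_fn(line):
            intermediate[key].append(value)
    # reduce phase: one reduce call per distinct intermediate key
    return dict(reduce_fn(k, v) for k, v in intermediate.items())

print(map_reduce(["timeout on node 3", "read error", "error: retrying"]))
# {'timeout': 1, 'error': 2}
```

In a Hadoop deployment the map and reduce phases run in parallel across the cluster, with the framework handling the grouping of intermediate keys between them.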
MapReduce programs are parallelized automatically, without requiring the programmer to write explicit parallel code, which makes the model an efficient way to process unstructured data.
In this research, the MapReduce algorithm is implemented on a cluster using the Hadoop Distributed File System (HDFS) to perform pattern matching over datasets of different volumes. A quantitative performance analysis of the MapReduce algorithm is carried out across these data volumes in terms of execution time and the number of patterns searched.
Relational databases have traditionally been used to store application data, but many applications now need to store and manage volumes of data that relational databases cannot handle. NoSQL technology overcomes this problem. This paper also provides a brief introduction to how NoSQL databases work and a comparative study of MongoDB and CouchDB, two databases widely used for big data applications. A set of operations is performed on both systems to distinguish their behaviour, and their performance is compared. The results show that CouchDB loads and processes big data more effectively than MongoDB and is considerably faster in these tests. The paper describes the functionality of MongoDB and CouchDB over large datasets.
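As a rough illustration of the kind of bulk-load comparison described above, the sketch below times document inserts through the pymongo and couchdb client libraries; the connection URLs, credentials, database names, and document shape are assumptions for the example, not the paper's actual test configuration.

```python
import time
import couchdb
from pymongo import MongoClient

# Synthetic documents standing in for the big-data workload
docs = [{"index": i, "payload": "x" * 100} for i in range(10_000)]

# MongoDB: bulk insert into a collection
mongo_db = MongoClient("mongodb://localhost:27017")["benchdb"]
start = time.perf_counter()
mongo_db["docs"].insert_many([dict(d) for d in docs])
print("MongoDB load time:", time.perf_counter() - start)

# CouchDB: bulk insert into a database via the _bulk_docs endpoint
couch = couchdb.Server("http://admin:secret@localhost:5984/")
couch_db = couch.create("benchdb")
start = time.perf_counter()
couch_db.update([dict(d) for d in docs])
print("CouchDB load time:", time.perf_counter() - start)
```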