Abstract
Hadoop is a popular framework for storing and processing big data in cloud computing environments. MapReduce and HDFS are its two major components: MapReduce is Hadoop's programming model, and HDFS, the Hadoop Distributed File System, serves as the framework's primary storage component. The existing Hadoop system suffers from several performance issues, including a serialization barrier, repetitive merges, and portability problems across different interconnects. Hadoop also requires effective and efficient I/O capability. As data sizes grow day by day, Hadoop's performance is becoming a critical concern, and handling large datasets requires improving performance by modifying the existing Hadoop system. A network-levitated merge algorithm is used to avoid repetitive merges, and a full pipeline is designed to overlap the Hadoop shuffle, merge, and reduce phases. Hadoop-A, a Hadoop acceleration framework, overcomes the portability issue across different interconnects; it also speeds up data movement and reduces disk accesses.
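To give a rough intuition for the merge-once idea behind the network-levitated merge (this is an illustrative sketch, not the authors' implementation; the names Segment and mergeOnce are hypothetical), the already-sorted map-output segments can be merged in a single pass by keeping their current head records in a priority queue, instead of repeatedly spilling and re-merging intermediate runs on disk:

```java
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative single-pass k-way merge over sorted segments. */
public class SinglePassMerge {

    /** A sorted run of keys; stands in for a fetched map-output segment. */
    static class Segment {
        final Iterator<Integer> it;
        Integer head; // current smallest remaining record of this segment
        Segment(List<Integer> sortedKeys) {
            this.it = sortedKeys.iterator();
            this.head = it.hasNext() ? it.next() : null;
        }
        void advance() { head = it.hasNext() ? it.next() : null; }
    }

    /** Merge all segments in one pass using a min-heap keyed on
     *  each segment's current head record. */
    static void mergeOnce(List<Segment> segments) {
        PriorityQueue<Segment> heap =
            new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (Segment s : segments) {
            if (s.head != null) heap.add(s);
        }
        while (!heap.isEmpty()) {
            Segment s = heap.poll();    // segment holding the global minimum
            System.out.println(s.head); // emit record directly to the reducer
            s.advance();                // pull the segment's next record lazily
            if (s.head != null) heap.add(s);
        }
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(
            new Segment(List.of(1, 4, 7)),
            new Segment(List.of(2, 5, 8)),
            new Segment(List.of(3, 6, 9)));
        mergeOnce(segments);            // prints 1..9 in order
    }
}
```

Because records are emitted as soon as they reach the top of the heap, the merge can feed the reduce phase while segments are still being fetched, which is the overlap the full shuffle-merge-reduce pipeline exploits.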