Abstract
This paper proposes and evaluates an approach to the parallelization, deployment, and management of applications that integrates several emerging technologies for distributed computing. The approach uses the MapReduce paradigm to parallelize tools and manage their execution, and machine virtualization to encapsulate their execution environments and commonly used data sets into flexibly deployable virtual machines (VMs). The target is a multi-node environment in which one node acts as a gateway; client machines access the cluster through this gateway via a REST API. Building on this concept, we propose a virtual infrastructure gateway through which cloud consumers can influence VM placement by providing deployment hints on the possible mapping of VMs to physical nodes. Such hints include the collocation and anti-collocation of VMs, the existence of potential performance bottlenecks, the presence of underlying hardware features (e.g., high availability), the proximity of certain VMs to data repositories, or any other information that would contribute to a more effective placement of VMs on physical hosting nodes. Oozie provides REST access for submitting and monitoring jobs. Cloud computing allows cloud consumers to scale their resource usage up and down based on demand; using a prototype based on Apache Hadoop, we analyze various techniques for scalability in the cloud. The prototype also demonstrates how power-aware policies may reduce the energy consumption of the physical installation.