Abstract
With the advent of the Information Age, organizations are establishing a virtual presence, moving away from fixed physical locations and into the cloud. The volume of data is rising across industries, and the distributed storage of this data demands efficient parallel processing. With the arrival of big data also came the realization that this data can be used for intelligent decision making. Beyond the complexity of the tools and infrastructure required to manage huge volumes of data, there is an urgent need to identify the technologies that can properly take advantage of these volumes. The evolution of big data is driven by fast-growing cloud-based applications developed using virtualization technologies. Tools for processing data in a distributed environment subsequently emerged, leading to the MapReduce framework. In this paper, we survey the technologies that implement this framework, with an emphasis on components of the Apache Hadoop ecosystem (Pig, Hive, and JAQL) and their uses in data analytics.