Abstract
In distributed applications, data centers process high volumes of data to serve user requests. Processing user queries with a centralized SQL analyzer makes large data sets difficult to manage and data difficult to retrieve from storage. A centralized design also cannot execute queries in parallel by distributing data across a large number of machines. Existing systems that compute SQL analytics over geographically distributed data operate by pulling all data to a central location, which is problematic at large data scales because transoceanic links are expensive. We therefore implement Continuous Hive (CHIVE), which facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to structure the data and query it using a SQL-like language called HiveQL. The proposed system optimizes query execution plans and data replication to minimize overall bandwidth consumption and cost.