Abstract
RDBMS can store structured data up to some GB of data. Processing of large data is very difficult to handle and also time consumption process. To overcome these problems made of using Hadoop. Apache Hadoop is a framework for big data management and analysis. The Hadoop core provides storing of structured, unstructured and semi structured data with the Hadoop Distributed File System(HDFS) and a simple MapReduce programming model to process and analyze data in comparable, the data stored in this distributed system. Apache Hive is a data warehouse built on top of Hadoop that allows you to query and manage large sets in scattered storage space using a SQL-like lingo call HiveQL, Hive translate queries into a series of MapReduce jobs. In existing system unstructured data stored in HDFS can’t be retrieve into structured format through HiveQL. In this project It is converting twitter data into a structured format by using HiveQL with SerDe. HDFS can stores twitter data by using data streaming process.