Analysis and Evaluation of Techniques for Managing Unstructured and Semi-Structured Data in a MapReduce Platform

Dina Darwish

Date Published: 1 February 2017

Abstract

The increasing demand for large-scale data mining and data analysis applications drives both industry and academia to create new types of highly scalable, data-intensive computing platforms. MapReduce is one of the most popular of these platforms; its dataflow takes the form of a directed acyclic graph of operators. This paper presents a modified version of the MapReduce framework developed to manage unstructured and semi-structured data. Most database systems are designed to manage well-structured data and require users to design a schema before storing and querying data. However, a significant amount of unstructured and semi-structured data cannot be managed effectively this way. In this paper, we develop engineering principles and practices for managing unstructured and semi-structured data in a MapReduce platform. A single data platform that manages well-structured, unstructured, and semi-structured data benefits users, because it significantly reduces integration, migration, development, maintenance, and operational issues. In the proposed approach, SQL/XML schemas are first written in the Hadoop environment, and all commands are then translated into Hadoop MapReduce jobs. The efficiency of this method in MapReduce software is discussed and evaluated.
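To make the idea of translating a query into a MapReduce job more concrete, the following is a minimal, single-process Python sketch (not the author's implementation) of how a query such as `SELECT author, COUNT(*) ... GROUP BY author` over semi-structured XML records could be expressed as a map phase, a shuffle, and a reduce phase. The record format, the `author` field, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical. A real Hadoop job would distribute these phases across nodes, for example through Hadoop Streaming or the Java MapReduce API.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input: one semi-structured XML record per entry. Fields are
# optional and no schema is declared up front (schema-on-read).
RECORDS = [
    "<doc><author>alice</author><title>A</title></doc>",
    "<doc><author>bob</author></doc>",
    "<doc><title>untitled</title></doc>",  # no <author>: skipped by the mapper
    "<doc><author>alice</author><tags><t>x</t></tags></doc>",
]


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map: parse one XML record and emit (author, 1) if the field exists."""
    root = ET.fromstring(record)
    author = root.findtext("author")  # None when <author> is absent
    if author is not None:
        yield author.strip(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group intermediate values by key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate the grouped values, here COUNT(*) per author."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Chain map -> shuffle -> reduce in one process (illustration only)."""
    intermediate = (pair for rec in records for pair in map_phase(rec))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    print(run_job(RECORDS))  # {'alice': 2, 'bob': 1}
```

In this sketch, records that lack the queried field are simply skipped by the mapper, which is one way a schema-on-read job can tolerate irregular, semi-structured input without requiring a schema to be declared before the data is stored.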

Article Details

Issue: Vol. 6 No. 2 (2017)

Section: Articles

How to Cite

Darwish, D. (2017). Analysis and Evaluation of Techniques for Managing Unstructured and Semi-Structured Data in a MapReduce Platform. International Journal of Engineering and Computer Science, 6(2). Retrieved from http://ijecs.in/index.php/ijecs/article/view/2394
