FiDoop is a parallel frequent itemset mining (FIM) algorithm built on the MapReduce programming model. FiDoop incorporates the frequent items ultrametric tree (FIU-tree), and three MapReduce jobs are applied to complete the mining task. The scalability problem has previously been addressed by the implementation of a handful of FP-growth-like parallel FIM algorithms. In FiDoop, the mappers independently and concurrently decompose itemsets, while the reducers perform combination operations by constructing small ultrametric trees and mining these trees in parallel. Data deduplication is an important data compression method for erasing duplicate copies of repeating data; it reduces the amount of storage space required, saves bandwidth, improves storage utilization, and can also be applied to reduce the number of bytes transferred. The first MapReduce job discovers all frequent items, the second scans the database to generate k-itemsets by removing infrequent items, and the third, the most complicated, constructs the k-FIU-tree and mines all frequent k-itemsets.
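The first two jobs described above can be sketched as a simulated map/reduce pipeline. This is a minimal illustration, not the paper's implementation: the transaction database, the minimum support threshold, and all function names are illustrative assumptions, and the shuffle phase of MapReduce is simulated in-process with a `Counter`.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction database and support threshold
# (illustrative values, not taken from the paper).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "d"},
    {"b", "c"},
]
MIN_SUPPORT = 2

def map_phase(transactions):
    """Mapper: emit an (item, 1) pair for every item in every transaction."""
    for txn in transactions:
        for item in txn:
            yield item, 1

def reduce_phase(pairs):
    """Reducer: sum the counts per item (the shuffle is simulated by Counter)."""
    counts = Counter()
    for item, one in pairs:
        counts[item] += one
    return counts

def frequent_items(transactions, min_support):
    """First job: discover all frequent 1-items."""
    counts = reduce_phase(map_phase(transactions))
    return {item for item, c in counts.items() if c >= min_support}

def pruned_itemsets(transactions, freq, k):
    """Second job: rescan the database, prune infrequent items from each
    transaction, and emit candidate k-itemsets from what remains."""
    for txn in transactions:
        kept = sorted(txn & freq)
        for combo in combinations(kept, k):
            yield combo
```

With the sample data, `frequent_items` returns {"a", "b", "c"} (item "d" appears only once), and `pruned_itemsets` then emits 2-itemsets only over those surviving items.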


In this paper, we apply a deduplication technique in the third MapReduce job to avoid replication of data in the frequent itemsets and to improve performance. This produces highly relevant mining results in less time and increases effective storage capacity. Hadoop supports nine different tools, while Mahout is built on core algorithms and classifications; sequencing these algorithms produces the output more effectively. We aim to implement a recommendation algorithm using Mahout, a machine learning library, on the Hadoop platform to provide a scalable system for processing large data sets efficiently, and running on such a platform yields quicker performance.
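The deduplication step applied in the third job can be sketched as hash-based suppression of duplicate itemsets: only one copy of each distinct itemset is stored, keyed by a content hash. This is a minimal sketch under assumptions; the function name is hypothetical, and the integration with the k-FIU-tree construction is not reproduced here.

```python
import hashlib

def dedupe_itemsets(itemsets):
    """Keep exactly one copy of each distinct itemset.

    Each itemset is canonicalized (sorted) and keyed by a SHA-256 content
    hash, so duplicate copies produced during mining are skipped instead
    of being stored again.
    """
    seen = {}
    for itemset in itemsets:
        canonical = tuple(sorted(itemset))
        key = hashlib.sha256(",".join(canonical).encode()).hexdigest()
        if key not in seen:  # a repeated itemset is a duplicate copy: drop it
            seen[key] = canonical
    return list(seen.values())
```

For example, the itemsets {"a", "b"} and {"b", "a"} hash to the same key, so only one copy survives, which is the storage-saving effect deduplication is meant to provide.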