AN OPTIMIZED APPROACH FOR RECORD DEDUPLICATION USING MBAT ALGORITHM

Subi S, Thangam P

Article Date Published: 30 December 2017

Abstract

Record deduplication [1] is the task of identifying, in a data store, records that refer to the same real-world entity or object in spite of spelling mistakes, typing errors, different writing styles, or even different schema representations or data types. The existing system provides an Unsupervised Duplicate Detection (UDD) method, which can be used to identify and remove duplicate records from different data stores. For a given query, UDD can effectively identify duplicates among the query result records of different web databases. After same-source duplicates are removed, the presumed non-duplicate records from the same data store can be used as training examples, which relieves users of the burden of manually labeling training examples. Starting from the non-duplicate record set, two different classifiers, including a Weighted Component Similarity Summing (WCSS) classifier, are used to distinguish duplicate records from non-duplicate records. A genetic programming (GP) approach to record deduplication has also been presented. The GP approach combines several attribute and similarity function pairs extracted from the data content to produce a deduplication function that is able to identify whether two or more entries in a repository are replicas or not. Since record deduplication is a time-consuming task even for small repositories, the aim is to develop a method that finds a proper combination of attribute and similarity function pairs, thus yielding a deduplication function that maximizes performance while using only a small representative portion of the corresponding data for training. However, the degree of optimization achieved by these approaches is limited. The proposed system develops a new method, a modified bat algorithm (MBAT), for record deduplication. The aim is to create a flexible and effective method that uses data mining algorithms. The method shares many similarities with evolutionary computation techniques such as genetic programming.
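The idea of combining attribute and similarity function pairs into a single deduplication function can be illustrated with a minimal sketch. It is not the authors' WCSS classifier or the MBAT method: the field names, weights, threshold, and the use of Python's difflib as the similarity function are all assumptions chosen only for illustration. In an optimization setting, the weights and threshold would be the quantities that a search method such as GP or a bat algorithm tunes.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two attribute values, ignoring case."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted sum of per-attribute similarities, normalized to [0, 1]."""
    total = sum(weights.values())
    score = sum(
        w * field_similarity(r1.get(field, ""), r2.get(field, ""))
        for field, w in weights.items()
    )
    return score / total


def is_duplicate(r1: dict, r2: dict, weights: dict, threshold: float = 0.85) -> bool:
    """Flag two records as replicas when their weighted similarity is high enough."""
    return record_similarity(r1, r2, weights) >= threshold


# Hypothetical attribute weights and records (assumptions, not from the paper).
WEIGHTS = {"name": 0.5, "city": 0.3, "phone": 0.2}
rec_a = {"name": "John Smith", "city": "Chennai", "phone": "555-0101"}
rec_b = {"name": "Jon Smith", "city": "chennai", "phone": "555-0101"}
rec_c = {"name": "Mary Jones", "city": "Madurai", "phone": "555-0199"}

if __name__ == "__main__":
    print(is_duplicate(rec_a, rec_b, WEIGHTS))  # misspelled name, same entity
    print(is_duplicate(rec_a, rec_c, WEIGHTS))  # different entity
```

With these assumed weights, the misspelled pair scores above the threshold while the unrelated pair scores well below it. A fixed threshold is the simplest decision rule; a learned classifier could replace it without changing the similarity computation.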

Article Details

Issue: Vol. 2 No. 06 (2013)

Section: Articles

How to Cite

Subi, S., & Thangam, P. (2017). AN OPTIMIZED APPROACH FOR RECORD DEDUPLICATION USING MBAT ALGORITHM. International Journal of Engineering and Computer Science, 2(06). Retrieved from http://ijecs.in/index.php/ijecs/article/view/1346
