Data Cleaning System to Handle Noisy Data

A.F Elgamal

Abstract

Data cleaning techniques are used to identify duplicate records, handle missing data, and eliminate duplicates. This paper
presents a data cleaning system that proceeds through six steps: selection of attributes, formation of tokens, clustering,
similarity computation, elimination of duplicates, and finally merging. The system architecture contains three components: a user
interface, a data cleaning component, and a reports component, which communicate and cooperate with each other. It is implemented
using SQL Server 2010 and Microsoft Visual C# 2010.
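
The six steps named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation (which uses SQL Server and C#); it is written in Python for brevity, and every specific choice in it is an assumption made for illustration: the function names, the whitespace tokenizer, the use of the alphabetically smallest token as a clustering key, Jaccard similarity, and the 0.5 threshold.

```python
def select_attributes(record, attributes):
    # Step 1: keep only the attributes used for matching (missing -> "").
    return {a: record.get(a) or "" for a in attributes}

def form_tokens(selected):
    # Step 2: lowercase each value and split it into whitespace tokens.
    tokens = set()
    for value in selected.values():
        tokens.update(str(value).lower().split())
    return frozenset(tokens)

def cluster_key(tokens):
    # Step 3 (assumed rule): records sharing their alphabetically
    # smallest token land in the same cluster.
    return min(tokens) if tokens else ""

def similarity(a, b):
    # Step 4 (assumed measure): Jaccard similarity of two token sets.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def merge(records):
    # Step 6: combine duplicates, filling empty fields from later records.
    merged = {}
    for rec in records:
        for key, value in rec.items():
            if key not in merged or merged[key] in (None, ""):
                merged[key] = value
    return merged

def clean(records, attributes, threshold=0.5):
    tokens = [form_tokens(select_attributes(r, attributes)) for r in records]

    # Step 3: group record indices by clustering key.
    clusters = {}
    for i, t in enumerate(tokens):
        clusters.setdefault(cluster_key(t), []).append(i)

    cleaned = []
    for members in clusters.values():
        # Steps 4 and 5: within a cluster, a record joins the first group
        # whose representative is similar enough; otherwise it starts a
        # new group. Each group collapses to one record.
        groups = []
        for i in members:
            for g in groups:
                if similarity(tokens[i], tokens[g[0]]) >= threshold:
                    g.append(i)
                    break
            else:
                groups.append([i])
        for g in groups:
            cleaned.append(merge([records[i] for i in g]))
    return cleaned

records = [
    {"name": "john smith", "city": "cairo", "phone": ""},
    {"name": "John Smith", "city": "Cairo", "phone": "555-0101"},
    {"name": "Mona Ali", "city": "Mansoura", "phone": "555-0199"},
]
cleaned = clean(records, ["name", "city"])
```

In this example the first two records share the same tokens, so they fall into one cluster, are judged duplicates, and are merged into a single record whose missing phone number is filled from the second. Comparing records only within a cluster avoids computing similarity for every pair in the data set, which is the usual motivation for a clustering step before similarity computation.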
