Duplicate detection is the task of identifying multiple representations of the same real-world object, for every object represented in a data source. It is relevant to data cleaning and data integration applications and has been studied extensively for relational data describing a single type of object in a single table. The main aim of this project is to detect duplicates in structured data. The proposed system focuses on a specific type of error, namely fuzzy duplicates, or duplicates for short. Detecting duplicate entities that describe the same real-world object is an important data cleansing task and is essential for improving data quality. Numerous solutions to this problem exist for data stored in a flat relation.
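
To make the notion of a fuzzy duplicate concrete, the following is a minimal sketch of pairwise detection over a flat relation. It is not the proposed system: the function names, the sample records, the use of Python's `difflib.SequenceMatcher` as the similarity measure, and the threshold of 0.8 are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (illustrative measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_fuzzy_duplicates(records, threshold=0.8):
    """Return index pairs of records whose mean field similarity
    reaches the threshold. The threshold value is an assumption."""
    pairs = []
    # Compare every pair of tuples once (quadratic in the number of records).
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        score = sum(similarity(x, y) for x, y in zip(r1, r2)) / len(r1)
        if score >= threshold:
            pairs.append((i, j))
    return pairs


# Hypothetical flat relation with attributes (name, address).
records = [
    ("John Smith", "12 Main Street"),
    ("Jon Smith", "12 Main St."),
    ("Alice Brown", "7 Oak Avenue"),
]

print(find_fuzzy_duplicates(records))
```

The first two tuples differ in spelling and abbreviation yet plausibly describe the same person, which is the kind of error an exact-match comparison would miss. Comparing all pairs favours effectiveness over efficiency, since the number of comparisons grows quadratically with the number of records.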


Duplicate detection, an important subtask of data cleaning, involves identifying multiple representations of the same real-world object. Numerous approaches exist for both relational and XML data. They aim either to improve the quality of the detected duplicates (effectiveness) or to save computation time (efficiency).