Data cleansing, also called data cleaning or data scrubbing, is the process of detecting and correcting (or removing) corrupt or inaccurate records in a record set, table, or database. We classify the data quality problems that data cleaning addresses and provide an overview of the main solution approaches. The need for data cleaning increases significantly when multiple data sources are integrated, e.g., in data warehouses, federated database systems, or global web-based information systems, because the sources often contain redundant data in different representations. Providing access to accurate and consistent data then requires consolidating the different representations and eliminating duplicate information.
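As a rough illustration of consolidating representations and eliminating duplicates across sources, the following minimal Python sketch maps each record to a canonical key and keeps the first record seen per key. The source names (crm, billing), the field names (name, email), and the normalization rules are assumptions made for this example only; real systems typically need fuzzier matching.

```python
def normalize(record):
    """Map a raw record to a canonical (name, email) key.

    Assumed rules for this sketch: collapse runs of whitespace,
    trim the ends, and ignore letter case.
    """
    name = " ".join(record["name"].split()).casefold()
    email = record["email"].strip().casefold()
    return (name, email)


def merge_sources(*sources):
    """Combine several record lists, keeping the first record per key."""
    seen = {}
    for source in sources:
        for record in source:
            # setdefault stores the record only if the key is new
            seen.setdefault(normalize(record), record)
    return list(seen.values())


# Two hypothetical sources holding the same person in different forms
crm = [{"name": "Ada  Lovelace", "email": "ADA@example.org"}]
billing = [
    {"name": "ada lovelace", "email": "ada@example.org "},
    {"name": "Alan Turing", "email": "alan@example.org"},
]

merged = merge_sources(crm, billing)
```

Keeping the first record seen is only one possible policy; a merge that combines fields from all duplicates would be an equally reasonable choice.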
High-quality data must satisfy a set of quality criteria, including:
  • Validity
  • Accuracy
  • Completeness
  • Consistency
  • Uniformity
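Some of these criteria can be checked mechanically. The sketch below shows one possible reading of completeness, uniformity, and consistency as simple predicates. The field names (id, country, signup_date), the required-field list, and the specific rules are hypothetical choices for illustration. Accuracy is omitted because it generally requires comparison against a trusted reference.

```python
from datetime import datetime

# Assumed set of mandatory fields for this example
REQUIRED = ("id", "country", "signup_date")


def is_complete(record):
    """Completeness: every required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in REQUIRED)


def is_uniform(record):
    """Uniformity: the date parses as year-month-day (assumed convention)."""
    try:
        datetime.strptime(record["signup_date"], "%Y-%m-%d")
    except (KeyError, TypeError, ValueError):
        return False
    return True


def is_consistent(records):
    """Consistency: one id never appears with two different countries."""
    seen = {}
    for record in records:
        first = seen.setdefault(record["id"], record["country"])
        if first != record["country"]:
            return False
    return True


# Hypothetical sample rows
rows = [
    {"id": 1, "country": "DE", "signup_date": "2021-03-05"},
    {"id": 2, "country": "", "signup_date": "05/03/2021"},
    {"id": 1, "country": "FR", "signup_date": "2021-03-06"},
]
```

Here the second row fails both the completeness and uniformity checks, and the first and third rows together fail the consistency check because id 1 is paired with two countries.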
