Data Cleansing & De-duping
What the textbooks say –
Data cleansing, or data scrubbing, is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. The term refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting this dirty data.
After cleansing, a data set will be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been originally caused by user entry errors, by corruption in transmission or storage, or by different data dictionary definitions of similar data in different stores.
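To make the textbook definition concrete, here is a minimal sketch in Python of detecting, correcting and removing dirty records. The field names (`name`, `email`) and the rules themselves are assumptions chosen for illustration, not a description of any particular system: whitespace and letter case are tidied up, records without a usable email are dropped, and repeats of the same person are removed.

```python
def clean_records(records):
    """Normalise, drop incomplete rows, and de-duplicate on (name, email).

    Illustrative rules only: the 'name' and 'email' fields are assumed.
    """
    cleaned = []
    seen = set()
    for rec in records:
        # Correct: collapse stray whitespace and standardise letter case.
        # Note: title() is a simplification and will mangle names like "McDonald".
        name = " ".join((rec.get("name") or "").split()).title()
        email = (rec.get("email") or "").strip().lower()

        # Remove: skip records that are incomplete or obviously invalid.
        if not name or "@" not in email:
            continue

        # De-dupe: keep only the first record seen for each (name, email) pair.
        key = (name, email)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "email": email})
    return cleaned


sample = [
    {"name": "  jane   smith ", "email": "Jane.Smith@Example.com"},
    {"name": "Jane Smith", "email": "jane.smith@example.com "},  # duplicate once tidied
    {"name": "Bob Jones", "email": ""},                          # incomplete
    {"name": "Ann Lee", "email": "ann.lee@example.com"},
]
result = clean_records(sample)
```

In this sketch the first two sample rows collapse into a single record once they have been normalised, and the row with no email is discarded. Real rules are usually more forgiving, for example matching on near-identical names rather than exact ones.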
What Gilding & Company say –
Data cleansing removes stress from your life. Data that is incomplete, irregular or variable is very difficult to interpret and can significantly skew the results of any analysis. In short, it can seriously reduce the number of hairs on your head.
The process can also be extended to removing data which is unnecessary to the task in hand. You may start out with all your sales data, but only require data from one location, a specific salesperson, or a single product. It is important to note that this process is nearly always automated. Rules are defined and code is written to complete the task. The days of copy and paste are thankfully behind us.
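As a rough illustration of "rules are defined and code written", the sketch below expresses each rule as a small Python function and applies the rules to a list of sales rows. The helper names (`make_filter`, `apply_rules`), the column names and the sample figures are all hypothetical and invented for this example.

```python
def make_filter(**criteria):
    """Build a rule that keeps rows whose fields equal every given value."""
    def rule(row):
        return all(row.get(field) == value for field, value in criteria.items())
    return rule


def apply_rules(rows, rules):
    """Return only the rows that satisfy every rule."""
    return [row for row in rows if all(rule(row) for rule in rules)]


# Hypothetical sales data: every value here is made up for the example.
sales = [
    {"location": "Leeds", "salesperson": "Ann", "product": "Widget", "amount": 120.0},
    {"location": "Leeds", "salesperson": "Bob", "product": "Gadget", "amount": 75.5},
    {"location": "York", "salesperson": "Ann", "product": "Widget", "amount": 60.0},
    {"location": "Leeds", "salesperson": "Bob", "product": "Widget", "amount": 30.0},
]

# Keep only one location and one product; everything else is dropped.
leeds_widgets = apply_rules(sales, [make_filter(location="Leeds", product="Widget")])
leeds_widget_total = sum(row["amount"] for row in leeds_widgets)
```

Because each rule is just a function, the same data can be re-cut by salesperson or product by swapping the rules rather than editing the data by hand.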
In many cases a data cleansing exercise is a one-off, and the resulting data set can then be used for many purposes. The old adage “Rubbish in, rubbish out” is as true today as it has always been; today, however, we have techniques to recycle your rubbish.
- "Gilding & Co's experience in analysing data gave some great insights into our online users' lifestyles. It was a pleasure working with them." Lorraine Adams - Sponsorship & Promotions Manager, Delia Online