Data Correction and Cleansing Mechanisms
Posted on 04. Jun, 2009
Categories: Consulting, Data Issues, Enterprise Systems
I recently read Jim Harris’ excellent series on cleaning up duplicate data after speaking with him. I have certainly come across more than my fair share of data issues in my day. (What consultant hasn’t?)
Jim’s series and his blog focus on three things:
- People
- Process
- Technology
It’s both really hard and pretty naive to look at fixing enterprise data in a vacuum. So, let’s consider all three together.
During a project, it is often the consultant’s role to identify potential or probable duplicates. Whether I’m using one of the many specialized data cleanup tools or stalwarts such as Microsoft Access or Excel, I have found that identifying potential duplicates (i.e., questionable records) is typically not terribly difficult. The key word here is “potential”: many of these records need to be examined manually before they can be consolidated, purged, or retired.
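To make the identification step concrete, here is a minimal sketch of one common approach: normalize a few fields and group records that share the same normalized key. It assumes records are plain dictionaries with hypothetical `name` and `city` fields; the function names are illustrative, not taken from any particular cleanup tool. In keeping with the point about “potential,” the output is a list of candidate groups for a person to review, not records to merge automatically.

```python
import re
from collections import defaultdict


def normalize(value: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so that
    trivially different spellings compare equal."""
    value = re.sub(r"[^\w\s]", "", value.lower())
    return " ".join(value.split())


def potential_duplicates(records, fields=("name", "city")):
    """Group records whose normalized values in `fields` match.

    Only groups containing more than one record are returned. These are
    candidates for manual review, not confirmed duplicates.
    """
    groups = defaultdict(list)
    for record in records:
        # Hypothetical fields; a missing field is treated as an empty string.
        key = tuple(normalize(record.get(f, "")) for f in fields)
        groups[key].append(record)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    customers = [
        {"id": 1, "name": "Acme Corp.", "city": "Boston"},
        {"id": 2, "name": "ACME Corp", "city": "boston"},
        {"id": 3, "name": "Globex", "city": "Springfield"},
    ]
    for group in potential_duplicates(customers):
        print([r["id"] for r in group])  # prints [1, 2]
```

Exact matching on a normalized key is deliberately simple: it will not catch “Acme Corporation” versus “Acme Corp,” which is where specialized tools with fuzzy matching earn their keep. Either way, the hard part the post describes, deciding what to do with each flagged group, remains a human job.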
However, identification is simply the first step in the process, and often the easiest. Once suspect records have been isolated, they must be investigated and ultimately fixed. Here’s where it’s usually a good idea to stop using phrases such as “not terribly difficult.”
Some people become defensive when presented with data errors. Generally speaking, I try to say very innocently that “someone may have done something wrong.” I find that it’s much less confrontational than pointing a finger. Often, end-users are quick to plead ignorance or blame predecessors for mistakes. In the event that they themselves have made the mistakes (audit trails are pretty hard to dispute), the tone of the conversation is quite different. There’s usually a reason that an end-user did what s/he did.
It’s the client’s role to make the final call on what to do with suspect records. Far too often, however, end-users lack the time, desire, or skill set to make these calls. (See my post last month on the different focuses of consultants and end-users.) Failing to address data issues in a timely manner typically causes many problems, from cascading delays on other project tasks to incomplete testing.
Conclusion
On IT projects, vendors during the sales cycle (and project managers during the engagement) sometimes underestimate the time required to clean up key enterprise information. Technology helps in conducting this imperative exercise, but it is no panacea for sloppy data that needs to be cleansed.