Have a case of dirty data? Treat it with a dose of Data Transformation
What great terms – “Dirty” and “Ugly” data. The descriptions are damning, suggesting something that should never be seen, let alone experienced. If the data under discussion is yours, how would you respond?
a) Pretend it’s nothing to do with you?
b) Try to shift the conversation onto another topic? or
c) Walk away and give up on the data altogether?
The answer is “None of the above”. Dirty and ugly are terms used to describe data that has not been cleaned, formatted and enriched for optimal use in reporting. It may contain duplications or data errors that make it impossible to perform accurate queries, or the way the data is structured makes it too cumbersome for reporting using your BI platform. Under these circumstances, queries become slow and reports difficult to generate. It can take hours or days to obtain information that is urgently needed.
The reality is that, because we are human and we aren’t perfect, every company has some degree of dirty or ugly data. For some organisations it’s a minor inconvenience. For others, poor data can severely limit reporting possibilities. No matter how well-prepared a business is, it’s hard to identify the true extent of any data problems until you’ve deployed your BI platform and started to generate reports.
It normally goes like this… Business users read an article or attend an event and learn what’s possible with analytics. Then they get excited about the opportunities within their own company. They convince the right people and before you know it, an analytics solution is deployed. It’s about this time they realise they have a problem with their data. As they try to untangle the database and cleanse the data, the time investment for the project goes through the roof and frustration sets in.
The solution could require consulting time to get the reports built by an outside expert, or the employment of additional analysts to take on the job in-house. Alternatively, you could purchase a dedicated third-party data transformation tool to manipulate the data. All of these options slow down time to insight.
Data transformation tools are extremely useful. They help you extract data from a variety of sources, clean and transform the data into a beautiful data set, and push that data into a database so that it can be used by your reporting platform. In short, data transformation enables smoother reporting. The problem is that, until recently, every one of the solutions outlined above involved time, complexity and increased financial investment.
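To make the extract, clean and load steps a little more concrete, here is a minimal sketch in Python using only the standard library. It is not how any particular vendor's tool works. The sample data, the `sales` table and the function names `extract`, `transform` and `load` are all invented for illustration, and the cleaning rules (trim whitespace, normalise capitalisation, drop rows with a non-numeric amount, remove duplicates) are just one plausible set of choices.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: stray whitespace, inconsistent casing,
# a duplicate record and a non-numeric amount.
RAW_CSV = """customer,region,amount
 Acme Ltd ,north,100.50
acme ltd,North,100.50
Globex,south,n/a
Initech,East,75
"""


def extract(text):
    """Read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalise values, skip rows with invalid amounts, drop duplicates."""
    seen = set()
    clean = []
    for row in rows:
        customer = row["customer"].strip().title()
        region = row["region"].strip().title()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # assumption: rows with unusable amounts are discarded
        key = (customer, region, amount)
        if key in seen:
            continue  # same record seen already after normalisation
        seen.add(key)
        clean.append(key)
    return clean


def load(rows, conn):
    """Push the cleaned rows into a table a reporting tool could query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer TEXT, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
clean_rows = transform(extract(RAW_CSV))
load(clean_rows, conn)
```

In this sketch the two Acme rows collapse into one once they are normalised, and the Globex row is skipped because its amount cannot be parsed. A real project would more likely log or quarantine rejected rows than silently drop them.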
Now it has become a whole lot simpler. Vendors are responding with a variety of options for users.
Yellowfin is one of a handful of platform providers that have acknowledged the problem and incorporated the latest generation of data transformation tools into their analytics solutions. If you’re an analyst or business user working with Yellowfin and you find that dirty or ugly data is hampering reporting, you already have the tools in place to solve the problem. No additional expense or software is required, and the tools are ready to use immediately. They share the same user interface that led to BARC’s BI Survey 17 naming Yellowfin as the data-discovery-focused product with the highest value when it comes to ease of use.
Other vendors have adopted a slightly different path, developing technical solutions best managed by IT departments, while a small number of die-hards are sticking with their existing third-party strategies.
The most important thing is to acknowledge that every organisation – including your own – has some degree of bad data, but it doesn’t have to be a problem. Before you start on the path of an analytics project, consider how you intend to handle data issues and how you can incorporate data transformation capabilities into the project. Taking a pre-emptive stance and having data transformation capabilities on hand as part of your overall BI platform is a no-brainer, and it ensures you get to the why faster!