Artificial Intelligence is supposed to change the world we live in. Not just with autonomous cars, the metaverse, deep fakes and ChatGPT, but also with helping to diagnose patients better and faster, driving the sustainability transformation, automating business decisions and processes, protecting vulnerable people against fraud or abuse, and more.
Technologies such as Machine Learning, Computer Vision, or Natural Language Processing are becoming mainstream, but for many organizations the underlying data challenges are more painful than ever. Despite many years of trying to sort it out, the data is still messy, dirty, and too scattered.
Furthermore, the problem is now exacerbated by cloud adoption, which is too often done in a disorderly way, with little thought given to the impact on the data landscape and the data governance approach.
Data governance is an old problem with new urgency
Data governance is not a new issue. Data professionals from all kinds of backgrounds have spent the last 20 years trying to deal with it. The idea was to handle data as a corporate asset, in the same way as we handle money, people and infrastructure, all vital parts of running a business. The promise was to create order, rules and clarity, enabling everyone in the organization to harness the value of data and to become “data-driven”. Sounds good, doesn’t it?
But in most cases, that didn’t work as planned. Many of the data challenges we had 20 years ago are still around.
The data landscape is complex, and integrating data is still difficult. Poor data quality leads to poor and untrusted insights, leading to poor decisions and missed opportunities. On top of that, the world of data analytics has also changed. We now have hundreds of vendors and open-source options, giving companies a world of choices, but no recipe for making it all work together.
Worse, as organizations started experimenting with new technologies and cloud environments, they created more complexity, technical debt and silos.
Finally, the human dimension of data governance, the need for data literacy and collaboration, proved more difficult than expected. Some approaches have been somewhat successful, like creating a Centre of Excellence to act as a catalyst and enabler for the broader organization, or setting up internal data academies to give employees adequate training. But changing the culture of an organization takes time, is difficult, and never really ends, so it’s hard to keep up the momentum.
Rethinking data governance
Is there a better way?
Reflecting on all these challenges, and the little progress we seem to have made in the data engineering room in all this time, I have concluded that we’re approaching data governance the wrong way. We try to make it perfect and all-encompassing. We create top-down initiatives with an ambition to govern all data, for all users and all use cases, almost as a prerequisite to using data to create value.
However, we all know that the only way to eat an elephant is one bite at a time. Let’s rethink data governance with that in mind.
We know that data is a valuable asset, and that data governance is important. However, data is useless until it is used. Its value is only realized when an action has an impact in the real world. Therefore, it is pointless to try to “fix” the entire data foundation up front. If you think about the value chain running from data through insights to decisions and actions, it is a marathon, and prizes are only given to those who finish the race. A common mistake is to spend a disproportionate amount of time and effort on the first stages of the race: basic data management, reporting and dashboarding activities, which by themselves do not create value. In contrast, too little time and effort is spent on value-adding activities, such as developing predictive models and deploying those models into automated decision-making processes.
Too often, the runner never finishes the race!
To avoid this pitfall, the solution is to start from the finish line and to think of data governance from a ‘consumption-based’ perspective. The idea is to consider the business outcomes and value we want to generate. Then work back. Decide what decisions you need to make to generate those outcomes, and then what insights you need to make those decisions, and what data will give you those insights.
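The "work back from the finish line" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the churn example and the data element names are all hypothetical and do not come from any particular tool or methodology.

```python
from dataclasses import dataclass


@dataclass
class Insight:
    name: str
    data_elements: list[str]   # data needed to produce this insight


@dataclass
class Decision:
    name: str
    insights: list[Insight]    # insights needed to make this decision


@dataclass
class Outcome:
    name: str
    decisions: list[Decision]  # decisions that drive this outcome


def required_data_elements(outcome: Outcome) -> list[str]:
    """Walk back from a business outcome to the data elements it depends on."""
    elements: list[str] = []
    for decision in outcome.decisions:
        for insight in decision.insights:
            for element in insight.data_elements:
                if element not in elements:  # keep first-seen order, no duplicates
                    elements.append(element)
    return elements


# Hypothetical example: one outcome, one decision, two insights.
churn = Outcome(
    name="Reduce customer churn",
    decisions=[
        Decision(
            name="Which customers receive a retention offer",
            insights=[
                Insight("Churn risk score",
                        ["customer_id", "contract_end_date", "support_tickets"]),
                Insight("Customer lifetime value",
                        ["customer_id", "monthly_revenue"]),
            ],
        )
    ],
)

print(required_data_elements(churn))
```

The resulting list is the short set of data elements that actually matter for this outcome, which is a far smaller governance scope than "all data, for all users and all use cases".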
This will help you prioritize your data governance efforts, focusing on the critical data elements that will have the biggest impact on business outcomes. Specifically for those data elements, you can evaluate the data governance needs. Are they well defined? Do you know who owns them, and which system is their reference source? How does the level of data quality you need compare with the current level? From there, a targeted data governance action plan will ensure the biggest impact on business outcomes and provide the justification to sustain this effort over time.
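As a rough sketch of how such a targeted action plan might be derived, the snippet below checks each critical data element against the questions above and ranks the elements by a simple priority score. The 1 to 5 impact scale, the scoring rule (impact multiplied by the number of open gaps) and all element names and values are assumptions made for illustration, not an established method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataElement:
    name: str
    business_impact: int     # assumed scale: 1 (low) to 5 (high)
    has_definition: bool     # is there an agreed business definition?
    owner: Optional[str]     # accountable owner, if known
    current_quality: float   # measured quality, 0.0 to 1.0
    target_quality: float    # quality needed for the use case, 0.0 to 1.0


def governance_gaps(element: DataElement) -> list[str]:
    """List the open governance questions for one data element."""
    gaps = []
    if not element.has_definition:
        gaps.append("no agreed definition")
    if element.owner is None:
        gaps.append("no owner")
    if element.current_quality < element.target_quality:
        gaps.append("quality below target")
    return gaps


def priority(element: DataElement) -> int:
    """Assumed scoring rule: business impact times number of open gaps."""
    return element.business_impact * len(governance_gaps(element))


# Hypothetical critical data elements for a single use case.
elements = [
    DataElement("monthly_revenue", 4, True, "Finance", 0.99, 0.98),
    DataElement("support_tickets", 3, False, "Service Desk", 0.90, 0.90),
    DataElement("contract_end_date", 5, True, None, 0.70, 0.95),
]

# Highest priority first: this ordering is the targeted action plan.
action_plan = sorted(elements, key=priority, reverse=True)
for element in action_plan:
    print(element.name, priority(element), governance_gaps(element))
```

Any real scoring would need to be agreed with the business. The point is only that the effort is ranked by impact on outcomes, and that elements without gaps drop to the bottom of the list.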
Creating strategic alignment
This process is designed to create strategic alignment across your business and to ensure that data governance efforts are focused on the areas where they really matter, all the way from data to business outcomes. By being more deliberate and focused, the runner has a chance to finish the race and to see some of those AI projects become reality, with tangible results for the business.