Because it’s 2021 and everyone cares about data. This series aims to help everyone (yes, everyone) get familiar with the key concepts of a modern data stack, and to explain why it is so important to cover the bases before you start investing in fancy data analytics tools.
We start the series with what we think is the most important challenge for a data-driven company: Data Quality. If you can’t rely on and trust your data, your dashboards and all the insights generated from your analytics are worthless.
So what is Data Quality? Why is it important? How do you measure it? And what are 5 best practices?
Data Quality - as the name suggests - is a measure of how healthy your data is. Much like in manufacturing or in the service industries, bad data quality = bad business. Let me give you an example to help you understand.
Brian, a data engineer, works at a grocery delivery service. During the first pandemic lockdown, business was thriving. One day, despite his best efforts at “manually” making sure the data pipelines were error-free, the unforeseeable happened. Data was accidentally duplicated, which duplicated order items, which in turn led to customers receiving double the quantity they ordered. Brian only found out when customers started reporting the issue, and by then he was a couple dozen orders too late… The company ultimately lost several thousand dollars from this incident, which was pretty significant given the size of the business.
Unreliable data can quickly become a massive source of pain and can be detrimental to the business. Think missed opportunities, financial costs, customer dissatisfaction, failure to achieve regulatory compliance and inaccurate decision making, to name just a few.
Maybe you can relate to Brian (I know I can), or maybe you’ve been getting by with the help of multiple fixes and testing tools kindly developed by your team of data engineers to detect “recurrent” issues, emphasis on recurrent (OK, but definitely not scalable). One thing we can both agree on: if you can’t trust your data to make sound business decisions, you’ve got a serious problem.
So how exactly do you know if your data isn't trustworthy? Much like in relationships, you start having problems and asking a lot of questions that go unanswered: Is my data fresh? Is my data complete? Are my field values as expected? Why do I have “null” here? Where can I find the data? How was the data computed? And so on.
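To make some of these questions concrete, here is a minimal sketch of how a few of them (freshness, completeness, expected values, and the kind of duplication that caught Brian out) could be turned into automated checks. It uses plain Python only, and everything in it is an assumption made for illustration: the field names (order_id, item, quantity, updated_at), the 24-hour freshness threshold and the sample rows are all hypothetical, not taken from a real pipeline.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def check_quality(rows, now, max_age=timedelta(hours=24),
                  required=("order_id", "item", "quantity", "updated_at")):
    """Return simple data quality findings for a list of row dicts.

    Field names and thresholds are hypothetical, for illustration only.
    """
    findings = {}

    # Is my data fresh? Compare the newest timestamp with the allowed age.
    timestamps = [r["updated_at"] for r in rows if r.get("updated_at") is not None]
    newest = max(timestamps) if timestamps else None
    findings["is_fresh"] = newest is not None and (now - newest) <= max_age

    # Is my data complete? Count missing (None) values per required field.
    findings["null_counts"] = {
        field: sum(1 for r in rows if r.get(field) is None) for field in required
    }

    # Are my field values as expected? Quantities must be positive integers.
    findings["invalid_quantities"] = [
        r.get("order_id") for r in rows
        if r.get("quantity") is not None
        and not (isinstance(r["quantity"], int) and r["quantity"] > 0)
    ]

    # Was anything duplicated? The same (order_id, item) pair should appear once.
    counts = Counter((r.get("order_id"), r.get("item")) for r in rows)
    findings["duplicates"] = [key for key, n in counts.items() if n > 1]

    return findings


# Hypothetical sample data: one duplicated row, one null and one bad value.
now = datetime(2021, 3, 1, 12, 0, tzinfo=timezone.utc)
sample_rows = [
    {"order_id": 1, "item": "milk", "quantity": 2, "updated_at": now - timedelta(hours=1)},
    {"order_id": 1, "item": "milk", "quantity": 2, "updated_at": now - timedelta(hours=1)},
    {"order_id": 2, "item": "eggs", "quantity": None, "updated_at": now - timedelta(hours=2)},
    {"order_id": 3, "item": "bread", "quantity": -1, "updated_at": now - timedelta(hours=3)},
]
report = check_quality(sample_rows, now=now)
print(report)
```

Run on a schedule, a check like this would have flagged the duplicated milk order before any customer did. Questions such as where the data lives and how it was computed are about documentation and lineage, so they are not covered by this sketch.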
Now you may ask: how can you prevent bad data quality from undermining your business?
Here are 5 best practices
Get in touch if you want to learn more: firstname.lastname@example.org or email@example.com.