“Our work isn’t an exact science.”
I heard this stated by someone in the data science field last week, and I can think of few things in our work that make me angrier than this – so prepare to hear a rant, as I often find myself on the wrong side of Brandolini’s Law. Simply put, I believe that data is an exact science. Your data is correct, or it is not. If it is not correct, then it is not acceptable, period. In case you disagree, or have not gotten this message previously, here are some of the ways we’ve discussed this in previous articles on this blog.
- Data must be easily queried so as not to confuse your end users.
- Data must be as complete as you can make it, and you should not exclude metrics in order to misrepresent it in a manner favorable to your argument.
- It is imperative that you roll up from the smallest transactional grain available.
- Percentages must be coherent and easily understood as independent entities.
- It’s important to understand the difference between current and historical records.
- Planning is key.
If you’ve never taken the time to think about what the phrase “exact science” means, it is simply a science that can be absolutely precise in its results. There is absolutely zero reason why a data copy from one system to another cannot achieve absolute precision. This is one of the beauties of this business – it’s a binary metric. It’s either 100% correct, or it is not. Your work passes or fails, with no in between. 99% is simply not good enough, as we discussed here.
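To make the pass-or-fail idea concrete, here is a minimal sketch of how a copy might be checked. It assumes both systems can hand their rows back as plain tuples. The function names `copy_is_exact` and `copy_differences` and the sample rows are made up for illustration and do not come from any particular tool.

```python
from collections import Counter


def copy_is_exact(source_rows, target_rows):
    """Return True only if the target holds exactly the same rows as the source.

    Rows are compared as multisets, so duplicated or dropped rows are caught
    and row order is ignored.
    """
    return Counter(map(tuple, source_rows)) == Counter(map(tuple, target_rows))


def copy_differences(source_rows, target_rows):
    """Return (missing_from_target, unexpected_in_target) as sorted lists."""
    source_counts = Counter(map(tuple, source_rows))
    target_counts = Counter(map(tuple, target_rows))
    missing = source_counts - target_counts      # in source, not in target
    unexpected = target_counts - source_counts   # in target, not in source
    return sorted(missing.elements()), sorted(unexpected.elements())


# Hypothetical sample data: (id, product, amount_in_cents)
source = [(1, "widget", 999), (2, "gadget", 450), (3, "gizmo", 1200)]
target = [(1, "widget", 999), (2, "gadget", 450), (3, "gizmo", 1201)]

print(copy_is_exact(source, target))     # False: one value is off by a cent
print(copy_differences(source, target))
```

Two of the three rows match, and the check still fails. There is no partial credit here, which is the point: the second function exists only to tell you which rows to go fix.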
Now, the interpretation of said data is certainly not an exact science, as the very definition of interpretation will tell you that opinion and bias will be a part of any conclusion. Indeed, it is because of this that it is so important to get the source data correct. A few records may very well make the difference between a mistaken interpretation of cause and effect and the discernment of what is actually the root cause of said effect. Don’t guarantee mistakes in interpretation by allowing mistakes to proliferate within the data.