How you finish that statement tells a great deal about a person. If you finish with “so help me God”, then you’re likely either a Christian or someone who spends way too much time watching courtroom dramas. I’m kidding – we all know there’s no such thing as too much Matlock or Perry Mason. Alternatively, if you’re a data expert, you may finish it with “so help me Codd”. Edgar F. Codd is known as the father of the relational database model, which is used in applications all over the world today. As for me, I like to talk about all three!
Sadly, today’s article isn’t meant to regale you with my thoughts on any of the above specifically. Instead, I want to talk about the objective contained within the original oath, the same objective that drove Codd and that drives the very best of today’s BI resources: the probing search for truth. There’s a reason the table at the core of a data warehouse is called the Fact table, and not the opinion table.
In my recent article on regression to the mean, I highlighted Eric Thames’s 2017 season, concluding that his stellar April 2017 numbers were unsustainable and that his more recent play is more indicative of the player he really is. A reader pointed out that, in trying to make my point, I sold Thames a bit short. While his batting average has slumped dramatically since his hot start (the key point in the article), his slugging percentage is still well above league average (.505 at the time of this posting). In other words, when I called it a “mere” .506 slugging percentage in comparison with his numbers from April 2017 or from his time in Korea, I made it appear to non-baseball folks that this number was subpar overall, when in fact it merely reflected regression from his previous numbers.
So, what’s the point of telling you this, rather than just editing the article so future readers won’t know the difference? Simply this – as a BI resource, you MUST ALWAYS tell the truth, the whole truth, and nothing but the truth. It is imperative that you not artificially insert your own biases into the data presentation* in order to obscure or obfuscate data elements or metrics you do not wish to highlight. Further, if someone points out that you inadvertently misstated the facts while making a point about something else, you MUST ALWAYS take ownership of that misstatement. It is our duty to present the truth, the whole truth, and nothing but the truth. It’s the job of the business and its executives to determine what to do with that truth.
*I am fully OK with you using your knowledge to cross-reference and insert additional supporting data. For example, if I worked with weather data on a previous project and felt it was relevant to the current one, inserting it into the dataset so the business can use it would not necessarily be a bad thing.