In our daily work, we’re constantly bombarded with requests for short time periods – rolling 30 days, last month, last quarter, year to date, rolling 13 months, etc. All of these are fine for their defined purpose – reporting. It’s often called Business Intelligence, or Analytics, and there are pieces of both within these requests. However, the best BI usually looks deeper, and at far longer time periods, than your standard reports.
Testing Your Hypothesis
In elementary school science, we were always told that to test a hypothesis, we needed to conduct at least 30 experiments and record the results. The reader may ask “why 30?” – I certainly did. The answer is partly that 30 is a nice round number that’s easy for kids to remember and will keep them busy long enough for a teacher to remain sane, and partly a bunch of statistical math that quite frankly bores me too much to attempt to explain (in other words, it makes me cry and curl into the fetal position when I try to understand it). There’s a far simpler way to explain it – the principle of regression to the mean. Essentially, given enough data points, outliers will eventually average themselves out. Every football season, Bill Barnwell does a great job of explaining it in football terms.
More Data Points = Better Analysis
Tens of thousands of MLB players have gone 4/4 in a game. Equally, tens of thousands have gone 0/4 in a game. Fewer than 100 ended their careers with perfect batting averages, and none did so with more than the 3 career hits of John Paciorek. If you look only at career batting averages, he looks like the greatest player in MLB history – but as we’ve cautioned before, beware of averages as a metric. Baseball is a shining example of how even an extended sample size won’t necessarily provide clarity – people constantly debate who the best players are. However, I doubt anyone would argue that Mario Mendoza was a better player than Babe Ruth. While the additional data points may not provide absolute certainty, they do give you the confidence to draw a conclusion.
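The effect of sample size on batting averages is easy to simulate. Below is a minimal Python sketch – the .250 “true” average, the 4 at-bat “game,” and the 5,000 at-bat “career” are hypothetical numbers chosen purely for illustration. Over a single game, identical hitters span everything from 0/4 to 4/4; over career-length samples, their averages all collapse toward the true rate.

```python
import random

random.seed(42)

# Every simulated hitter has the same "true" .250 batting average.
TRUE_AVG = 0.250

def batting_average(at_bats):
    """Simulate a run of at-bats and return the observed average."""
    hits = sum(1 for _ in range(at_bats) if random.random() < TRUE_AVG)
    return hits / at_bats

# 10,000 hitters observed for one game vs. a full career.
small = [batting_average(4) for _ in range(10_000)]       # one game each
large = [batting_average(5_000) for _ in range(10_000)]   # one career each

print(f"one-game averages:  min={min(small):.3f}  max={max(small):.3f}")
print(f"career averages:    min={min(large):.3f}  max={max(large):.3f}")
```

Despite every hitter being statistically identical, the one-game samples produce both “perfect” and “hitless” players, while the career-length samples leave almost nothing to debate – exactly the regression to the mean described above.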
A Personal Example
Early in my career, I was asked to create a series of reports to show actual vs. goal numbers for an E&P company that produced natural gas in the Southern US and along the Gulf Coast. Their Q4 numbers fell short of their goals year after year, and there was pressure from upper management to figure out why, since they far exceeded their goals in the first two quarters just as consistently. Their request of me was a dashboard with a rolling 13 months, so they could figure this out. Since the data was available and I wanted to see how efficiently I could query a larger data set, I pulled ten years of daily data for thousands of wells into a Cognos PowerPlay cube. I ended up getting so interested in the data that I worked all weekend. I kept noticing two things: first, production dropped like a rock the week of Thanksgiving, and it absolutely tanked the last week of the year; second, at random intervals in September and October, production would go to near zero across large swaths of the map.
Both had obvious answers, and by cross-referencing employee time records and weather data over the weekend, I was able to conclusively prove both theories correct. Quite simply, production fell off in Q4 because of the combination of late-season hurricanes and employee vacations, which shut down most production facilities for 12-15 of the 92 days in the fourth quarter every year. Since the goal for each quarter was simply the annual goal divided by 4, no provision was made for these seasonal occurrences. I wasn’t smarter than the business – I simply pulled enough data points to surface the answer the business never had enough data to consider. Because the analysis focused on quarters, quarter was the grain of the analysis, and I needed enough quarters in my data set – enough of the specific time period I was analyzing – for the trend to become apparent.
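The fix is arithmetically simple: weight each quarter’s goal by its expected producing days instead of splitting the annual goal four ways. Here is a hypothetical sketch of the difference – the annual goal figure is invented, and the 13 expected shut-in days in Q4 are an assumption drawn from the 12-15 day range observed above.

```python
# Hypothetical annual production goal (units are arbitrary for illustration).
ANNUAL_GOAL = 36_500

days_in_quarter = {"Q1": 90, "Q2": 91, "Q3": 92, "Q4": 92}

# Assumed shut-in days: hurricanes plus holiday vacations hit Q4.
expected_shut_in = {"Q1": 0, "Q2": 0, "Q3": 0, "Q4": 13}

producing_days = {q: d - expected_shut_in[q] for q, d in days_in_quarter.items()}
total_producing = sum(producing_days.values())

# The flat split the company actually used vs. a seasonally weighted split.
flat_goals = {q: ANNUAL_GOAL / 4 for q in days_in_quarter}
seasonal_goals = {q: ANNUAL_GOAL * d / total_producing
                  for q, d in producing_days.items()}

for q in days_in_quarter:
    print(f"{q}: flat={flat_goals[q]:,.0f}  seasonal={seasonal_goals[q]:,.0f}")
```

Both schemes sum to the same annual goal; the seasonal version simply moves the Q4 shortfall into quarters when the wells are actually producing, so a normal hurricane season no longer reads as a missed target.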
So, what does this all mean?
In a nutshell, if you want to figure out what is going to happen 1, 5, or 10 years down the line, pull at least that much historical data to see what catalysts drove dramatic changes in the general trends. The grain of your projection should match the grain of your historical data pull, and you should pull as much of that data as you can.