The Set Theory of Data Warehouse Design


I love volleyball as much as I love data. However, as this is a blog about data, I’ll leave the theories I have related to setting the ball at the door and move forward on the topic of data. Believe it or not, you learned the basics of how a data warehouse works way back…

“Our Work Isn’t an Exact Science”


“Our work isn’t an exact science.” I heard this stated by someone in the data science field last week, and I can think of few things in our work that make me angrier than this – so prepare to hear a rant, as I often find myself on the wrong side of Brandolini’s Law. Simply…

Don’t Know Much About… NULL Values


The Best of Sam Cooke was the first CD I ever bought. One of my favorite songs on that CD, “Wonderful World”, begins with the iconic line “Don’t know much about history.” Through the rest of the song, Cooke sings about a number of the other things he “don’t know much about”. He then ends…

Regression to the Mean Machine


Old habits die hard. That, in a nutshell, is the concept behind regression to the mean. To understand this concept, let’s first define what “mean” means (and forgive me for sounding like Bill Clinton during the Lewinsky affair). Mean is the highfalutin way statisticians say “average”. With regression to the mean, the philosophy is that…

Roll Up Your Sleeves, Not Your Counts


Rolling up your shirtsleeves has long been associated with the idea of working hard. I have zero issues with working hard – but you should work smart as well. I was reminded of this recently in a discussion with a colleague regarding a table that stores distinct record counts by day. The idea behind the…