The Game Theory of Warehouse Design – AKA – “What Would JJ Walker Do?”

In high school, I had the opportunity to play against a chess grandmaster. I can say without boasting that I soundly beat him … to the door. While playing, he told me in no uncertain terms around 40 moves into our game that he would have checkmate in twenty-something moves. As he went around to the other games he was playing simultaneously against my peers, I pored over the chess board, trying to figure out how he knew he could beat me so many moves into the future. Finally, I made my move and was able to conclusively prove him incorrect. I tried what I thought was a bold and unexpected move, and was correct – he revised his estimate after his next move down to checkmate in 9 moves. He understood the game, and the tendencies of those learning it, far better than I do even today.

I have no doubt I would’ve taken the shirt off of his back had we played Poker, but that’s a story for another day.

I recently attended the IBM Analytics University in New Orleans, LA (thanks boss for sending me!). One of the themes I saw repeated there – especially in my conversations with attendees – was the idea of “Self-Service” User Analytics. In a nutshell, this is the concept that business end users will access the data they need directly in order to conduct analyses of their own, rather than pinging IT with repeated requests to perform mundane report building tasks. At its best, this kind of platform allows your highly specialized IT resources to build the assets needed by the business, rather than act as highly paid Data Gofers for the business. At its worst, it generates frustration with IT and their support model, and causes the business to purchase rogue, non-standard BI tools (or even develop “Analytics” in Excel!). Suffice it to say, when the tools go rogue, the data quality often follows suit.

In the 90’s, the phrase “WWJD” – short for “What Would Jesus Do” – was enormously popular in Christian circles. Secular circles copied and modified this into all sorts of things – it even made its way into Spongebob Squarepants (What Would Larry Do?). For a Data Architect, one of the key design questions should be along these lines – “What Would the Business Do?” It’s an especially tough question to answer with people being as different as they are. While not every warehouse is designed for self-service use, there are few reasons why yours shouldn’t be. Let me restate that more clearly: there are NO reasons why you shouldn’t design it for self-service. Some may argue that security would be a reason, but invariably, someone will want to ad-hoc their own reports and/or analytic solutions, and it’s easy enough to secure data by rows, tables, or columns with today’s tools. Further, the design principles that make self-service an option are the same design principles that make the warehouse useful and efficient for standard report queries, so it’s no additional work in the end.

This gets us to the heart of the issue. As we’ve discussed previously, the grain of the data is one of the first questions to answer. As part of his Top Ten Rules for Data Modeling, Ralph Kimball lists determining the lowest level atomic grain as rule #1.

Rule #1: Load detailed atomic data into dimensional structures.

Dimensional models should be populated with bedrock atomic details to support the unpredictable filtering and grouping required by business user queries. Users typically don’t need to see a single record at a time, but you can’t predict the somewhat arbitrary ways they’ll want to screen and roll up the details. If only summarized data is available, then you’ve already made assumptions about data usage patterns that will cause users to run into a brick wall when they want to dig deeper into the details. Of course, atomic details can be complemented by summary dimensional models that provide performance advantages for common queries of aggregated data, but business users cannot live on summary data alone; they need the gory details to answer their ever-changing questions.

Kimball quickly follows this up with Rule #2.

Rule #2: Structure dimensional models around business processes.

Business processes are the activities performed by your organization; they represent measurement events, like taking an order or billing a customer. Business processes typically capture or generate unique performance metrics associated with each event. These metrics translate into facts, with each business process represented by a single atomic fact table. In addition to single process fact tables, consolidated fact tables are sometimes created that combine metrics from multiple processes into one fact table at a common level of detail. Again, consolidated fact tables are a complement to the detailed single-process fact tables, not a substitute for them.

It is imperative to note that both of the first two rules revolve around structuring your warehouse around what is best for the business. Additionally, it should be designed to perform at the lowest transactional grain, in order to support end user requests for the most granular data possible, as well as rollup summarization from the transactional level details. Think of it like your bank account – if you just look at the balance changes by day, you have no insight into where you are actually spending your money.
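To make the bank account analogy concrete, here is a minimal Python sketch. The transaction rows, the column names, and the roll_up helper are all made up for illustration; they are not from Kimball or from any particular warehouse. Amounts are stored in cents to keep the arithmetic exact.

```python
from collections import defaultdict

# Hypothetical atomic-grain facts: one row per bank transaction.
transactions = [
    {"date": "2024-03-01", "category": "groceries", "cents": -5400},
    {"date": "2024-03-01", "category": "coffee", "cents": -450},
    {"date": "2024-03-02", "category": "coffee", "cents": -475},
    {"date": "2024-03-02", "category": "paycheck", "cents": 150000},
    {"date": "2024-03-03", "category": "groceries", "cents": -3125},
]


def roll_up(rows, key):
    """Sum the `cents` measure, grouped by any column present on the rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["cents"]
    return dict(totals)


# The "balance change by day" view: cheap to query, but the category is gone.
daily_summary = roll_up(transactions, "date")

# The same atomic rows can answer a question nobody planned for.
by_category = roll_up(transactions, "category")

print(daily_summary)
print(by_category)
```

If only daily_summary had been loaded, the "where is my money going?" question could not be answered at all, because the category was discarded before the data reached the user. Starting from the atomic rows, either rollup is a one-liner.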

Here, dear reader, is where things come full circle. While most of the initial work on game theory was done on zero sum games with specific rules and sets of moves (such as chess), the inspiration for John von Neumann’s initial work in this area was the game of poker. Von Neumann was fascinated by the concept of the bluff – a facet of the game that no probability theorem could predict. Twenty years after this initial foray into game theory, von Neumann was at the forefront of digital computing. His basic architecture design, with a single memory storage for both program instructions and data, is still used in modern computers today.

However, while his inspiration may have been poker, one of the first games tackled by the digital computer was chess, via a 1951 program written by the father of computer science, Alan Turing. Turing and subsequent programmers attacked chess in part because of its popularity, but also partly due to it being a game of perfect information. Chess is a game where all the information is available to both players at all times. Because of this, it can be more predictable in nature, which is a comfort to programming geeks who thrive on math. Poker, on the other hand, is a game of imperfect information – the obfuscation of the cards other players hold makes the outcome of any given hand nearly unpredictable.

Programmers often retreat to a similar paradigm when designing data assets for their organizations. Rather than program for unpredictable query requests from end users, they tailor each solution to perform in a hyper-specific scenario. You will see this all the time with developers who overuse stored procedures and views. While these tactical solutions can absolutely excel within the realm of the tightly defined rules set by the programmer, they perform far worse when those rules are not followed perfectly.
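As a rough illustration of that contrast, the Python sketch below puts a hyper-specific function (standing in for a narrowly scoped stored procedure) next to a general one that lets the caller pick the grouping and filters. The order_lines data and both function names are hypothetical, and a real warehouse would express this in SQL against dimensional tables rather than in application code.

```python
from collections import defaultdict

# Hypothetical order-line facts at atomic grain.
order_lines = [
    {"region": "East", "product": "widget", "month": "2024-01", "units": 10},
    {"region": "East", "product": "gadget", "month": "2024-01", "units": 4},
    {"region": "West", "product": "widget", "month": "2024-02", "units": 7},
    {"region": "West", "product": "gadget", "month": "2024-02", "units": 3},
]


def east_widget_units_for_month(month):
    """Tactical: answers exactly one question and nothing else."""
    return sum(
        r["units"]
        for r in order_lines
        if r["region"] == "East" and r["product"] == "widget" and r["month"] == month
    )


def units_by(rows, *dimensions, **filters):
    """General: group by any columns, filter on any column values."""
    totals = defaultdict(int)
    for r in rows:
        if all(r[col] == val for col, val in filters.items()):
            totals[tuple(r[d] for d in dimensions)] += r["units"]
    return dict(totals)


print(east_widget_units_for_month("2024-01"))
print(units_by(order_lines, "product"))
print(units_by(order_lines, "region", "product", month="2024-02"))
```

The first function is fast and correct for the one scenario it was written for. The moment someone asks about the West region, or about gadgets, it has nothing to offer, while the general version only needs different arguments.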

As I said at the opening of this article, I was crushed by a grandmaster in high school playing chess. However, in a less predictable environment, with imperfect information, I have no doubt in my mind that I would have proved myself more adaptable by making the rules of engagement with each hand, rather than obeying the rules of engagement dictated to me. Your end users in the business are exactly the same way. Just as I determine where the flow of a poker game will take me strategically, then choose the tactical analysis to apply to each hand, your business users will follow the flow of where the data takes them, choosing tactical analyses to apply to the dataset and determine a better outcome for their business. You can’t know where the data will take them any more than you can predict in advance how I’ll play a hand of poker – mainly because neither the business analyst nor I know what we’ll be doing until we see the data.

If you don’t believe me, just look at this article. How often do you see all of these in a single article:

poker AND “spongebob squarepants” AND “Ralph Kimball” AND “Game Theory” AND Jesus AND “John Von Neumann”

Per Google as of this posting, none! If the rambling of this article doesn’t demonstrate how unpredictable people can be with how they get to their conclusion, then clearly I need to work harder at thinking outside the box.
