Orion’s Belt Is a Huge Waste of Space – But a Data Constellation is the Inverse

I’ve long had a love affair with the stars. Seeing something so timeless and far reaching puts life in perspective. For as long as I can remember, I’ve been able to identify and appreciate a number of constellations as they run across the night sky. My kids have been able to identify Orion for as long as they have been able to talk. The lesson is that while a single star can shed light on your day to day operations (as the sun does with the Earth), humans need to see something in a greater scale in order to spark the imagination.

Data warehousing is no different. If a user has access to a single star schema built around a transaction based fact table, then they will be able to bring the details of their day to day operations to light. Every business needs this capability in order to survive. However, in order to grow, a business also needs strong, creative, analytical minds who will take disparate data sets and find correlation between what previously was seen as unrelated data sets. This may be the single most driving factor behind my love of this work. As a historian, I was trained to research and bring together these independent data sources in order to analyze it prior to forming a conclusion supported by multiple data sets. In my experience, the best BI minds are not those who merely think outside the box, but those who also wonder what the box is made of (and a whole host of other variables). It wouldn’t surprise me if the same type of people were the ones who originally made the connections between the dots in the sky and from those drew out stories for the groups of dots to be remembered by.

Now that you’ve heard the philosophy, let’s talk turkey on how this works. As we have previously discussed, the basic star schema is a series of dimensional data attributes grouped together in related sets that are distinctly keyed and arranged around a transaction based fact table at a specific grain. Now, as you get a handle on how your business works, you will want to expand the warehouse into other areas of analytics, or perhaps you simply wish to create rollup facts for long term performance reasons. In either case, your facts will ideally be able to use several of your previously defined dimensions.

For example, if your first fact table shows sales data, potential dimensions would include date, sales staff, customer, and product. The fact would hold these dimensional keys and metrics such as quantity sold and the dollar amounts associated with the sale. Now, if you’re my kid’s lemonade stand, keeping track of those transactions at the lowest grain is easy, regardless of how many years of data you wish to view. On the other hand, if you are Amazon, you don’t want to pull trillions of transactions every time you want to see how your business has grown over the last decade. In this case, you would want to make a rollup fact table that pre-aggregates the detail sales transactions to a more manageable and query friendly level. This rollup fact table would join to many (if not all) of the same dimensions – though it may join at a higher dimensional level. Perhaps it joins to the Product dimension at Product Line instead of the UPC code.

Now, here is where it gets exciting. Using a standard BI tool, like Cognos, you can set up a self service query tool that pulls from the rollup fact table, and uses the dimensional IDs to allow users to connect the stars via using the shared dimensional keys as filters. Now you can have users get the performance advantages of the rollup fact table while losing none of the detailed transaction analysis available in the detailed fact table. Pretty cool, right? As an added bonus, by sharing the dimensions, you reduce the amount of space taken up by duplicative dimensional data in the warehouse – and you save the ETL processing and maintenance work of loading the duplicate table.

Now it’s time to take a different angle to this analysis. Your HR department has built a star schema detailing a reports to hierarchy, enabling them to track employee movement over time within the organization. As part of their manager success metrics, they look at how leadership manages budget expenses, with those coming in under budget receiving performance bonuses. From an HR and accounting perspective, this makes complete sense. From an organizational perspective, it may actually hinder growth. By taking the Sales Staff dimension and using it as a bridge between the sales fact and the HR hierarchy fact, you can analyze how your sales staff performs under various managers. Perhaps there is a rift between your best sales woman and the new regional VP, demotivating her and causing a large dent in her annual sales. Looking at the HR star schema, you find that the VP cut out raises and bonuses this year in order to maximize his bonus by coming in under budget. Instead of losing your star performer (and all of the sales that she usually makes), perhaps you would be better off moving the VP entirely. That is the power of effective design – and why building stars into constellations is crucial to changing reporting into Analytics.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s