The Science Behind Joining to Type 2 Dimensions


When building a type 2 dimension, you’ll assign each record two distinct keys. One will be the natural key, which is the identification of a given dimensional record over one or more points of time. The other is the dimension ID, which is the unique record identifying that exact row on the dimension table – not only for a specific point in time, but for a specific point in time for a specific natural key. The most common example is someone’s last name changing due to marriage. The person’s natural key (for example, their SSN) doesn’t change, but they may choose to change their name. In a type 2 dimension, a record would be created for the name change. This record would have a unique dimension ID, the same natural key as the previous record for that individual, and an effective date/timestamp for when the record became active. Generally, there will also be a current/active flag on the most recent record as well.

So, what does all of this mean when you join to the fact table? Usually, you will want to join on the dimension ID, so as to show the dimension record associated with the fact record at the point in time for which the fact transaction occurred. However, sometimes you may wish to show only the most current dimensional record’s data, but display all of the historical facts associated with it. In this case, you would join using the natural key, and filter the dimensional table to display only the current/active record for that natural key, as joining without that filter criteria will result in duplicative facts (each fact will be repeated for every instance of the natural key).

However, under almost no circumstances would you want to use both methods to join to the fact table. These two join methods are mutually exclusive. Combining them will result in the exclusion of all historical fact and dimensional records for the natural key. The current/active flag on the dimension eliminates the dimensional history, while the dimension ID, as part of its very nature, will filter the fact table and eliminate all facts associated with the natural key but not also associated with the point in time represented by the dimensional ID. In the example of a marital related name change, my wife’s entire credit history didn’t disappear when we married. A report showing her credit history using the dimension ID would show her current and previous last names. A report using the natural key and the current/active indicator would show her full history, but all payments would be associated with her as if her name has always been what it is today. A report showing her credit history using them both would only show her current name and only show her credit history since her name change.

One Comment Add yours

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s