We covered Type 1 Data in a previous article. So now that you know what Type 1 data is, what is Type 2? Also known as a slowly changing dimension (SCD), Type 2 dimensions simply are dimensional records that represent a unique value at a specific and unique point in time. The key advantage to this is that your referential integrity for historical records remains perfect. To maintain maximum flexibility, you keep both the natural key and insert a new key – usually named something like the Dimensional ID. The DIM ID is unique for every row, while the natural key groups like records together so you can see how a specific dimensional value changes over time. In the example below, you will be able to see that the Slim Jim offices moved to a new location as of 11/01/2010, and that is the location of the current office.
|Dim ID||Key||Customer Name||Address||City||State||Zip||Start Date||End Date||Active Flag|
|1||1||Slim Jim||505 Tasty St.||Delicious||Texas||12345||01/01/2000||10/31/2010||N|
|2||1||Slim Jim||123 Snack Ave.||Yum||Arkansas||54321||11/01/2010||12/31/9999||Y|
You know that this is the same Slim Jim customer (as opposed to another company of the same name) because the natural key is consistent between both records. Most warehouses will be designed with just the DIM ID on the fact tables, as that keeps the fact table slim and performance excellent. However, if the natural key is also stored on the fact table, you can treat this type 2 dimension as a type 1 at any time. Simply join to the fact table by the natural key, and filter for the Active Flag = ‘Y’. This will always give you the most current dimensional data, but will still match to the historical fact records through the connection on the natural key, which never changes.
Never join on both the DIM ID and the combination of the natural key and the Active Flag. The reasons for this are covered in The Science Behind Joining to Type 2 Dimensions.
4 Comments Add yours