What’s in a Name (Calculation)?


One of my favorite lines from Shakespeare is this iconic quote from Romeo and Juliet. Due to the feud between their families, the young couple is forced to debate the merits of the name each of them carries in regards to the rest of the attributes they each bring to the table. Before the reader wonders if they’ve wandered into another blog entirely, I’ll get to the point. Of of my biggest pet peeves with data warehousing design is when the designer fails to truly architect it. Instead of filling in the details their end users will likely want to see, they simply assign the source columns into new dimensionalized tables. Nowhere is this more apparent than with the customer dimension.

Within this dimension, you will often find an assortment of name fields, such as first name, middle name, last name, and surname, which are four of the most common. You will often see these directly imported from an OLTP system into the table as-is. While this is fine, it is also falling short of where the end users need for the warehouse to be, and here’s why. Nearly every report that displays one of those fields will display all of them. Most often, it will show them in a single reporting column via a concatenation function within the report. While this works, it is not ideal. Frankly, there is zero reason to go through the effort of concatenating such an often used field for every single report execution that would use it. Worse, because these fields are often NULL (for valid reasons), the concatenation calculation must also include NVL logic, as a concatenation calculation that includes a NULL will generate a NULL value in most DBs and query tools.  You will also need to add spaces between each value, and further, you will need a case/when statement on middle name for those cases when the middle name does not exist, as you don’t want to insert two spaces between first and last name. Taking all of this into consideration, your SQL statement to create this single field within the report will look something like the following:

Select TRIM(NVL(first_name, ”)||’ ‘||(case when (middle_name is not NULL) then (middle_name||’ ‘) else (”) END)||NVL(last_name, ”)||’ ‘||NVL(surname, ”)) as full_name from DIM_CUSTOMER.

That’s a total of 6 concatenations, 1 trim, three NVLs, and one case/when, for a total of 11 calculations for each row of data.  Multiply that over 3 million customers, and you’re looking at 33 million calculations in the DB’s temp space – for every report or SQL execution. Conversely, if you use the same SQL to insert this into a new column, your report SQL will look something like:

Select full_name from DIM_CUSTOMER

Which do you think will be more efficient?

One Comment Add yours

  1. Vasu says:

    James, highly essential information in this post. Strongly agree!!

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s