Some years ago, my wife and I had a full fledged argument about how to properly organize my son’s Lego collection. She felt I couldn’t organize my way out of a paper bag, and I felt she couldn’t organize the paper bags (slight exaggeration on both counts). FYI – folding them neatly and placing them into an open paper bag on the floor of your pantry is the way to go in my book! My wife still prefers to stack them on a shelf. Anyhow, she wanted my son to organize every Lego into the set it came with and store them in the boxes accordingly. I felt we should organize them by brick type, which caused her to launch into a diatribe about how it made no sense to put all the red bricks together. How is this relevant to data? I’m glad you asked!
My wife’s preferred method of storage is data normalization and very transactional in nature. Each kit came with a defined set of Lego pieces which were designed to interact in a very specific manner. Given her druthers, she’d store each piece type individually in a ziplocked and labeled bag within the box. Within a kit’s pieces, users would have the option of utilizing the pieces in other ways, which could lead to a cool design or one that was simply lacking. This is no different than the design of a transactional system. It was created and the data assembled within for a very specific purpose – to build out the data elements needed for the transaction the system was designed to create. Changes may occur, but without significant additions to the kit’s pieces, the core of the kit remains the same. Transactional systems are similar in that regard. Another similarity is the ingress/egress paradigm. A Lego kit can be easily broken down and loaded back into its box, but extracting the pieces out of it to put it all together is a time consuming process. When you take a flat file and load it into the transactional system, the load is certainly easier in most cases than in trying to reassemble the flat file from the tables you’ve loaded it into.
My preferred method was much more of a data warehouse design. In my mind, I’d broken apart all of the kits in favor of a storage mechanism that led to ease of ingress and egress. Like any data architect worth a darn, I reasoned that a little extra time loading the pieces into the preferred storage format in exchange for ease upon the end users pulling the pieces back out would be the most time efficient, as typically we build more than we break down. I wanted to store the pieces by brick size – for example, 1*1 bricks in one bin, 2*1 in the next, 2*2 in the next, etc. I didn’t care what color they were, or even how thick they were – those were easy visual cues that to me functioned very similar to an index or an organize by statement in the design. By having them ordered this way, I could use the parts from the Batmobile and the parts from my space kits to design a bat rocket… I mean, my son could do that. We could build anything we had the pieces for, regardless of how we originally acquired them. With the instructions, we could also build the original kits as well. Like a data warehouse, I’ve combined parts from myriad source systems (Lego kits) into a more organized system, and for fun, I even built out a junk dimension to house specialty pieces like wheels.
Who won? Neither of us. My son opted for an uncurated data lake and just dumped everything into one bin. While a terrible way to treat data, on the plus side, his room is cleaner!