I have often been accused of putting everything through the lens of data and analytics. Many of my long-time followers know of my fondness for finding quotes or pithy sayings from movies, books, and everyday life that reinforce the message of leveraging data to drive better outcomes.
One such idea comes from the book Robopocalypse by Daniel H. Wilson. The gist of the story is that a computer achieves intelligence and quickly determines that humans are the problem with the world, and, as you may have guessed, fun and hilarity do not ensue. The part I love is the idea that things themselves do not matter; it is the relationships between things that are fascinating. That line is tailor-made for my career, since for decades I have worked with customers on how the relationships between data drive outcomes.
I often hear that data is “bad”, that data is “late”, or that “we have dirty data”. What I constantly have to ask is, “How do you know that?” Much like the cat in Schrödinger's infamous thought experiment, data has no state until it is observed with other context: the cat in the box is neither dead nor alive until you observe it. This is the big point that must be understood in our world: data by itself is neither good nor bad, timely nor late, clean nor dirty.
Data just “is” until an external point of view is applied to give it context. Even then, data has different characteristics depending on whose point of view is being considered. For example, Mario may find hourly data good enough for his process, but Daphne needs the data every minute, which makes that same hourly data “too late” for her.
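To make the Mario and Daphne example concrete, here is a minimal sketch in which one batch of data is judged against each consumer's freshness requirement. The names come from the example; the `is_too_late` helper and the hourly and per-minute thresholds are hypothetical illustrations, not part of any product.

```python
from datetime import datetime, timedelta

# Hypothetical freshness requirements from the Mario/Daphne example:
# Mario is fine with hourly data; Daphne needs data every minute.
REQUIRED_FRESHNESS = {
    "Mario": timedelta(hours=1),
    "Daphne": timedelta(minutes=1),
}

def is_too_late(loaded_at: datetime, now: datetime, consumer: str) -> bool:
    """Return True if data loaded at `loaded_at` is older than the consumer needs.

    The data itself only carries a timestamp; "late" is a judgment that
    exists once a consumer's requirement is applied to it.
    """
    age = now - loaded_at
    return age > REQUIRED_FRESHNESS[consumer]

now = datetime(2024, 1, 1, 12, 0)
loaded_at = now - timedelta(minutes=30)  # the same batch, loaded 30 minutes ago

for consumer in REQUIRED_FRESHNESS:
    status = "too late" if is_too_late(loaded_at, now, consumer) else "on time"
    print(f"{consumer}: {status}")
```

The same 30-minute-old batch is "on time" for Mario and "too late" for Daphne; nothing about the data changed, only the point of view.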
This is not to imply that data is unimportant or should be ignored. Rather, it exposes the fallacy that collecting data for the sake of collecting data is a winning strategy.
Giving Context to Content
What brings value to the data is the relationship it has with other data elements or to the situation at hand. People shop at stores that carry items supplied by vendors. Patients go to hospitals to be treated by doctors for diseases. Companies ship customer packages on planes or trucks. These simple examples of how things relate to each other open up a vast array of questions companies can ask to learn how their business is running and how it can be improved.
Using the first example, retail, an analyst can start asking:
“Which customers spend the most at our stores on a per-visit basis?”
“Which vendors have the highest rate of return for their products?”
“Which items attract the high-end clientele, and will they spend more if the product is in stock?”
As one can see, there is no end to the types and number of questions that can be asked, yet one thing remains consistent: the data itself. The trick is making sure that the relationships are preserved in the data models so that these questions can be asked.
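As an illustration of preserving relationships in the model, here is a minimal sketch that keeps customers, visits, and purchases linked by keys and then answers the first question above. The schema, table and column names, and sample rows are all hypothetical, made up for this example; SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical, minimal retail model: customer -> visit -> purchase are linked
# by keys, so questions nobody anticipated up front can still be answered.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE visit (
    visit_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    store_id INTEGER
);
CREATE TABLE purchase (
    visit_id INTEGER REFERENCES visit(visit_id),
    item_id INTEGER,
    amount REAL
);

-- Made-up sample rows for illustration only.
INSERT INTO customer VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO visit VALUES (10, 1, 100), (11, 1, 100), (12, 2, 200);
INSERT INTO purchase VALUES (10, 500, 20.0), (10, 501, 30.0), (11, 500, 10.0), (12, 502, 90.0);
""")

# "Which customers spend the most at our stores on a per-visit basis?"
# Total spend divided by the number of distinct visits, per customer.
# Note: the inner joins exclude visits with no purchases.
rows = conn.execute("""
SELECT c.name,
       SUM(p.amount) / COUNT(DISTINCT v.visit_id) AS spend_per_visit
FROM customer c
JOIN visit v ON v.customer_id = c.customer_id
JOIN purchase p ON p.visit_id = v.visit_id
GROUP BY c.customer_id, c.name
ORDER BY spend_per_visit DESC
""").fetchall()

for name, spend in rows:
    print(f"{name}: {spend:.2f} per visit")
```

Asking the vendor-returns question would only need a vendor key on the item and a returns table; because the relationships live in the model, the new question does not require rebuilding the data.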
The best part is that the more relationships you can find and associate into your analytics, the more value you get from your data. As Michael Porter states in Harvard Business Review, “This new product data is valuable by itself, yet its value increases exponentially when it is integrated with other data, such as service histories, inventory locations, commodity prices, and traffic patterns.”
Unfortunately, people too often take a single view of data and build assumptions about the questions, and therefore the answers, into their solutions. Another common mistake is to assume that data sitting in a “data lake” does not need to be modeled or managed, so its relationships are never preserved. These are just two ways companies ignore data relationships and end up with complex and costly silos, which only create confusion and stifle innovation.
Vantage - The Total Picture
A cautionary point here: do not confuse data relationships with data placement. For example, an insurance company may store car and driver behavior data in Hadoop or S3 at low cost, as long as it keeps the Vehicle Identification Number (VIN) with that data. The VIN provides the relationship that identifies which insured client owns the car, while all of the insured's information resides in the relational database. Without the VIN properly kept in both locations, you cannot connect the car data to the insured during your analytic processes.
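A minimal sketch of the insurance example, assuming two in-memory collections stand in for the low-cost store and the relational database. The record fields (`vin`, `hard_brakes`, `policy_id`, `insured`) and the `hard_brakes_by_insured` helper are hypothetical; the point is only that the VIN, kept in both places, is what makes the join possible.

```python
from collections import defaultdict

# Hypothetical telematics events, standing in for car and driver behavior
# data kept in low-cost storage such as Hadoop or S3. Each event keeps the VIN.
telematics = [
    {"vin": "VIN-A", "hard_brakes": 3},
    {"vin": "VIN-B", "hard_brakes": 1},
    {"vin": "VIN-A", "hard_brakes": 2},
    {"vin": None, "hard_brakes": 7},  # VIN was dropped on ingest
]

# Hypothetical policy records, standing in for the insured's data
# held in the relational database, keyed by VIN.
policies = {
    "VIN-A": {"policy_id": "P-001", "insured": "Client 1"},
    "VIN-B": {"policy_id": "P-002", "insured": "Client 2"},
}

def hard_brakes_by_insured(events, policies_by_vin):
    """Join events to policies on VIN; return totals per insured and unmatched events."""
    totals = defaultdict(int)
    unmatched = []
    for event in events:
        policy = policies_by_vin.get(event["vin"])
        if policy is None:
            # Without a VIN present in both places, the event cannot be tied to anyone.
            unmatched.append(event)
            continue
        totals[policy["insured"]] += event["hard_brakes"]
    return dict(totals), unmatched

totals, unmatched = hard_brakes_by_insured(telematics, policies)
print(totals)
print(f"{len(unmatched)} event(s) could not be linked to an insured")
```

Where each collection physically lives does not matter to the join; losing the shared key does.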
The power of Teradata Vantage is its ability to store and manage data as its value dictates while still accessing and integrating that data as business needs demand. This combination allows companies to quickly adapt and adopt their analytics to arrive at meaningful answers that can truly improve their relationships with their customers!
Since 1987, Rob has contributed to virtually every aspect of the data warehouse and analytical arenas. Rob’s work has been dedicated to helping companies become data-driven and turn business insights into action. Currently, Rob works to help companies not only create the foundation but also incorporate the principles of a modern data architecture into their overall analytical processes.