What is a Modern Data Architecture?
Much has been written recently about Modern Architecture. It’s a very “lively” topic of discussion within our Ecosystem Architecture group and in discussions with our clients. A search on the term “Modern IT Architecture” results in 2+ billion hits. Searching for “Modern Data Architecture” yields 890+ million hits…which helps a lot…problem solved!
There is a lot of debate about what Modern Architecture means and what components or capabilities constitute such an architecture. However, two terms come up repeatedly in my survey of the literature, in client conversations, and in proposal requests: simplicity and flexibility.
In “Ten Characteristics of a Modern Data Warehouse,” Wayne Eckerson lists and describes these characteristics: Customer-Centric, Adaptable, Automated, Smart, Flexible, Collaborative, Governed, Simple, Elastic, Secure (emphasis mine).
In his description of the “Simple” characteristic he writes, “To reduce complexity, organizations should strive to limit data movement and data duplication and advocate for a uniform database platform, data assembly framework, and analytic platform, despite the howls of best-of-breed proponents.” This aligns well with the long-standing Teradata recommended practice of ‘store once, use many’.
His discussion of the “Flexible” characteristic captures the conundrum of the Modern Data Architecture. He writes, “A modern data architecture needs to be flexible enough to support a multiplicity of business needs. It needs to support multiple types of business users, load operations and refresh rates (e.g. batch, mini-batch, stream), query operations (e.g., create, read, update, delete), deployments (e.g., on premises, public cloud, private cloud, hybrid), data processing engines (e.g., relational, OLAP, MapReduce, SQL, graphing, mapping, programmatic) and pipelines (e.g., data warehouse, data mart, OLAP cubes, visual discovery, real-time operational applications.) A modern data architecture has to be all things to all people.” (emphasis mine).
The challenge of designing for both flexibility and simplicity comes to a head when considering how to support the development of analytics and, most importantly, how to get those analytics into production.
Is Analytics my problem?
It’s no surprise to anyone that the last decade has seen an unprecedented explosion of innovation in tools, techniques and data sources. The pressure to operationalize analytics to drive value has never been higher. However, the “deployment rate” for successfully putting analytics into production has been low, with rates below 50% frequently quoted.
The International Institute for Analytics discusses this issue in its white paper titled “2019 Analytics Predictions & Priorities.” It points to the “35% to 40% of companies that only occasionally or rarely successfully deploy analytical models. We have encountered some organizations that say their successful deployment rates are less than 10%”.
This situation has been an issue for 20+ years. One of my favorite books is “Data Preparation for Data Mining” by Dorian Pyle, published in 1999. Even then, the author emphasized the importance of getting analytical models into production and alluded to the challenges inherent in doing so. He writes:
“…implementing the result is of the first importance to success…implementation usually requires organizational or procedural changes inside an organization…Nonetheless, implementation is critical, since without implementing the results there can be no success.”
Success or failure in the Analytics development lifecycle is to a great extent a data problem. Acquiring and preparing the data has consistently consumed 70%-80% of the time for an analytics project, and a high percentage of deployment failures stem from the lack of a reliable data supply or reliable data pipelines.
Will “Ops” come to the rescue?
I suspect that the relatively low successful deployment rate has been a catalyst for the expansion of CI/CD (Continuous Integration/Continuous Deployment) and variations of “Ops” including DevOps, DataOps, AnalyticOps, and more recently MLOps and AIOps.
Several “Ops” point solutions are available through open-source development and start-up vendors, but they may make the situation worse in the long run. I’m following the development of several of these solutions. They are making great strides in managing the workflow for analytics development, but they are not yet connecting with an enterprise-level Modern Data Architecture. This isn’t unexpected. Many of my client discussions around enterprise architecture indicate that clients are still in the early stages of the transformation from legacy ETL and applications and are still evaluating cloud vendors and technologies.
There are several variations of the diagram below. I’ve drawn a simple version to emphasize the connection between the analytic development side of the “Ops” discussion and the data pipelines required to feed those analytics. The “Big Challenge” I highlight in the diagram below is managing the interdependent Analytics and Data requirements and connecting those requirements to an evolving enterprise Modern Data Architecture.
The pressure on IT is immense. IT teams must maintain legacy ETL and infrastructure while creating an architectural foundation that serves the goals of Modern Data Architecture (simplification, minimizing technical debt, etc.) and supports the ever-increasing demand for analytics.
Critical capabilities of platforms in the Modern Data Architecture include:
Integrating with Master/Reference Data Management, Catalog and Governance tools
Supporting a Hybrid Multi-Cloud World
Providing highly flexible and tunable resource allocation and workload management
Ingesting streaming and event data
Virtualizing data access and queries
Teradata Vantage provides capabilities for high-volume, fast (short SLA) tactical queries and analytical model support. It has evolved into a Data Management for Analytics platform that supports the goals of the Modern Data Architecture. A petting zoo of best-of-breed or bleeding-edge platforms is not the path to a Modern Data Architecture or a successful (i.e., deployed) analytics strategy.
The challenges are immense, and the stakes are high. If analytics is the new competitive battleground and data is the fuel that drives the analytic engine, then the Modern Data Architecture is imperative.