I was first exposed to Hadoop in 2010 when I assumed responsibility for Electronic Arts’ (EA) enterprise data platform, which was a mix of Teradata and Cloudera. Having inherited this Hadoop environment, I quickly came up to speed on the capabilities it offered.
At the time, there was a lot of confusion about what Hadoop was and what it would become. After wading through the conflicting narratives and, more importantly, experiencing Hadoop in active projects, I came to the following conclusions:
Hadoop was excellent at economically harnessing data types that were constantly evolving.
We had a lot of data coming from game telemetry, and it changed constantly. One day the game designers would append a new data element to a game log, and the next day that data element might disappear – only to be replaced by three new ones. That kind of volatility would break the design patterns I cut my teeth on. A new age of “big data” required new engineering, and Hadoop fit the bill.
Hadoop was poor at managing the core data of an enterprise.
When it comes to managing data in a way that is shared across the enterprise, nothing beats a database – and Hadoop is no database. There was no data type safety and no workload management. Performance also suffered when queries involved multiple joins, and only a subset of ANSI SQL was supported, which restricted usability.
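To make the two conclusions above concrete, here is a minimal Python sketch of the schema-on-read idea. It is not the pipeline we ran at EA and it does not use Hadoop itself. The log lines and field names (`player_id`, `level`, `weapon`, and so on) are invented for illustration. It shows why fields that appear and disappear are easy to absorb when nothing enforces a schema at load time, and why that same freedom means there is no type safety.

```python
import json
from collections import Counter

# Hypothetical game log lines; every field name here is invented for illustration.
RAW_LOG_LINES = [
    '{"player_id": 1, "level": 3, "score": 1200}',
    '{"player_id": 2, "level": 4, "score": 900, "weapon": "bow"}',
    # "score" and "weapon" are gone, three new fields appear,
    # and "level" silently arrives as a string instead of a number.
    '{"player_id": 3, "level": "5", "combo": 7, "zone": "north", "ping_ms": 41}',
]


def read_events(lines):
    """Parse each line as-is; no schema is enforced at load time."""
    return [json.loads(line) for line in lines]


def field_counts(events):
    """Count how many events carry each field."""
    counts = Counter()
    for event in events:
        counts.update(event.keys())
    return counts


def field_types(events, field):
    """Collect the Python type names observed for one field."""
    return {type(event[field]).__name__ for event in events if field in event}


events = read_events(RAW_LOG_LINES)
counts = field_counts(events)
level_types = field_types(events, "level")

print(dict(counts))
print(sorted(level_types))
```

Every record loads without a schema change, which is the flexibility described above. The cost shows up in `level_types`: the same field holds both integers and strings, and nothing catches it until a downstream consumer trips over it. A database with declared column types would reject or coerce the bad record at load time.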
In the case of EA, Hadoop was the right tool for the job. We created a platform that the development team was proud of and that delighted the user community, all at a pivotal time in EA’s digital transformation.
I later joined Teradata, supporting our clients in Silicon Valley who were going through a similar transformation and figuring out the right mix of technologies. I was also tasked with being the marketing lead for Teradata’s strategic partnerships with Cloudera, Hortonworks, and MapR. We did some work together that was very professionally rewarding, including a trusted advisor webinar with more than 5,000 attendees on the topic of Hadoop and the Data Warehouse: When to use Which. We also had some valuable product collaborations, such as QueryGrid, which allowed Teradata clients to execute queries that ran across Teradata and HDFS.
It didn’t take long for these three Hadoop distribution vendors to take a path that would eventually become their demise. Rather than building the market for a technology that was excellent at economically harnessing constantly evolving data, they all took the position that they were a replacement for the data warehouse.
The hard thing about creating a new market is that it’s hard. When Teradata invented the data warehouse market 40 years ago this month, nobody was shopping around for a data warehouse. The organization had to explain the value of this new technology for managing data and performing analytics, work that at the time was dominated by mainframe programming. Data integration was a foreign concept to enterprises, as it had never been done before at scale. The sales team had to come up with go-to-market strategies with no precedent to borrow from, such as cultivating an executive sponsor to rally the organization around sharing data and the new use cases this enabled. Early adopters like Teradata’s first customer, Walmart in the 80s, used the power of this new technology to reinvent the retail industry. As a FedEx employee in 1996, I was in the room when our CIO challenged Teradata on why any company would ever need a terabyte of data.
Rather than creating a market around evolving big data types, and helping enterprises learn to derive value from this new data, the Hadoop distros took a more short-sighted approach and positioned themselves as a cheaper alternative to the data warehouse. There was already recognition of the value of a data warehouse with commensurate budget behind it, so it was easy to say, “Hadoop is a cheaper and more flexible alternative to MPP databases.” This led to many failed engagements.
But the final nail for Hadoop was object storage. I hear many people say “the cloud” was the undoing of Hadoop, and I worry about what they really mean when they say that. The cloud is just a deployment option – servers and software. There are some cloud databases that are entirely inappropriate for managing an enterprise’s core data in a way that promotes reuse and the elimination of silos. But one undeniable engineering innovation born in the cloud (and now available across multiple deployment options) is object storage. Object storage delivers what was great about Hadoop: cheap storage and support for flexible data types. But even better than Hadoop, object storage is 3X cheaper and it supports the kinds of data types needed for the age of Artificial Intelligence, such as audio, video, and image files.
Teradata will continue to plug into legacy Hadoop clusters through QueryGrid, and already allows clients to seamlessly plug into object storage with QueryGrid. So while it may be the end of Hadoop, the era of pervasive data intelligence that enables any data type across any deployment option is just getting started.
Chad Meley is Vice President of Solutions Marketing at Teradata, responsible for Teradata’s Artificial Intelligence, IoT, and CX solutions.
Chad understands trends in machine & deep learning, and leads a team of technology specialists who interpret the needs and expectations of customers while also working with Teradata engineers, consulting teams and technology partners.
Prior to joining Teradata, he led Electronic Arts’ Data Platform organization. Chad has held a variety of other leadership roles centered around data and analytics while at Dell and FedEx.
Chad holds a BA in economics from The University of Texas and an MBA from Texas Tech University, and performed postgraduate work at The University of Texas.
Professional awards include Best Practice Award for Driving Business Results in Data Warehousing from The Data Warehouse Institute and the Marketing Excellence Award from the Direct Marketing Association. He is a regular speaker at conferences, including O’Reilly’s AI Conference, Strata, DataWorks, and Analytics Universe. Chad is the coauthor of the book Achieving Real Business Outcomes From Artificial Intelligence published by O'Reilly Media, and a frequent contributor to publications such as Forbes, CIO Magazine, and Datanami.