Building Data Factories to Create Thousands of Data Products

The pressure to integrate analytics & machine learning into the automotive business is unrelenting. Find out what the auto industry needs to deliver on its digital promise.

Chris Hillman
4 August 2021, 4 min read
Building Data Factories for the Auto Industry

As the automotive sector transforms, software is becoming as important as the electro-mechanical elements of the product. Similarly, digitising business processes and deploying Industry 4.0 technologies across the supply chain mean that data products are becoming as important as software packages in transforming the business. The pressure to integrate data products such as analytics and machine learning into automotive businesses is unrelenting as the sector strives for the agility and innovation needed to keep up with the pace of change. But to deliver the digital threads that we’ve explored in these blogs, analytics must be industrialised into efficient and effective ‘data factories’ that can create thousands of data products.

Data factories can be conceived as efficient and reliable processes for harnessing business value from source data. This implies not just the ability to gather data for isolated insights, but to build data products consistently and efficiently. These products are then used at all levels of the organisation, from a call centre or machine operator to the C-suite, and are relied upon to inform decision making on a daily basis.

Enter the Enterprise Feature Store

Repeatable, streamlined processes and the reuse of common features lie at the heart of the data factory concept. Once data scientists can access data from across the entire business, the next step is to ensure they can use that data efficiently to solve real problems at enterprise scale. Individual digital threads can add value within departments or functions, but the real multiplier effect comes as the data that forms individual threads is re-used, combined and extended. An enterprise feature store is the foundation for this re-use.

Enterprise feature stores are created to solve a fundamental challenge in deploying data products at scale: 80% of the effort behind a data product goes into data wrangling, that is, finding, cleansing and integrating data to form an analytical data set. With only 20% of their time remaining for actually creating and deploying data products, data scientists and analysts are wasting their talent.

The essence of an enterprise feature store is a curated, managed repository of pre-prepared data sets with proven utility. These ‘features’ are created by data scientists as they build and test predictive analytics models, but are often discarded or forgotten as they move on to the next model. The feature store saves and catalogues transformed data that has proven useful so that it can be reused in subsequent projects and by other data scientists across the organisation. It drives re-use, repeatability and efficiency instead of creating every new model from scratch.

Building Step by Step

An enterprise feature store is best created step by step, model by model. Imagine that as a data scientist you are tasked with creating a predictive model to understand the root causes of quality issues in one aspect of the manufacturing plant. You might want to take data from a certain machine (perhaps temperature, vibration and cycles) and combine it with supply chain data on the raw materials coming into the process, HR data on shift patterns, and environmental data such as time of day and ambient temperature. Each of these data sets will need to be curated so that both the meaning and the context of the data are clear. With this clarity, the data can be combined in reliable, auditable and meaningful ways.
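To make the combination step concrete, here is a minimal sketch in Python of how sensor readings from one machine might be joined with supply chain and shift information to form an analytical data set. All of the names, identifiers and values (`press_01`, `B100`, `supplier_a`, the column layout) are illustrative assumptions, not taken from any real plant or product.

```python
from statistics import mean

# Hypothetical sensor readings:
# (machine_id, batch_id, shift, temperature_c, vibration_mm_s)
sensor_rows = [
    ("press_01", "B100", "day", 71.0, 2.1),
    ("press_01", "B100", "day", 73.0, 2.5),
    ("press_01", "B101", "night", 80.0, 3.9),
    ("press_01", "B101", "night", 82.0, 4.1),
]

# Hypothetical supply chain lookup: raw material batch -> supplier
batch_supplier = {"B100": "supplier_a", "B101": "supplier_b"}


def build_analytical_rows(sensor_rows, batch_supplier):
    """Aggregate readings per machine, batch and shift, then attach the supplier."""
    grouped = {}
    for machine_id, batch_id, shift, temp, vib in sensor_rows:
        grouped.setdefault((machine_id, batch_id, shift), []).append((temp, vib))

    result = []
    for (machine_id, batch_id, shift), readings in sorted(grouped.items()):
        result.append({
            "machine_id": machine_id,
            "batch_id": batch_id,
            "shift": shift,
            # None makes a missing supplier visible instead of silently dropping the row
            "supplier": batch_supplier.get(batch_id),
            "mean_temperature_c": mean(t for t, _ in readings),
            "max_vibration_mm_s": max(v for _, v in readings),
        })
    return result


analytical_rows = build_analytical_rows(sensor_rows, batch_supplier)
```

Each resulting row carries its units in the column name and keeps the keys it was joined on, which is one simple way to keep the combination auditable.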

The enterprise feature store catalogues all this hard work and makes it available to others. So, the next time a data scientist needs to use data from that machine they will already have a template. They may create new features to predict other outcomes – and these then are also stored. In this way each project contributes to a comprehensive library of high-quality data and analytical models. Even projects which do not make it to production can contribute in the form of cleansed data sets or documented models.
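As a rough illustration of the cataloguing idea, the sketch below models a feature store as a small in-memory catalogue in Python. It is a toy under stated assumptions: the `Feature` and `FeatureCatalogue` classes, their fields and the example feature names are all hypothetical, and a real enterprise feature store would sit on a governed data platform rather than a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical catalogue entry describing a reusable, proven feature."""
    name: str
    source: str        # the machine or system the data comes from
    description: str
    owner: str         # who to ask about meaning and context


class FeatureCatalogue:
    """Toy in-memory catalogue; stands in for a managed feature store."""

    def __init__(self):
        self._features = {}

    def register(self, feature):
        # Refuse silent overwrites so that a name always means one thing
        if feature.name in self._features:
            raise ValueError(f"feature already registered: {feature.name}")
        self._features[feature.name] = feature

    def find_by_source(self, source):
        """Return the names of all catalogued features built from a given source."""
        return sorted(f.name for f in self._features.values() if f.source == source)


catalogue = FeatureCatalogue()
catalogue.register(Feature("press_01_mean_temperature", "press_01",
                           "Mean temperature per batch, in degrees Celsius", "quality_team"))
catalogue.register(Feature("press_01_max_vibration", "press_01",
                           "Maximum vibration per batch, in mm/s", "quality_team"))
catalogue.register(Feature("shift_pattern", "hr_system",
                           "Shift (day or night) active for each batch", "hr_analytics"))

# The next data scientist working on press_01 starts from what already exists
reusable = catalogue.find_by_source("press_01")
```

Rejecting duplicate names is a deliberate choice in this sketch: a feature that can be quietly redefined is hard to trust across projects.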

Test and Verify

The efficiencies created by this approach are clear. But as increasingly complex and valuable models connect data from multiple sources, these saved transformations are also critical for reliability, safety and governance.

Data scientists creating models cannot be subject matter experts in every area of the business. Well-constructed, tested and proven features can democratise access to specialist data sets whilst creating standard metadata that ensures the data is integrated and used correctly. A data scientist can build a model with data from a temperature sensor knowing that the units and the sampling cycles are ‘baked into’ the feature they start with.
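One way to picture metadata that is ‘baked in’ is a feature object that carries its unit and sampling interval alongside its values. The Python sketch below is an assumption-laden illustration: the `SensorFeature` class, its fields and the `to_celsius` helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorFeature:
    """Hypothetical feature that carries its own unit and sampling metadata."""
    name: str
    unit: str                  # "celsius" or "fahrenheit" in this sketch
    sampling_interval_s: int   # seconds between consecutive readings
    values: tuple


def to_celsius(feature):
    """Return the feature in degrees Celsius, converting only when needed."""
    if feature.unit == "celsius":
        return feature
    if feature.unit == "fahrenheit":
        converted = tuple((v - 32.0) * 5.0 / 9.0 for v in feature.values)
        return SensorFeature(feature.name, "celsius",
                             feature.sampling_interval_s, converted)
    # Fail loudly rather than guess at an unknown unit
    raise ValueError(f"unsupported unit: {feature.unit}")


oven_temperature_f = SensorFeature("oven_temperature", "fahrenheit", 60, (32.0, 212.0))
oven_temperature_c = to_celsius(oven_temperature_f)
```

Because the unit travels with the data, a model builder who is not a sensor specialist cannot accidentally mix Fahrenheit and Celsius readings without the code objecting.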

For sensitive data subject to data privacy regulation, an enterprise feature store contributes significantly to meeting data governance responsibilities. With the right ability to scale, fewer copies of data are saved, requiring less oversight and active management. Well-documented analytic models also help answer the question of how and where data is being used, and who has access to it.

Testing and verification of data products are also improved. Data science models can be subjected to the same level of rigour as the software and hardware that engineers create for vehicles. They can also be improved and expanded upon through successive cycles of development, allowing for faster, better and more efficient integration of new models into production.
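In practice, treating features like software can start with something as simple as unit-testing each transformation and checking its output against expected ranges. The Python sketch below assumes two hypothetical helpers, `rolling_mean` and `check_feature`, and made-up thresholds; it illustrates the idea rather than prescribing a test strategy.

```python
def rolling_mean(values, window):
    """Simple rolling mean over a fixed window; one output per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def check_feature(values, lower, upper):
    """Return a list of human-readable problems found in a feature's values."""
    problems = []
    for index, value in enumerate(values):
        if value is None:
            problems.append(f"missing value at index {index}")
        elif not lower <= value <= upper:
            problems.append(
                f"value {value} at index {index} outside [{lower}, {upper}]"
            )
    return problems


# A transformation test: the same kind of assertion a software team would write
smoothed = rolling_mean([1.0, 2.0, 3.0, 4.0], window=2)

# A data quality check with hypothetical plausibility bounds in degrees Celsius
problems = check_feature([70.0, None, 500.0], lower=0.0, upper=150.0)
```

Checks like these can run every time a feature is rebuilt, so a broken sensor or a changed upstream format is caught before it reaches a model in production.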

The Data Factory

The automotive industry has evolved a complex but well-oiled ecosystem to deliver beautiful, efficient and safe products to its consumers. Now it needs to deliver, just as efficiently, data products that are insightful, safe and adaptable to its data-driven business processes. That requires investment in robust, resilient but agile ‘data factories’, where data scientists can focus on delivering value rather than constantly reinventing features for every new model.

Just as with previous evolutions, this will not be a rapid change. As mentioned in the previous blog in this series, building the required data factory foundation will take time. However, the effort is iterative, with each value-delivering project both leveraging and expanding the existing data foundation and feature stores.

The enterprise feature store is at the heart of the data product factory. Creating and curating this feature store gives data scientists the resources, tools and space to accelerate the use of analytics across automotive businesses whilst improving efficiency, agility and auditability in the production of data products. This in turn helps speed innovation in the rapidly evolving automotive and mobility ecosystems.


About Chris Hillman

Chris Hillman is the Senior Director, AI/ML in the International region and has been responsible for developing and articulating the Teradata Analytics 1-2-3 strategy and supporting the direction and development of ClearScape Analytics. Prior to this role, Chris led the International Data Science Practice and worked on a large number of AI projects in the International region, focusing on generating measurable ROI from analytics in production at scale using Teradata, open source and other vendor technologies. Chris has spoken regularly at leading conferences including Strata, Gartner Analytics, O’Reilly AI and Hadoop World. Chris also worked to establish the Art of Analytics practice, promoting the value of producing striking visualisations that draw people into data science projects while retaining a solid business-outcome foundation.
