What Working “at Scale” Really Means

Rob Armstrong discusses the challenges of moving from a departmental solution to operational and production systems working at scale, and how Teradata Vantage can solve for them.

Rob Armstrong
June 25, 2019 · 3 min read

Solving the ten hundred thousand problem

Things seem to come in waves, and in the past few weeks I have had the same conversation at least seven times. I call it solving the ten hundred thousand problem (and that is not the same as a million). What this means is that if you have 10 users running 10 queries involving 10 rows or tables, it is not that hard to manage. Now increase the scale to hundreds instead of tens, and you may get by with a squad of good DBAs and sysadmins. Keep going to thousands (and even millions) and the environment grows beyond the ability of people to manage it.
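To make the jump in scale concrete, here is a small Python sketch. It assumes, purely for illustration, that the management burden grows with the number of user/query/table combinations and that all three dimensions grow together; the function name is hypothetical.

```python
def workload_combinations(scale: int) -> int:
    """Users x queries x tables, with each dimension at the same scale.

    Multiplying the three dimensions is an illustrative assumption,
    not a measured model of administrative effort.
    """
    return scale ** 3


for label, scale in [("tens", 10), ("hundreds", 100), ("thousands", 1000)]:
    total = workload_combinations(scale)
    print(f"{label:>9}: {total:,} user/query/table combinations")
```

Under that assumption, tens give 1,000 combinations, hundreds give 1,000,000, and thousands give 1,000,000,000, which is why an approach that depends on people stops working long before the largest scale.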

When you need to move from the departmental solution or POC to “operational and production systems working at scale” you encounter three main challenges: optimizing the individual queries, managing the diverse workload, and overall system monitoring. Let’s take a look at each of these in a bit of detail.

Optimizer

The first challenge is that each individual query has to be effectively optimized every time it runs. That means you cannot rely on people to provide hints, expect users to write the best code, or assume that a business tool that generates generic SQL is best suited to every platform. The optimizer must be fully aware of data relationships, fully leverage parallelism, and be able to execute snippets of a query whose results can in turn be used to better optimize the next steps in that query.

Teradata Vantage is built upon the Teradata database, which has the richest optimizer for analytic workloads. Vantage takes that foundation and expands it to include optimization of machine learning and graph functions, as well as the incorporation of data from other systems.
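The idea of executing part of a query and using the result to plan the next step can be sketched in a few lines of Python. This is a generic illustration, not Teradata's optimizer: the function name, the strategy labels, and the threshold are all hypothetical.

```python
def choose_join(left_rows: int, right_rows: int, broadcast_limit: int = 10_000) -> str:
    """Pick a join strategy from row counts.

    The strategy names and the broadcast_limit threshold are
    hypothetical values chosen only for this illustration.
    """
    if min(left_rows, right_rows) <= broadcast_limit:
        # Small input: copy it to every parallel worker.
        return "broadcast"
    # Both inputs large: reshuffle both sides by the join key.
    return "redistribute"


estimated_rows = 5_000_000  # guess made before the filter step runs
actual_rows = 1_200         # row count observed after running that step
big_table_rows = 80_000_000

print("planned from estimate:", choose_join(estimated_rows, big_table_rows))
print("planned from actual:  ", choose_join(actual_rows, big_table_rows))
```

With the estimate alone, the sketch would reshuffle both inputs; once the filter step has actually run and returned far fewer rows, the cheaper strategy becomes available for the next step.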

Workload Management

Once each query has been optimized, the workload as a whole needs to be managed. Not every query carries the same importance. There are web access queries with stringent service levels that must be met, or customer experience suffers. At the other end of the spectrum are long-running queries of a “what if” nature, where run time is less of a concern as long as insight can be gained from the completed query.

Of course, the mix along this spectrum is constantly changing. Some hours are heavily skewed to the tactical while other times are skewed to the strategic. As concurrency grows, overlapping priorities and conflicting requirements will become a problem to manage. 

Teradata Vantage has world-class workload management whereby rules and service levels are broadly defined, and the system takes over from there to ensure that all workloads receive the resources they need to satisfy user expectations.
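As a rough illustration of rule-driven workload management, the Python sketch below dispatches tactical queries ahead of strategic ones while keeping arrival order within each class. It is a minimal sketch under stated assumptions, not how Vantage works internally: the class names, the two-level priority table, and the method names are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical rule table: a lower number is dispatched first.
PRIORITY = {"tactical": 0, "strategic": 1}


@dataclass(order=True)
class QueuedQuery:
    priority: int
    seq: int                         # arrival order breaks ties within a class
    name: str = field(compare=False)


class WorkloadQueue:
    """Dispatch queries by workload priority, then by arrival order."""

    def __init__(self) -> None:
        self._heap: list[QueuedQuery] = []
        self._seq = count()

    def submit(self, name: str, workload: str) -> None:
        entry = QueuedQuery(PRIORITY[workload], next(self._seq), name)
        heapq.heappush(self._heap, entry)

    def next_query(self) -> str:
        return heapq.heappop(self._heap).name

    def __len__(self) -> int:
        return len(self._heap)


queue = WorkloadQueue()
queue.submit("monthly what-if", "strategic")
queue.submit("web lookup A", "tactical")
queue.submit("web lookup B", "tactical")

dispatch_order = [queue.next_query() for _ in range(len(queue))]
print(dispatch_order)
```

Both web lookups are dispatched before the what-if query even though it arrived first. A production system would also have to weigh resource shares, concurrency limits, and changing priorities through the day, none of which this sketch attempts.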

System Monitoring

While it is good to have queries optimized and workloads managed, there is the broader challenge of keeping the whole environment running smoothly as a single system. This includes message traffic, monitoring, error recovery, and space management, as well as continuity in systems where loads and queries are happening all day. Adding to the challenge are concepts like failover, where multiple systems are kept in sync for highly operational environments.

This is no small task, and it is where Teradata Vantage again leverages the rich experience of the past. It excels not only in single-system management but also brings a full suite of tools within IntelliSphere to simplify and automate multi-system ecosystems.

Conclusion

There is a big difference between running a POC and running “at scale.” There is also a difference between a departmental or point solution and running across your enterprise with consistent and integrated data. The worst time to understand the limitations of a solution is once value is shown and you need to grow by adding more users, more data, and more queries.

Teradata Vantage brings the power of all the above in addition to unparalleled scalability. And as new advanced analytics engines are added into the mix, Teradata will be working to bring the same rigor to those operations as well.

All of this combines to solve the “ten hundred thousand” problem, allowing your business to go from insight to production without worry and to drive millions to your bottom line.

About Rob Armstrong

Since 1987, Rob has contributed to virtually every aspect of the data warehouse and analytical arenas. Rob’s work has been dedicated to helping companies become data-driven and turn business insights into action. Currently, Rob works to help companies not only create the foundation but also incorporate the principles of a modern data architecture into their overall analytical processes.
