Any good thing taken to an extreme can turn bad, we've all lived through examples of this. Maybe you had one too many eggnogs at your company's holiday party. Or think about the cat lady living down the street, who has gone a bit cat-crazy and now lives with a hoard of felines.
Parallelism in the Teradata Vantage SQL Engine is a very good thing, and generally more is better. Parallel units, called AMPs, are built into the foundation of the database and both own data and process it. The parallelism is highly-efficient, predictable, and is applied automatically to all work running on the platform. One basic premise of the parallelism is that all AMPs get the same amount of data and that is achieved with our automated data distribution using a hashing algorithm.
There is an exception to this all-parallelism-all-of-the-time approach. This exception occurs when a single row is requested from the database. Under specific conditions, the optimizer can choose to ignore the vast power of the configured parallelism and instead produce a query plan that only reaches out to a single AMP.
Two completely opposite means of pulling data out of the database are available to the SQL Engine optimizer: All-AMP access or one-AMP access. So then what's the big deal about too much parallelism?
Consider a situation where a very small table doesn't have enough rows to spread across all AMPs. Envision a Promotion table that has only 10 rows when your system has 100 AMPs. 90% of your AMPs will have no Promotion rows to process, yet those empty AMPs will have to be engaged and will have to attempt to find a row when the table is scanned. The full parallelism of the platform is brought to bear, often unnecessarily so.
Small savings do add up, and small savings are equally relevant today, whether in your home or in a large data warehouse.
A new feature introduced in 2018, Teradata MAPS, offers a novel solution for storing and accessing tables that don't need all the platform's parallel horsepower. The MAPS feature can remove over-parallelization when scanning small tables by creating "sparse maps" composed of just a single or a few AMPs. Once you move a very small table to the sparse map you've cut out a lot of unneeded parallel processing whenever you read that table. By doing so, you've freed up system resources for other queries to use.
How does this work?
In the SQL Engine, a "hash map" describes the range of AMPs available for supporting and operating on a table's rows. All tables must belong to a hash map. By default, this primary hash map includes all the AMPs in the configuration and all tables belong to it. In our example above, we’d have the Promotion table distributed across the 100 AMPs, with the majority of AMPs not receiving a row.
The table's hash map not only specifies a range of AMPs, it also controls which AMP in that range a given row will belong to. When a row is inserted into the database, the hash map that the table belongs to directs the new row to one of its AMPs, based on a hashing algorithm.
By using sparse maps, Teradata customers can now distribute relatively small tables, that formerly were spread across all AMPs, to a smaller number of AMPs. The magic part is that after doing so, queries scanning those tables will get only the level of parallelism they actually require, and users, utilities, or administrators do not need to know anything about this change to properly interact with the system.
This means that our 10-row Promotion table can be moved into a single-AMP sparse map that is provided automatically. All 10 of the rows will then be located within a single data block on one AMP, and scanning that table is now a single-AMP, single I/O operation. This small efficiency can be a benefit for queries that scan the small table repetitively, but also for queries that join the small table to other larger tables.