Announcing EMR Release 5.24.0: Spark performance improvements, new versions of Flink, Presto, and Hue, and enhanced CloudFormation support for EMR Instance Fleets

This release also includes three new Spark performance optimizations that you can enable to improve Spark performance by up to 13X: dynamic partition pruning, flattening scalar subqueries, and DISTINCT before INTERSECT.

Dynamic partition pruning allows the Spark engine to infer the relevant partitions at runtime. This saves time and compute resources because Spark reads less data from storage and processes fewer records.
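To make the idea concrete, here is a toy, in-memory model of dynamic partition pruning. It is not Spark's implementation. It assumes a hypothetical `sales` fact table partitioned by `store_id` and a small `stores` dimension table whose filter is only known when the query runs. Without pruning, every fact partition is scanned and rows are discarded afterwards; with pruning, the dimension filter is evaluated first and only the matching partitions are read.

```python
# Hypothetical fact table, partitioned by store_id: {store_id: [(date, amount), ...]}
sales_partitions = {
    1: [("2019-05-01", 10.0), ("2019-05-02", 12.5)],
    2: [("2019-05-01", 7.0)],
    3: [("2019-05-03", 3.0), ("2019-05-04", 9.0)],
    4: [("2019-05-02", 4.5)],
}
# Hypothetical dimension table: (store_id, region)
stores = [(1, "WA"), (2, "OR"), (3, "WA"), (4, "CA")]


def join_without_pruning(region):
    """Read every fact partition, then keep rows whose store matches the filter."""
    wanted = {sid for sid, r in stores if r == region}
    scanned, rows = 0, []
    for store_id in sorted(sales_partitions):
        scanned += 1  # every partition is read from storage
        if store_id in wanted:
            rows.extend((store_id, *row) for row in sales_partitions[store_id])
    return rows, scanned


def join_with_pruning(region):
    """Evaluate the dimension filter first, then read only matching partitions."""
    wanted = {sid for sid, r in stores if r == region}  # inferred at runtime
    scanned, rows = 0, []
    for store_id in sorted(wanted & set(sales_partitions)):
        scanned += 1  # only partitions that can contribute rows are read
        rows.extend((store_id, *row) for row in sales_partitions[store_id])
    return rows, scanned
```

Both functions return the same rows for `region="WA"`, but the pruned version reads 2 of the 4 partitions instead of all 4.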

Flattening scalar subqueries helps when several different conditions need to be applied to rows from the same table. Instead of reading the table once for each condition, Spark avoids the repeated reads, which reduces redundant I/O and improves performance.
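The following sketch illustrates the difference with a toy table that counts how many times it is scanned. It assumes a query shaped like `SELECT (SELECT sum(price) FROM t WHERE category = 'a'), (SELECT count(*) FROM t WHERE category = 'b')`; the table, columns, and function names are hypothetical. The flattened version computes both results in one pass using conditional aggregates, which is the general idea behind the optimization rather than Spark's actual code.

```python
class CountingTable:
    """A hypothetical in-memory table that records how often it is scanned."""

    def __init__(self, rows):
        self.rows = rows
        self.scans = 0

    def scan(self):
        self.scans += 1
        return iter(self.rows)


ROWS = [
    {"category": "a", "price": 10.0},
    {"category": "b", "price": 4.0},
    {"category": "a", "price": 6.0},
    {"category": "c", "price": 1.0},
]


def separate_subqueries(table):
    # Each scalar subquery reads the table on its own: two scans.
    total_a = sum(r["price"] for r in table.scan() if r["category"] == "a")
    count_b = sum(1 for r in table.scan() if r["category"] == "b")
    return total_a, count_b


def flattened(table):
    # One scan; each condition feeds its own conditional aggregate,
    # similar to sum(CASE WHEN category = 'a' THEN price END).
    total_a, count_b = 0.0, 0
    for r in table.scan():
        if r["category"] == "a":
            total_a += r["price"]
        if r["category"] == "b":
            count_b += 1
    return total_a, count_b
```

Both functions return `(16.0, 1)`; the first scans the table twice and the second scans it once.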

DISTINCT before INTERSECT eliminates duplicate values in each input collection prior to computing the intersection, improving performance by reducing the amount of data shuffled between hosts.
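As a rough illustration, the sketch below models each input of an `INTERSECT` as a list of partitions (one per host) and counts how many rows would be sent across the network. The data is hypothetical. With the optimization, each partition drops its local duplicates before anything is shuffled; duplicates that span partitions can remain, and INTERSECT's distinct semantics still remove those afterwards, so the final result is unchanged.

```python
# Hypothetical inputs, each already split across two partitions (hosts).
left_parts = [[1, 1, 2, 2, 2], [1, 3, 3]]
right_parts = [[2, 2, 4], [3, 3, 3, 2]]


def shuffled_rows(parts, distinct_first):
    """Rows sent over the network from one input's partitions."""
    # With distinct_first, each partition is deduplicated locally first.
    return sum(len(set(p)) if distinct_first else len(p) for p in parts)


def intersect(left, right):
    """INTERSECT semantics: distinct values present in both inputs."""
    return {x for p in left for x in p} & {x for p in right for x in p}
```

Here the intersection is `{2, 3}` either way, but the rows shuffled drop from 15 to 8.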

These optimizations are not turned on automatically; you enable them with Spark properties. See the EMR 5.24.0 release notes to learn more about these features.
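One way to set Spark properties at cluster creation is an EMR configuration object using the `spark-defaults` classification. The sketch below builds that JSON in Python. The three property names are our understanding of the names EMR documents for these features, but treat them as assumptions and confirm them against the EMR 5.24.0 release notes before use.

```python
import json

# Assumed property names; verify them in the EMR 5.24.0 release notes.
OPTIMIZATION_PROPERTIES = [
    "spark.sql.dynamicPartitionPruning.enabled",
    "spark.sql.optimizer.flattenScalarSubqueriesWithAggregates.enabled",
    "spark.sql.optimizer.distinctBeforeIntersect.enabled",
]


def spark_defaults_configuration(properties):
    """Build an EMR configuration list that sets each property to "true"
    in the spark-defaults classification."""
    return [
        {
            "Classification": "spark-defaults",
            "Properties": {name: "true" for name in properties},
        }
    ]


configurations = spark_defaults_configuration(OPTIMIZATION_PROPERTIES)
print(json.dumps(configurations, indent=2))
```

Saved to a file, the printed JSON can be passed to `aws emr create-cluster` with `--configurations file://configurations.json`. The same properties can also be set for a single job, for example with `--conf` on `spark-submit`.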

Additionally, when you launch clusters using EMR Instance Fleets with CloudFormation templates, you can now specify multiple subnets in different Availability Zones within a VPC. This feature is available in EMR releases 4.8.0 and later, except 5.0.x.
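A minimal sketch of such a template follows, built as a Python dictionary so that it can be serialized to JSON. It uses the `AWS::EMR::Cluster` resource with instance fleets and the `Ec2SubnetIds` list. The cluster name, subnet IDs, instance types, and capacities are hypothetical placeholders, and the default EMR role names assume those roles already exist in your account. EMR launches the cluster in one of the listed subnets.

```python
import json


def emr_fleet_cluster_template(subnet_ids):
    """Return a minimal CloudFormation template (as a dict) for an EMR cluster
    that uses instance fleets and may launch in any of several subnets."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "FleetCluster": {
                "Type": "AWS::EMR::Cluster",
                "Properties": {
                    "Name": "fleet-cluster",  # hypothetical name
                    "ReleaseLabel": "emr-5.24.0",
                    "Applications": [{"Name": "Spark"}],
                    "JobFlowRole": "EMR_EC2_DefaultRole",
                    "ServiceRole": "EMR_DefaultRole",
                    "Instances": {
                        # Candidate subnets, e.g. one per Availability Zone.
                        "Ec2SubnetIds": list(subnet_ids),
                        "MasterInstanceFleet": {
                            "TargetOnDemandCapacity": 1,
                            "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}],
                        },
                        "CoreInstanceFleet": {
                            "TargetOnDemandCapacity": 2,
                            "InstanceTypeConfigs": [
                                {"InstanceType": "m5.xlarge"},
                                {"InstanceType": "m4.xlarge"},
                            ],
                        },
                    },
                },
            }
        },
    }


# Hypothetical subnet IDs in two different Availability Zones.
template = emr_fleet_cluster_template(["subnet-0aaa1111", "subnet-0bbb2222"])
print(json.dumps(template, indent=2))
```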