It is better to take another step back and check what is the none-parallel plan for joining the three tables only (PRODUCT_H, ADVT_WRK_TBL, HH_COUNT_FACT).

Since the result is group by AL2.DM_PRODUCT_HKEY, AL3.PRODUCT_YEAR it naturally makes sense to initiate the parallel process on PRODUCT_H table particularly partitioned on DM_PRODUCT_HKEY and sub-partitioned on PRODUCT_YEAR so each slave process can easily return its own grouping result. Though other business process may prevent from doing such design.

What else? Perhaps ensure enough memory for possible hash join in each slave process and analyze sql trace for major wait events.

Based on the plan provided the parallel process is doing extraordinary and (probably) extra things. But each slave process has to do basic sql joins which looks getting lost (and/or beyond ability to read). So a none-parallel plan at least shows how optimizer determines the join order and the join method, which may be also true to each slave process in parallel process or provide good reference.

If insist to initiate the parallel process on ADVT_WRK_TBL(AL2), you can try to partition the table on column AL2.DM_PRODUCT_HKEY and see how the plan responds.

Consider all the table volumes, the none-parallel process should not be very bad unless the cardinality is extraordinary or some resource contention exists. But it is all up to your expectation and requirement.