Many MPP data warehousing vendors have told me their products are used for ELT (Extract/Load/Transform) instead of ETL (Extract/Transform/Load). That is, the needed data transformations are done on the MPP system rather than on the (probably SMP) system the data comes from.* If the transformation is applied on a record-by-record basis, it is automatically fully parallelized, because each node can transform the rows it already holds without exchanging data with any other node. Even if the transforms are more complex, considerable parallel processing may still be going on.

*Or it’s some of each, in which case it’s called ETLT. I bet you can work out what that stands for.
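To make the record-by-record case concrete, here is a minimal Python sketch, assuming the loaded data is already split into per-node partitions. The `transform` function, its field names, and `parallel_elt` are hypothetical, and a thread pool merely stands in for the MPP nodes; this is not any vendor’s API. What it shows is that each partition is transformed independently, with no data exchanged between workers.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical per-record transform: normalizes a country code and converts
# an amount in cents to dollars. It reads only its own input record, so it
# can run on any partition, in any order.
def transform(record: dict) -> dict:
    return {
        "id": record["id"],
        "country": record["country"].strip().upper(),
        "amount": record["amount_cents"] / 100,
    }


def transform_partition(partition: list[dict]) -> list[dict]:
    # Each "node" transforms only the rows it already holds.
    return [transform(r) for r in partition]


def parallel_elt(partitions: list[list[dict]], workers: int = 4) -> list[list[dict]]:
    # Threads stand in for MPP nodes; no rows move between partitions.
    # pool.map returns results in the same order as the input partitions.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transform_partition, partitions))


if __name__ == "__main__":
    loaded = [
        [{"id": 1, "country": " us ", "amount_cents": 1250}],
        [{"id": 2, "country": "de", "amount_cents": 99},
         {"id": 3, "country": "Fr ", "amount_cents": 100000}],
    ]
    for i, part in enumerate(parallel_elt(loaded)):
        print(i, part)
```

Because no record depends on any other, adding nodes (or, in CPython, swapping in a process pool) can in principle scale this kind of work close to linearly. Transforms that need joins or aggregations across partitions would first have to redistribute data, which is where the more complex cases give up some of that parallelism.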

But depending on your needs, at least two other approaches to parallelizing data transformation could also be considered. Pervasive Software, which has a big data integration software business of its own, built a new ETL tool on top of a middle-tier, multi-core-friendly Java dataflow engine, which has now been split out as Pervasive Datarush. The product is in the early stages of release, which may explain why the website confusingly suggests both of the following:

You can have Datarush for free.

If Datarush doesn’t produce a 30X speedup for you, you can get your money back.
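Datarush’s own Java API isn’t shown or implied here. As a generic illustration of the dataflow idea, the Python sketch below wires operators together with bounded queues, so that different stages work on different records at the same time (pipeline parallelism). `run_stage` and `run_pipeline` are hypothetical names. In CPython the GIL limits CPU-bound threads, so this sketch mainly shows the structure; a JVM engine running each operator on its own core is where the multi-core speedup would come from.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the record stream


def run_stage(fn, inbox, outbox):
    # Pull records, apply fn, push results downstream until the sentinel arrives.
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(fn(item))


def run_pipeline(source, stages, capacity=100):
    # One bounded queue between each pair of operators. Bounded queues give
    # back-pressure, so a fast stage cannot flood a slow one with records.
    queues = [queue.Queue(maxsize=capacity) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]

    def feed():
        # Producer thread: lets the consumer loop below drain concurrently.
        for rec in source:
            queues[0].put(rec)
        queues[0].put(_DONE)

    feeder = threading.Thread(target=feed)
    for t in workers + [feeder]:
        t.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)

    for t in workers + [feeder]:
        t.join()
    return results


if __name__ == "__main__":
    rows = ["  alice,30", "bob ,41", " carol,25 "]
    stages = [
        str.strip,
        lambda line: line.split(","),
        lambda parts: (parts[0].strip().title(), int(parts[1])),
    ]
    print(run_pipeline(rows, stages))
    # [('Alice', 30), ('Bob', 41), ('Carol', 25)]
```

Since each stage is a single thread reading a FIFO queue, record order is preserved end to end. Running several copies of a slow stage would add partitioned parallelism on top of the pipelining, at the cost of having to reorder or tolerate unordered output.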

Large-scale transformations can be parameterized as SQL/MR functions for data cleansing and standardization, unleashing the true potential of Extract-Load-Transform pipelines and making large-scale data model normalization feasible. Pushdown also enables rapid discovery and data pre-processing to create analytical data sets for advanced analytics tools such as SAS and SPSS.

Curt, on this topic, I would like to point you to Talend, the first open source data integration software. Talend Open Studio is also the first solution to support both the ETL and ELT approaches natively – and of course the ETLT approach as well.
Unlike tools like Sunopsis (now Oracle Data Integrator), arguably the pioneer of ELT, and engine-based tools such as Informatica or DataStage that support only ETL (with ELT as an afterthought), Talend supports both approaches natively, always providing the best performance.
More info on Talend Open Studio: http://www.talend.com/products-data-integration/talend-open-studio.php