The AVX implementation of pmadd uses inline assembly to force Clang to generate vfmadd231ps. This may no longer be profitable, since Clang no longer always "generates vfmadd213ps instruction plus some vmovaps on registers" as the comment in the implementation says.
Clang now commutes both memory and register operands and chooses the correct fmadd permutation, which allows optimizations such as memory folding.
Forcing the assembly code may therefore cause the compiler to skip optimization opportunities.
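As a rough illustration of the alternative, the sketch below expresses the fused multiply-add through the `_mm256_fmadd_ps` intrinsic instead of an inline assembly statement. The helper name `madd8` is hypothetical and is not the actual pmadd implementation. The portable fallback is an assumption, added only so the snippet builds without `-mavx -mfma`.

```cpp
#include <cmath>
#include <cstddef>
#if defined(__AVX__) && defined(__FMA__)
#include <immintrin.h>
#endif

// Hypothetical helper (not the real pmadd): out[i] = a[i] * b[i] + c[i] for 8 floats.
void madd8(const float* a, const float* b, const float* c, float* out) {
#if defined(__AVX__) && defined(__FMA__)
  // Intrinsic form: the compiler may select vfmadd132ps, vfmadd213ps or
  // vfmadd231ps and may fold one of the loads into a memory operand.
  // An inline asm statement with register-only constraints is opaque to
  // the optimizer and rules both choices out.
  __m256 va = _mm256_loadu_ps(a);
  __m256 vb = _mm256_loadu_ps(b);
  __m256 vc = _mm256_loadu_ps(c);
  _mm256_storeu_ps(out, _mm256_fmadd_ps(va, vb, vc));
#else
  // Portable fallback (assumption): scalar fused multiply-add per element.
  for (std::size_t i = 0; i < 8; ++i) out[i] = std::fma(a[i], b[i], c[i]);
#endif
}
```

Compiling this with Clang at `-O2 -mavx -mfma` and inspecting the generated assembly is one way to check which fmadd permutation is chosen and whether a load gets folded. The result depends on the compiler version.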