> I forget the details, but libxsmm is something that depends on an
> instruction introduced with SSE3, and is a good example of portable
> performance engineering over a wide range of (x86_64) processors.

According to the documentation, libxsmm also supports a generic/SSE2
code path (LIBXSMM_X86_GENERIC) and picks the code path through runtime
CPU detection. So I see no valid reason for libxsmm to require SSE3.
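To show why a generic path plus runtime detection removes the need for
an SSE3 baseline, here is a minimal C sketch of the general dispatch
technique. It is NOT libxsmm's actual code. The function names
(sum_generic, sum_sse3, select_sum) are hypothetical, and it assumes
GCC or Clang on x86, where __attribute__((target("sse3"))) and
__builtin_cpu_supports are available. Only the one function is compiled
for SSE3, so the binary as a whole still runs on SSE2-only CPUs.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical portable baseline: plain C, valid for any x86_64 (SSE2) CPU. */
static float sum_generic(const float *x, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <pmmintrin.h> /* SSE3 intrinsics such as _mm_hadd_ps */

/* Hypothetical SSE3 path: only this function is compiled with SSE3
   enabled; it is never called unless the CPU reports SSE3 support. */
__attribute__((target("sse3")))
static float sum_sse3(const float *x, size_t n) {
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(x + i));
    acc = _mm_hadd_ps(acc, acc); /* SSE3 horizontal add */
    acc = _mm_hadd_ps(acc, acc); /* every lane now holds the total */
    float s = _mm_cvtss_f32(acc);
    for (; i < n; ++i)           /* leftover elements */
        s += x[i];
    return s;
}
#endif

typedef float (*sum_fn)(const float *, size_t);

/* Runtime detection: pick the SSE3 path only if the running CPU has it,
   otherwise fall back to the generic path. */
static sum_fn select_sum(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse3"))
        return sum_sse3;
#endif
    return sum_generic;
}
```

A caller resolves the pointer once (sum_fn sum = select_sum();) and then
calls sum(v, n). The build itself never needs -msse3, which is the point:
the SSE3 instruction is an optimization chosen at run time, not a
baseline requirement.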
Kevin Kofler