Date: Sat, 26 Nov 2011 09:27:32 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: mdfourmmx
On Sat, Nov 26, 2011 at 02:17:47AM +0100, magnum wrote:
> I'm taking this to the list instead just in case anyone else has an
> opinion. This started out with the question "why was mdfourmmx never
> added to JtR?"
>
> Simon answered:
> >> The best way to do it is probably with intrinsics now. It never was
> >> incorporated because solar viewed my code with a mixed feeling of
> >> disdain and horror (with reasons!).
Ouch. I think this pre-dates -jumbo the way we know it now. I was
under the impression that we already incorporated all of Simon's stuff into
-jumbo by now. If not yet, then we probably should - unless, of course,
it is somehow not needed or would be replaced with a better alternative
very soon.
> JimF wrote:
> I agree with Simon here. I really think we should transition away
> from the sha1-mmx.S and md5-mmx.S files if we possibly can, for all
> builds. That is as long as we can get 'fast' properly built intrinsic .S
> files.
To me, providing icc-built .S files is more of a hack than providing
hand-written .S files.
> Yes, building the sse-intrinsics.c with gcc (or VC) 'works', but
> comes nowhere near the speed of the 32bit .S files. However, with
> properly pre-compiled intrinsics, it is a wash, or better. Along with
> that, the intrinsics are thread safe, which would be a very hard thing to
> do with the older 32 bit asm.
It is not fundamentally hard to do thread-safe code in hand-written asm,
it is just that it might be time-consuming to retroactively change
specific pieces of existing code to be thread-safe. (The same is true
for C code as well.) I don't know whether this applies to the source
files in question or not - I haven't looked into that.
We may also want to keep in mind that thread-safe code is sometimes
slower by 10% or so, whereas for some builds thread-safety is not
needed. So it'd be nice to support both kinds of builds optimally.
That's especially true if we consider that some people parallelize their
JtR runs other than with OpenMP - such as by using -jumbo's MPI support,
by starting several John instances manually (or with custom scripts), by
using --fork when we offer that as a standard feature, or by using other
future features.
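
To illustrate the point about thread-safety and supporting both kinds of
builds, here is a minimal C sketch. It is not taken from JtR: the names
toy_hash_static, toy_hash_ctx, toy_ctx, TOY_HASH and TOY_THREAD_SAFE are all
hypothetical, and the "hash" is a toy stand-in for a real MD4/MD5 body. It
only shows the usual pattern of moving static scratch state into a
caller-supplied context and picking the variant at build time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Non-reentrant variant: scratch state lives in one static buffer, the way
 * older code often does it. Concurrent callers would clobber each other. */
static uint32_t toy_scratch[4];

uint32_t toy_hash_static(const unsigned char *key, size_t len)
{
    size_t i;

    memset(toy_scratch, 0, sizeof(toy_scratch));
    for (i = 0; i < len; i++)
        toy_scratch[i & 3] = toy_scratch[i & 3] * 31 + key[i];
    return toy_scratch[0] ^ toy_scratch[1] ^ toy_scratch[2] ^ toy_scratch[3];
}

/* Reentrant variant: the caller supplies the scratch area, for example one
 * context per thread. Same computation, no shared state. */
typedef struct {
    uint32_t w[4];
} toy_ctx;

uint32_t toy_hash_ctx(toy_ctx *ctx, const unsigned char *key, size_t len)
{
    size_t i;

    memset(ctx->w, 0, sizeof(ctx->w));
    for (i = 0; i < len; i++)
        ctx->w[i & 3] = ctx->w[i & 3] * 31 + key[i];
    return ctx->w[0] ^ ctx->w[1] ^ ctx->w[2] ^ ctx->w[3];
}

/* Build-time selection (TOY_THREAD_SAFE is a hypothetical macro): callers
 * always use TOY_HASH(), and each build gets the variant it needs. */
#ifdef TOY_THREAD_SAFE
#define TOY_HASH(ctx, key, len) toy_hash_ctx((ctx), (key), (len))
#else
#define TOY_HASH(ctx, key, len) ((void)(ctx), toy_hash_static((key), (len)))
#endif
```

With something like this, an OpenMP build would define TOY_THREAD_SAFE and
give each thread its own toy_ctx, while a build meant for MPI or several
separate John instances could leave it undefined. Whether one variant is
actually faster than the other would have to be measured per format and
per system.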
> The reason I was considering it is that it's like 3-4 extra lines of
> code in each format (bar the #ifdefs) once the .S file is there, given
> you already have SSExxxbody implemented in a format.
Sorry, I don't know what you (Jim) are referring to here (the "it" in
"I was considering it").
> NOTE, there likely are still a few formats which are faster with the
> 32bit .S files, and likely are some systems where the 32 bit .S files
> will be faster for most formats. I will see what can be done to
> minimize these performance differences in the coming weeks.
OK. It sounds like the three of you (JimF, magnum, Simon) really want
to drop these older files in favor of the intrinsics. I am really not
sure. To me, -jumbo is a mix of almost everything that's been
contributed. You want to start introducing some restrictions on what's
acceptable/desirable and what is not. If we go that route, it might
become non-obvious where to stop in dropping stuff from -jumbo. For
example, I find the recently merged support for dynamically loaded
plugins even more questionable (I think no one uses it, and I did not
even test it) - but I merged it in the spirit of having everything in
-jumbo if at all possible. Do we drop it now as unused in practice?
And we also have formats that are probably unused in practice.
Having extra stuff in -jumbo gives these things a chance to be used,
which in turn has both pros and cons.
magnum wrote:
> I have patches pending that will boost all GETPOS-type formats, but that
> boost will apply to the "legacy" mmx/sse builds too.
Sounds good to me.
> If we can get just
> a little more power out of sse-intrinsics.c itself, then we're talking.
Hmm.
Here's one more thing to consider: for the main tree, supplying
icc-generated .S files is definitely not acceptable. We're not even
sure of that for -jumbo (none of us has read/understood the icc license
yet, it seems).
Alexander