[haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32

From: Stephan Assmus <superstippi@xxxxxx>

To: haiku-development@xxxxxxxxxxxxx

Date: Tue, 16 Jun 2009 15:16:49 +0200

On 2009-06-16 at 15:09:06 [+0200], André Braga <meianoite@xxxxxxxxx> wrote:
> On 16/06/2009, at 09:14, Christian Packmann
> <Christian.Packmann@xxxxxx> wrote:
> > The SSE2/SSSE3 routines are also improved. Of the unrolled versions,
> > only the SSSE3 variant is finished; the MMX and SSE2 variants need more
> > work. I'm sceptical that they will yield much improvement anyway: the
> > unrolled SSSE3 only gives 14% more performance than the non-unrolled
> > version. I don't think improvements will be much greater for MMX/SSE2,
> > but maybe some CPUs will perform well on them.
>
> Just for kicks, could you compile a static .o for AMD64 that we could
> then link to produce an executable for a 64-bit OS of choice? I'd like to
> see what GCC 4.2+ manages to do to your code with extra registers,
> optimization levels and autovectorization switches.
>
> Also, I see that you have SSSE3 versions for the routines, but why not
> SSE3 "plain" with 33.33% less S? :)
>
> No useful added functionality in those 13 extra instructions compared to
> what you're already doing in SSE2?

Hm, it appears the optimized code is already about twice as fast as the
plain C code for pretty much every architecture that was benchmarked. So
what I would love to see is a patch against app_server which integrates
this code so I can watch movies fullscreen with smooth scaling. If the code
can later be made even faster, nice, but it's darn useful already. Or would
be if there were a patch. ;-)

Best regards,
-Stephan