memcpy_amd( ) How to use it

Has anyone heard of this? I have an AMD Athlon 2400+, and I gather that the standard memcpy function is slow on these processors. I found a function that is supposed to be an optimized replacement, but I'm not sure how to use it in my program. I tried adding the file to the project and then calling memcpy_amd( ) the same way as memcpy, but the compiler said it was an "undeclared identifier". Then I tried copying and pasting the whole .cpp file into my C++ source file, and that gave errors in the assembly code itself. The Readme file that came with it didn't say anything about how to use it. Can anyone help? Supposedly it produces significant gains. Here is the code:

[code]
// Very optimized memcpy() routine for all AMD Athlon and Duron family.
// This code uses any of FOUR different basic copy methods, depending
// on the transfer size.
// NOTE: Since this code uses MOVNTQ (also known as "Non-Temporal MOV" or
// "Streaming Store"), and also uses the software prefetchnta instructions,
// be sure you're running on Athlon/Duron or other recent CPU before calling!

#define TINY_BLOCK_COPY 64 // upper limit for movsd type copy
// The smallest copy uses the X86 "movsd" instruction, in an optimized
// form which is an "unrolled loop".

#define IN_CACHE_COPY 64 * 1024 // upper limit for movq/movq copy w/SW prefetch
// Next is a copy that uses the MMX registers to copy 8 bytes at a time,
// also using the "unrolled loop" optimization. This code uses
// the software prefetch instruction to get the data into the cache.

#define UNCACHED_COPY 197 * 1024 // upper limit for movq/movntq w/SW prefetch
// For larger blocks, which will spill beyond the cache, it's faster to
// use the Streaming Store instruction MOVNTQ. This write instruction
// bypasses the cache and writes straight to main memory. This code also
// uses the software prefetch instruction to pre-read the data.
// USE 64 * 1024 FOR THIS VALUE IF YOU'RE ALWAYS FILLING A "CLEAN CACHE"

#define BLOCK_PREFETCH_COPY infinity // no limit for movq/movntq w/block prefetch
#define CACHEBLOCK 80h // number of 64-byte blocks (cache lines) for block prefetch
// For the largest size blocks, a special technique called Block Prefetch
// can be used to accelerate the read operations. Block Prefetch reads
// one address per cache line, for a series of cache lines, in a short loop.
// This is faster than using software prefetch. The technique is great for
// getting maximum read bandwidth, especially in DDR memory systems.

// This is small block copy that uses the MMX registers to copy 8 bytes
// at a time. It uses the "unrolled loop" optimization, and also uses
// the software prefetch instruction to get the data into the cache.
align 16
$memcpy_ic_1: ; 64-byte block copies, in-cache copy

// For the largest size blocks, a special technique called Block Prefetch
// can be used to accelerate the read operations. Block Prefetch reads
// one address per cache line, for a series of cache lines, in a short loop.
// This is faster than using software prefetch, in this case.
// The technique is great for getting maximum read bandwidth,
// especially in DDR memory systems.
$memcpy_bp_1: ; large blocks, block prefetch copy
[/code]
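The comments in the posted code describe four copy methods chosen by transfer size. As a rough illustration of that selection only (not the assembly itself), here is a minimal C++ sketch. The threshold values are taken from the #defines above; the names `CopyMethod` and `choose_copy_method` are made up for this example, and the exact boundary handling (strictly less than each limit) is an assumption.

```cpp
#include <cstddef>

// Thresholds copied from the #defines in the posted code.
const std::size_t kTinyBlockCopy = 64;          // TINY_BLOCK_COPY
const std::size_t kInCacheCopy   = 64 * 1024;   // IN_CACHE_COPY
const std::size_t kUncachedCopy  = 197 * 1024;  // UNCACHED_COPY

// Hypothetical labels for the four methods named in the comments.
enum class CopyMethod {
    Movsd,           // unrolled movsd copy for tiny blocks
    MmxInCache,      // movq/movq with software prefetch
    StreamingStore,  // movq/movntq with software prefetch
    BlockPrefetch    // movq/movntq with block prefetch
};

// Picks a method from the byte count. Assumption: each limit is exclusive.
CopyMethod choose_copy_method(std::size_t n) {
    if (n < kTinyBlockCopy) return CopyMethod::Movsd;
    if (n < kInCacheCopy)   return CopyMethod::MmxInCache;
    if (n < kUncachedCopy)  return CopyMethod::StreamingStore;
    return CopyMethod::BlockPrefetch;  // no upper limit
}
```

Under these assumptions, a 400-element array of 4-byte values (1600 bytes) would fall in the in-cache MMX range, well below the sizes where the streaming-store and block-prefetch paths apply.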

Comments

Basically, what do I need to include in my files? Please be specific, because I don't know very much C++.

It compiles fine with the Microsoft VC++ 6.0 compiler. If you use a different compiler, you might have to change _asm to something else. See your compiler's documentation on how to use embedded assembly language.

It compiles with Microsoft VC++ 6.0? That is what I'm using. What includes do you use? Do you just add it to your project and compile the whole project? When I compiled it, it had a problem with the [ ] in the .cpp file. I give up. How much more speed do you think it will give me? I use memcpy about 4 times, and every one of those 4 times it copies a 400-element array. Why couldn't they make it a header file or something? I can use those.
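An "undeclared identifier" error usually means the calling file never saw a declaration of the function. Here is a minimal sketch of a header-style fix. It assumes memcpy_amd takes the same parameters as memcpy (the thread never shows its actual signature) and uses a hypothetical header name, memcpy_amd.h. The stand-in body just forwards to std::memcpy so the sketch builds on its own; in a real project the definition would come from the AMD .cpp file added to the project.

```cpp
#include <cstddef>
#include <cstring>

// --- would go in a header, e.g. memcpy_amd.h (hypothetical name) ---
// Assumed signature: same parameters and return value as memcpy.
void* memcpy_amd(void* dest, const void* src, std::size_t n);

// --- stand-in definition so this sketch links by itself ---
// In a real project this body is replaced by the AMD routine,
// compiled from its own .cpp file in the same project.
void* memcpy_amd(void* dest, const void* src, std::size_t n) {
    return std::memcpy(dest, src, n);
}

// --- caller: copies a 400-element array, as described in the thread ---
// The element type (int) is an assumption; the thread does not state it.
const std::size_t kCount = 400;

void copy_samples(const int* src, int* dst) {
    memcpy_amd(dst, src, kCount * sizeof(int));
}
```

With the declaration visible to the caller, the only remaining requirement is that the file containing the real definition is part of the build.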

Have you changed the code in your original post? Now it won't compile for me because of many unknown assembly instructions such as "movntq" and "prefetchnta". These instructions don't exist in the 80x86 family of microprocessors.

I don't think it's worth the effort anyway. Microsoft's implementations of memcpy(), strcpy() and a whole lot of other functions are already written in assembly language.
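One way to judge whether it is worth the effort is to measure how long the existing copies take. The following is a minimal timing sketch, not a rigorous benchmark. It assumes the 400-element array holds int values (the thread does not say), and `average_copy_ns` is a name made up for this example. It times the standard std::memcpy; the same loop could be pointed at another copy routine for comparison.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>

// Returns the average time in nanoseconds for one std::memcpy of a
// 400-element int array, measured over `iterations` copies.
double average_copy_ns(std::size_t iterations) {
    if (iterations == 0) return 0.0;

    static int src[400];
    static int dst[400];
    for (int i = 0; i < 400; ++i) src[i] = i;

    // Volatile sink so the compiler cannot discard the copies.
    volatile unsigned sink = 0;

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        src[0] = static_cast<int>(i & 0x7fff);  // vary the input slightly
        std::memcpy(dst, src, sizeof src);
        sink = sink + static_cast<unsigned>(dst[0]);
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

If the program only performs about four such copies, multiplying the measured per-copy time by four gives an upper bound on what any faster routine could save.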

Stober, have you installed the Processor Pack with that 6.0 version of MSVC++? I don't think I have. If I do have it, how would I know? Maybe that is why it won't compile. I've got an AMD Athlon 2400+, so the CPU shouldn't be the problem.

I have VC++ 6.0 Pro edition. I don't know anything about a processor pack. I have Service Pack #5 installed. But I thought in-line assembly was a standard feature of the compiler.
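Separately from whether the compiler's assembler accepts these instructions, the note in the posted code says to make sure the CPU supports them before calling. MOVNTQ and PREFETCHNTA are part of the SSE instruction set, so one conservative runtime check is to test the SSE feature flag. Here is a minimal sketch, assuming a reasonably modern MSVC, GCC or Clang (VC++ 6.0 itself does not provide these intrinsics); `cpu_has_sse` is a name made up for this example.

```cpp
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>  // __cpuid
#endif

// Returns true if the CPU reports SSE support.
// On non-x86 targets or unknown compilers it conservatively returns false.
bool cpu_has_sse() {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);                    // leaf 1: feature flags
    return (regs[3] & (1 << 25)) != 0;   // EDX bit 25 = SSE
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse") != 0;
#else
    return false;
#endif
}
```

A caller could fall back to the standard memcpy whenever this returns false.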