@neuron2
You can compile 64-bit binaries without a 64-bit operating system; you just can't test them without a 64-bit computer/OS. How easy the build is to port depends on how much it relies on assembly language, on code that assumes pointers are the same size as an int, and so on. If you want to try, I can find you some good reading material.

Also, isn't DG NV tools written in CUDA? I'm not super-familiar with compiling it; is it done with a custom NVIDIA compiler? If so, and it only uses NVIDIA-specific language constructs (I know CUDA is C-like, or something), then it may be as easy as changing your compile target architecture to x64.

Give me a little background on how it's written, and I think I can get you going...

@Audionut
That error (the MT plugin was originally written in Russian, I think?) usually happens when something goes wrong in the script you're trying to invoke: a syntax error, a filter that isn't loaded, something small missing. That's really the only place in the code where I see that error, so I don't know what else to tell you. Something in your chain is throwing an exception.

@osgZach
Thread deadlock should be resolved by tonight. I'm pretty sure it's a result of giving the compiler free rein to add parallelism where it *thinks* it is safe. I think I've mentioned this in an earlier post: compilers are stupid, and I shouldn't have trusted it, but oh well. I can eventually thread it all by hand, but that's not my main concern right now. I'm working on some performance tweaks here and there that should make everyone smile.

Also, just wanted to say: I know we all have problems with the build, but don't let our whining dictate when you release updates or how often you work on it. I know how easy it is to get burned out on something you started doing because you wanted to. I'd rather download an update a week later than a hastily put-together crowd pleaser.

In other news, I've gone back to using MT("filter"), as I appear to have solved the overlap issue by cranking the overlap up to 12. I was initially afraid to do that because I thought the sections would overlap to the point where actual data started disappearing. It seems to work, though; I will be comparing it against other encodes made without MT("filter").

It is interesting to note that I can actually use my PC for web browsing and other tasks when using MT("filter"), as opposed to SetMTMode, which slows the entire system to a crawl, even though performance is roughly the same (if anything, MT("filter") seems a bit faster). Any reason why that is? And would it be reasonable to expect SetMTMode in the future not to bog down the PC completely, even when using all cores?

Levi, I will if I have problems in the future. But for now I'm content that it's working the way I have it.

The only time SetMTMode didn't bog down the PC was when I tried the x32 builds with MT. And performance was mostly no better than single-threaded, although I saw occasional spikes past 50% in Task Manager. Maybe I just have a goofy Windows install, or something like that.

But if it's working under MT("filter"), that is OK for now.

Re: DGindex/NV: the indexer itself runs fine as a 32-bit app. It's the decoder DLLs that have to be 64-bit, and they will have no problem loading the index file, since it's just a basic text file after all. Squid's x64 release is just Dgdecode.dll compiled for x64.

It doesn't look like your initial SetMTMode call specified the number of threads; try SetMTMode(5,0). You might even be able to get away with mode 2. There seem to be conflicting opinions about whether it works properly in any mode lower than 5.

Is your input clip progressive? Also, you shouldn't be resizing before deinterlacing, and I don't know whether resizing an interlaced clip might mess up MT as well.

I think your resize problem is really my fault... I noticed I rolled back to an older version of the resample function source when releasing that build. If you could try my latest compile, I'd appreciate it. Let me know if it still crashes, so I know whether I need to dig deeper.

@neuron2
This details the differences between x86 and x64. The main thing to watch out for is any assembler that assumes a certain calling convention. In (Windows) x64, instead of all parameters being passed on the stack, the first four arguments go in registers by position: integer-type arguments (pointers, shorts, whatever) in rcx, rdx, r8, and r9, and floating-point arguments in xmm0-xmm3. So the second argument goes in rdx if it's an integer or in xmm1 if it's a float, regardless of the types of the other arguments. The caller also reserves 32 bytes of shadow space on the stack where you would normally find these four parameters, but don't go looking there for them: it's just garbage memory to start with, unless you explicitly store the parameter there for later use. Also, every parameter passed on the stack gets its own 8-byte slot. So even though an int should take 4 bytes on your stack, 8 are actually allocated, and when reading it back you can't use all 8 bytes; only the low 4 contain valid data.

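For example, an int passed as the fifth argument sits in the 8-byte slot at [rsp+40] on function entry: bytes 40-43 hold the value, while bytes 44-47 are garbage. It's the little things you have to get used to.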

There are some other oddities: whenever an extended register is used, a REX prefix accompanies the opcode. If you want to use an extended register (r8-r15) as a counter, dec and inc always treat it as a 64-bit value, even when you specify:

Code:

dec r8d
or
inc r9d

You still get a 64 bit add, which is slower.

Volatile registers:
rax
rcx
rdx
r8
r9
r10
r11

Non-volatile registers:
rbx
rbp
rdi
rsi
rsp
r12
r13
r14
r15

XMM0-XMM5 are volatile, XMM6-XMM15 are non-volatile.

If your compiler supports 64-bit compilation, you shouldn't have a problem just taking the source and compiling it as is, as long as it doesn't rely on inline assembly or on pointers fitting in an int.

Watch out for MSVC, the Visual Studio compiler: it does not support inline asm (__asm blocks) when compiling for x64.

My bad on the inc/dec instructions. I thought I had read something in the Intel docs about them, but it was actually that some forms of them aren't supported in 64-bit mode. What is true is that the encoding gets longer with the extended registers, since they need a REX prefix, regardless of which portion of the register you inc or dec.

Quote:

The INC and DEC instructions are supported in 64-bit mode. However, some forms of INC and DEC (the register operand being encoded using register extension field in the MOD R/M byte) are not encodable in 64-bit mode because the opcodes are treated as REX prefixes.

There's also something about breaking register dependencies (the whole register renaming in an out-of-order core thing) if you modify all the flags with an ADD or SUB rather than just an INC or DEC; I don't remember exactly what it is.
Edit: found it, from Intel's optimization guidelines:

Quote:

Assembly/Compiler Coding Rule 32. (M impact, H generality) INC and DEC instructions should be replaced with ADD or SUB instructions, because ADD and SUB overwrite all flags, whereas INC and DEC do not, therefore creating false dependencies on earlier instructions that set the flags.

I'll look into why the resize is throwing an error. What color space is it giving the error in? I can't seem to reproduce the crash in any color space...

Does the AMD build work on Intel CPUs? If so, is there a significant speedup from the Intel build over the AMD build on Intel CPUs? If not, is there any reason for separate builds? If there is, do you want me to improve the install script to use the AMD DLL on AMD CPUs and the Intel DLL on Intel CPUs? And if so, what will the two DLLs be named?

Also, I could make a self-contained installer exe or use a typical installer, but I'd have to inject the new DLLs every time, and typical installers are invasive in the registry. Maybe when development slows down I'll look into it, if you want.