RAR archive encryption - 3.14.by forum

Does anyone know of good info about RAR decryption? (something more than "it is just AES")
I.e. how is the password checked without knowing the plaintext, and what should decryption look like in general?
Is there any "dumb" source available?

I wrote a RAR cracker a while ago. I had to study the source code of the Unix version of unrar, because at the time, the format of encrypted archives wasn't documented (not sure if it is today). You should know there are 2 different ways to encrypt a RAR archive; the rar CLI tool exposes them through 2 options:

-p option, which encrypts only the content of the files in the archive, while file metadata (filenames...) is not encrypted (which is stupid, an encrypted archive should not leak its file list)

-hp option, which encrypts both the content of the files and the file metadata (headers)

I have never really studied -p encryption; the cracker I wrote was for archives encrypted with -hp, but the 2 encryption mechanisms are probably based on the same concepts. An archive consists of a "marker block", an "archive block", one or more "file blocks" (1 per file), and what I call an "end-of-archive block" (this one was undocumented at the time).

When encrypting an archive with -hp: 1 random 64-bit salt is generated per file block and per end-of-archive block, the UCS-2 encoded password is concatenated to the salt, and the password-salt pair is stretched by 262144 rounds of a function based on SHA-1 (see CryptData::SetCryptKeys in unrar's source code), which eventually outputs a 128-bit IV and a 128-bit key used to AES-encrypt the file block or end-of-archive block. The 64-bit salt followed by the encrypted block is then stored in the RAR archive.

My cracker tested passwords by decrypting the end-of-archive block because it seemed to be a constant 7-byte blob: c4 3d 7b 00 40 07 00. But this is undocumented so I am not sure if it is true for all files.
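The stretching step described above can be sketched in Python roughly as follows. This is a minimal sketch and not unrar's actual code: the function name `rar3_derive_key` is hypothetical, and the details the post does not spell out (password before salt, a 3-byte round counter, how the IV bytes are sampled, the byte order of the key) are assumptions taken from my reading of CryptData::SetCryptKeys. It also ignores any quirks of unrar's own SHA-1 implementation, so check it against a real archive before relying on it.

```python
import hashlib
import struct

ROUNDS = 0x40000  # 262144 rounds, as quoted in the post


def rar3_derive_key(password: str, salt: bytes, rounds: int = ROUNDS):
    """Return (key, iv), 16 bytes each, for a password and an 8-byte salt."""
    # Assumption: the hashed seed is the UTF-16LE password followed by the salt.
    seed = password.encode("utf-16-le") + salt
    sha = hashlib.sha1()
    iv = bytearray()
    step = rounds // 16  # one IV byte is sampled every rounds/16 rounds
    for i in range(rounds):
        sha.update(seed)
        # Assumption: a 3-byte little-endian round counter follows each seed.
        sha.update(struct.pack("<I", i)[:3])
        if i % step == 0:
            # Assumption: each IV byte is the last byte of the running digest.
            iv.append(sha.copy().digest()[19])
    digest = sha.digest()
    # Assumption: the key is the first four 32-bit digest words, byte-swapped.
    key = struct.pack("<4I", *struct.unpack(">4I", digest[:16]))
    return key, bytes(iv)


if __name__ == "__main__":
    key, iv = rar3_derive_key("password", bytes(8))
    print(key.hex(), iv.hex())
```

The `rounds` parameter exists only so the loop can be shortened for experiments; a real check needs the full 262144 rounds.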

It goes without saying that bruteforcing a -hp RAR archive is pretty slow because of the 256k rounds. And the whole mechanism seems at first sight pretty secure, unless a flaw is found in the stretching function...

I could dig up my tool if you are interested. It was really just a hundred lines or so built around CryptData::SetCryptKeys...

Yeah, that would be very useful. Now I am less scared of it; a constant end-of-archive block is very nice.
256k rounds... That is where we would need a GPU; even 50k keys/sec would be a nice improvement over the current 2-3k keys/sec.
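To put those key rates in perspective, here is a rough back-of-the-envelope sketch of how many 64-byte SHA-1 block transforms one candidate password costs. It assumes that each round hashes the UCS-2 password, the 8-byte salt and a 3-byte counter (the counter size is an assumption, not something stated in this thread), and it ignores the few extra transforms needed to sample the IV bytes. The function names are made up for this example.

```python
ROUNDS = 262144      # rounds quoted earlier in the thread
SALT_LEN = 8         # 64-bit salt
COUNTER_LEN = 3      # assumption: 3-byte round counter hashed in every round
SHA1_BLOCK = 64      # SHA-1 processes input in 64-byte blocks


def sha1_blocks_per_candidate(password_chars: int) -> int:
    """Approximate number of SHA-1 block transforms for one password."""
    bytes_per_round = 2 * password_chars + SALT_LEN + COUNTER_LEN
    total = ROUNDS * bytes_per_round
    # SHA-1 padding adds at least 9 bytes (0x80 marker + 64-bit length).
    return -(-(total + 9) // SHA1_BLOCK)  # ceiling division


def sha1_blocks_per_second(password_chars: int, keys_per_second: int) -> int:
    """Block transforms per second needed to sustain a given key rate."""
    return sha1_blocks_per_candidate(password_chars) * keys_per_second


if __name__ == "__main__":
    for rate in (2500, 50000):
        print(rate, "keys/sec ->", sha1_blocks_per_second(6, rate), "blocks/sec")
```

Under these assumptions a 6-character password costs 94209 block transforms, so 50k keys/sec means roughly 4.7 billion SHA-1 block transforms per second.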

Here you go. Note that actually the right way to check the validity of the password is try to decrypt the first few bytes of one of the file blocks: the plaintext datastructure is supposed to be a header whose structure is well documented (http://fts.ifac.cnr.it/cgi-bin/dwww/usr ... ote.txt.gz) and which contains a CRC. If the CRC is valid, you can assume the password was correct. I prefer my end-of-archive method though, it's less code
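The header-CRC check can be sketched like this. It is only a sketch under stated assumptions: the function name `rar3_header_crc_ok` is hypothetical, and the layout (a 2-byte little-endian HEAD_CRC, then HEAD_TYPE, HEAD_FLAGS and HEAD_SIZE, with the CRC being the low 16 bits of a CRC32 over everything after the CRC field) is from my recollection of the technote, so verify it against the documentation before trusting it.

```python
import struct
import zlib


def rar3_header_crc_ok(header: bytes) -> bool:
    """Check the 16-bit header CRC of a decrypted RAR 3.x block header."""
    if len(header) < 7:
        return False
    # Assumed layout: HEAD_CRC (2), HEAD_TYPE (1), HEAD_FLAGS (2), HEAD_SIZE (2).
    stored_crc, _head_type, _flags, head_size = struct.unpack_from("<HBHH", header)
    if head_size < 7 or head_size > len(header):
        return False
    # Assumption: CRC32 over the bytes after HEAD_CRC, truncated to 16 bits.
    computed = zlib.crc32(header[2:head_size]) & 0xFFFF
    return computed == stored_crc


if __name__ == "__main__":
    # Build a synthetic 7-byte header so the example does not depend on a real archive.
    body = bytes([0x7B]) + struct.pack("<HH", 0x4000, 7)
    header = struct.pack("<H", zlib.crc32(body) & 0xFFFF) + body
    print(rar3_header_crc_ok(header))
```

With only 16 bits of CRC, about 1 in 65536 wrong passwords will still pass, so a hit should be confirmed by a second check.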

I just noticed that rar (at least version 3.71) uses the same 64-bit salt for all blocks (i.e. all files) and also uses AES in ECB mode! This is a typical amateur mistake that makes a number of theoretical attacks possible (but none, I think, that could help us improve the bruteforcing speed). I don't find this surprising though. Tools where encryption is an afterthought always get it wrong. Sigh.

ash wrote: Why don't you want to compile all the stuff with calclCompile() for different cores? Why R700 only?

There are two reasons:
1. Kernel compilation takes A LOT of time and (more importantly) memory. And there is a kernel for each possible length == 16 kernels to compile. I'm working on an 8 GB system and this amount of RAM is simply not enough to compile everything. It takes 1+ hour, Windows starts to swap everything, memory usage peaks at 16 GB for the kernel with length == 18... In short, it's just terrible; better to precompile them.
2. Cards other than the HD48xx simply aren't good enough at integer math. I tried another algo (WinZip w/ AES) and got 140K on an HD4850 vs only 4K on an integrated HD3200. There's no 35x difference in 3D between them, ofc. Dunno about other cards though.

ash wrote: P.S.: Watcom C++, I thought no one used it now

Yeah, I'm too old and lazy to switch to another compiler. Btw, OpenWatcom 1.8 was recently released.

A kernel for every password length: terrible.
Executing calclCompile() takes 1+ hour on 8 GB of memory.
For example, BarsWF compiles its shader inside the executable via brook.dll, and it doesn't take long...
At first I thought you precompiled so that crackers couldn't look at your shader, but overall that's not good: it would be interesting to test it on all devices (I guess).
Point 2 is due to the difference in SP count.
P.S.: to hide my shaders I precompiled them, then disassembled them, and then compiled them again with the ISA value of the supported target device, but that doesn't take so much time.
Can you give some links on the RAR decryption algo, please?


In BarsWF it also takes long by my measurements. I was asking on the AMD forum why it takes ~1-2 seconds to execute the first call; I would prefer something more like 0.1 sec.

If you want to get good speed -- it's the only way possible. Pavel's crark's performance suffers because he doesn't pay attention to this. My implementation is very close to the theoretical limits of the GPU.

Because compiling one MD5_Transform isn't complex at all. My kernel for WinZip w/ AES (which in fact requires just 2 SHA1_Transform calls, 1000 times) also compiles very fast, in just milliseconds. But RAR's "password to key" routine is very GPU-unfriendly.
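For comparison, the WinZip AES key setup mentioned here can be sketched with the standard library. This assumes the "2 SHA1_Transform 1000 times" refers to PBKDF2 with HMAC-SHA1 and 1000 iterations, as described in WinZip's AES format notes: the derived bytes are split into an encryption key, an authentication key and a 2-byte password verification value. The function name `winzip_aes_keys` is made up for this example.

```python
import hashlib


def winzip_aes_keys(password: bytes, salt: bytes, key_bits: int = 256):
    """Return (encryption_key, auth_key, verifier) for a WinZip AES entry."""
    key_len = key_bits // 8
    # Assumption: PBKDF2-HMAC-SHA1, 1000 iterations, 2 * key_len + 2 output bytes.
    derived = hashlib.pbkdf2_hmac("sha1", password, salt, 1000, 2 * key_len + 2)
    enc_key = derived[:key_len]
    auth_key = derived[key_len:2 * key_len]
    verifier = derived[2 * key_len:]  # 2-byte quick password check
    return enc_key, auth_key, verifier


if __name__ == "__main__":
    enc_key, auth_key, verifier = winzip_aes_keys(b"password", bytes(16))
    print(enc_key.hex(), verifier.hex())
```

Each PBKDF2 iteration hashes fixed-size inputs, which is a much more regular workload than RAR's 262144 rounds over a variable-length password buffer.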

ash wrote: At first I thought you precompiled so that crackers couldn't look at your shader, but overall that's not good: it would be interesting to test it on all devices (I guess).

As I already wrote -- only HD48xx will show good results, so no real point to test on anything else.

ash wrote: Can you give some links on the RAR decryption algo, please?

RAR sources are enough. SetCryptKeys is implemented on the GPU; AES init & decoding run on the CPU, as they don't take too long. Final validation is done after decrypting 32 bytes of the RAR header, using some heuristics (e.g. the header type can only be 0x74 or 0x7A, version to extract >= 29, etc.).
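The header heuristics could look roughly like the sketch below. The two checks on header type and extraction version come from the post; the 32-byte field layout is my recollection of the RAR 3.x file header from the technote, and the extra HEAD_SIZE sanity check and the function name `looks_like_rar3_header` are my own additions for illustration, not the author's code.

```python
import struct

# Assumed layout of the first 32 bytes of a RAR 3.x file header:
# HEAD_CRC, HEAD_TYPE, HEAD_FLAGS, HEAD_SIZE, PACK_SIZE, UNP_SIZE, HOST_OS,
# FILE_CRC, FTIME, UNP_VER, METHOD, NAME_SIZE, ATTR
FILE_HEADER = struct.Struct("<HBHHIIBIIBBHI")


def looks_like_rar3_header(plain: bytes) -> bool:
    """Cheap plausibility test on 32 bytes of decrypted header data."""
    if len(plain) < FILE_HEADER.size:
        return False
    fields = FILE_HEADER.unpack_from(plain)
    head_type, head_size, unp_ver, name_size = fields[1], fields[3], fields[9], fields[11]
    if head_type not in (0x74, 0x7A):   # heuristic from the post
        return False
    if unp_ver < 29:                    # heuristic from the post
        return False
    # Extra assumption: the header must be large enough to hold the file name.
    return head_size >= FILE_HEADER.size + name_size


if __name__ == "__main__":
    sample = FILE_HEADER.pack(0, 0x74, 0x8000, 40, 100, 200, 2, 0, 0, 29, 0x33, 8, 0x20)
    print(looks_like_rar3_header(sample))
```

A wrong key yields effectively random bytes, so these checks reject most wrong candidates before any more expensive validation is needed.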

In BarsWF it also takes long by my measurements. I was asking on the AMD forum why it takes ~1-2 seconds to execute the first call; I would prefer something more like 0.1 sec.

Actually, compiling MD5 takes ~50 ms. The 1-2 sec with Brook is due to its OOP implementation plus compilation, linking, ...
I thought that was well-known stuff.

If you want to get good speed -- it's the only way possible. Pavel's crark's performance suffers because he doesn't pay attention to this. My implementation is very close to the theoretical limits of the GPU.

I know. Actually, I edit the kernel at run time, then compile it and work with the new one: for example, multihash is not well served by a constant kernel; it needs a re-sort and recompilation of a new kernel.

Because compiling one MD5_Transform isn't complex at all. My kernel for WinZip w/ AES (which in fact requires just 2 SHA1_Transform calls, 1000 times) also compiles very fast, in just milliseconds. But RAR's "password to key" routine is very GPU-unfriendly.

calclCompile() just assembles shaders to obj; I wonder why it takes so much time to put CALL/RET in the code.

As I already wrote -- only HD48xx will show good results, so no real point to test on anything else.

Not everyone has R700-like devices.

RAR sources are enough. SetCryptKeys is implemented on the GPU; AES init & decoding run on the CPU, as they don't take too long. Final validation is done after decrypting 32 bytes of the RAR header, using some heuristics (e.g. the header type can only be 0x74 or 0x7A, version to extract >= 29, etc.).

ash wrote: calclCompile() just assembles shaders to obj; I wonder why it takes so much time to put CALL/RET in the code.

calclCompile() does not "just assemble" it; in fact, it tries to aggressively optimise everything. Check out the code produced by Brook+ -- it's simply terrible. If you write a = b + 5 + 5; it'll be compiled into +5 +5, not +10. All optimisation is done in the calclCompile() call.
I guess that with lots of "call/rets" the compiler tries to do something like "whole program optimisation", which takes a lot of time for complex kernels.

ash wrote: Not everyone has R700-like devices.

Time to get one of them. I really don't see any point in using GPU code which will be several times slower than the CPU one.

The same as Solodovnikov: OpenWatcom, LoadLibrary, GetProcAddress, some déjà-vu-like stuff; but Solodovnikov had some more interesting stuff: ASProtect.
ICC is more suitable than Watcom, which, like the Borland compilers, puts data and code into the same section, which is bad for the code cache.
nvcuda? You don't like wrappers?
Which CUDA form: PTX or kernel?
PTX, I hope?
Do you use the CPU for calculation? Because I saw it has some long function with a lot of cyclic rotations.
Source: no need for the source, just the main proc in asm, please.

I know (and I knew Solodovnikov IRL as well). Just FYI -- ASProtect is no longer Solodovnikov's thing; StarForce purchased ASPR a year or more ago.

ash wrote: ICC is more suitable than Watcom, which, like the Borland compilers, puts data and code into the same section, which is bad for the code cache.

It doesn't matter at all. All "heavy" computations are done on the GPU. It's possible to optimize the CPU routines, yes, but as they take like 1% of the time it won't increase total performance a lot. And if I decide to include CPU support I'll use assembler anyway.

ash wrote: nvcuda? You don't like wrappers?

Don't like unneeded code which leads to compatibility problems.

ash wrote: Which CUDA form: PTX or kernel?
PTX, I hope?

cubin, ofc. There's just no point in shipping an uncompiled kernel.

ash wrote: Do you use the CPU for calculation? Because I saw it has some long function with a lot of cyclic rotations.

Final password validation needs some SHA1 & AES and is done on the CPU.

ash wrote: Source: no need for the source, just the main proc in asm, please.

The main proc is written in pure C and has a lot of "comments" in the form of references to strings (like "Cannot init CAL structs.", "Error <here>, error <there>"). It can be analyzed in no time.

I guess you don't have any supported video card; that's why you're trying to disassemble the .exe instead of simply running it.

I know (and I knew Solodovnikov IRL as well). Just FYI -- ASProtect is no longer Solodovnikov's thing; StarForce purchased ASPR a year or more ago.

Shit, so there won't be new versions.

It doesn't matter at all. All "heavy" computations are done on the GPU. It's possible to optimize the CPU routines, yes, but as they take like 1% of the time it won't increase total performance a lot. And if I decide to include CPU support I'll use assembler anyway.

1%, I saw. Intrinsics? Asm is in heavy use for crackmes, viruses or some self-written packers; anyway, that's your decision.

Don't like unneeded code which leads to compatibility problems.

Actually, cudart.dll uses dynamic (aka run-time) linking to nvcuda.dll, so (I guess) the NVIDIA guys are going to release (change) nvcuda.dll, which is
distributed with NVIDIA's drivers, more frequently, so wrappers won't cause any compatibility inconveniences. All the developers I know of work with the cudart.dll APIs.

cubin, ofc. There's just no point in shipping an uncompiled kernel.

Right choice.

Final password validation needs some SHA1 & AES and is done on the CPU.

...please answer quicker; in 20 min I'd have figured that out myself.

The main proc is written in pure C and has a lot of "comments" in the form of references to strings (like "Cannot init CAL structs.", "Error <here>, error <there>"). It can be analyzed in no time.

Please add CPU support...

I guess you don't have any supported video card; that's why you're trying to disassemble the .exe instead of simply running it.

Actually, I've got a 4870, but you're right, I hadn't run it. First thing: disassembling in VMware, aka paranoia.