The Windows Script Encoder (screnc.exe) is a Microsoft tool for encoding
scripts (JScript, VBScript, ASP pages). Yes: encode, not encrypt. Its purpose
is to keep people from reading or modifying your scripts. Microsoft recommends
using the Script Encoder to obfuscate your ASP pages, so that if your server
is compromised, the hacker cannot find out how your ASP applications work.

Note that this encoding only prevents casual viewing of your code; it will
not prevent the determined hacker from seeing what you've done and how.

(By the way, because of this text, I did not deem it necessary to inform Microsoft
of this article).

Also, an encoded script is protected against tampering and modifications:

After encoding, if you change even one character in the encoded text, the
integrity of the entire script is lost and it can no longer be used.

So we can make the following observations:

If it's just about "preventing casual viewing", what's wrong with encoding mechanisms
like a simple XOR or even uuencode, base64, and URL-encoding?

Anyone using this tool will be convinced that it's safe to hard-code all
usernames, passwords, and "secret" algorithms into their ASP-pages. And any
"determined hacker" will be able to get to them anyway.

I think this can be a very nice exercise for anyone who wants to learn more
about analysing codes like this, with known plaintext, known ciphertext, and unknown
key and algorithm. (Actually, a COM object that can do the encoding is shipped
with IE 5.0, so reverse engineering this will reveal the algorithm, but that's
no fun, is it?)

So, how does this work?

The Script Encoder works in a very simple way. It takes two parameters: the
filename of the file containing the script, and the name of the output file,
containing the encoded script.

What part of the file will be encoded depends on the filename extension, as
well as on the presence of a so-called "encoding marker". This encoding marker
allows you to exclude part of your script from being encoded. This can be very
handy for JavaScripts, because the encoded scripts will only work on MSIE 5.0
or higher.... (of course this is not an issue for ASP and VB scripts that run
on a web server!).

Say, you've got this HTML page with a script you want to hide from prying
eyes:

As you can see, the language attribute of the <script language="..."> tag has been changed into "JScript.Encode".
The Script Encoder uses the Scripting.Encoder COM-object to do the actual encoding.
The decoding will be done by the script interpreter itself (so we cannot simply
call a Scripting.Decoder, because that doesn't exist).

Okay, let's play!

Plaintext   Encoded

Hoi         #@~^FQAAAA==@#@&CGb@#@&zz O@*@#@&WwIAAA==^#~@

Hai         #@~^FQAAAA==@#@&CCb@#@&zz O@*@#@&TQIAAA==^#~@

HaiHai      #@~^IgAAAA==@#@&CCbCmk@#@&CmrCmk@#@&JzRR@*@#@&mgUAAA==^#~@
HaiHai

Cute. As you can see, @#@& appears to be a newline (@# = CR, @& = LF), and
the position of a character does (sometimes...) matter (the first time HaiHai
becomes CCbCmk and the second time it's CmrCmk). Let's just encode a line with
a lot of A's:

After staring at this for some time, I discovered that the bold part was repeating
(actually, the entire string is repeating itself after 64 characters). Also,
it seems that the character 'A' has three different representations: b,
z, and ). If you encode a string of B's you'll see the same pattern, but with
different characters.

The string of A's had a CR and LF in front of them, so after skipping the
first two digits, you'll see that 0, 1, 2, 0, 2, 0, 0, 2 perfectly matches b,
), z, b, z, b, b, z , having b=0, )=1 and z=2.
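The mapping above can be checked mechanically. The following is a minimal Python sketch that assumes only what the text states: 'A' is encoded as b, ) or z, and these stand for 0, 1 and 2. The name `pick_indices` is illustrative and not part of the Script Encoder.

```python
# Representations of 'A' observed in the encoded output.
# Assumption taken from the text: b = 0, ) = 1, z = 2.
A_REPRESENTATIONS = "b)z"


def pick_indices(encoded_run, representations=A_REPRESENTATIONS):
    """Map each encoded character of a run of identical plaintext
    characters to the index of the representation that was used."""
    return [representations.index(ch) for ch in encoded_run]


# The eight encoded characters quoted above.
print(pick_indices("b)zbzbbz"))  # [0, 1, 2, 0, 2, 0, 0, 2]
```

Running the same function over a longer run of encoded A's is one way to confirm that the sequence repeats every 64 positions.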

The other table is a matrix that holds three different representations for
each character. Which one will be used, depends on the pick_encoding table.
To find out this matrix, just make a file that will cause every character to
be encoded three times. Make sure the algorithm is 'reset' by padding the lines
so each group will start on a 64-byte boundary.

So what is this? It's the encoded representation of the ASCII characters
9, and 32 through 126. Every character has got three different representations,
so this sums up to 3 * ((126 - 32 + 1) + 1) = 3 * 96 = 288 characters.

You'll see that the <, > and @ characters are escaped too, resulting
in the following table:

Esc   Org

@#    \r
@&    \n
@!    <
@*    >
@$    @
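The escape table translates directly into a small lookup. The sketch below, in Python, only undoes the escaping; the remaining characters are still encoded. The names `ESCAPES` and `unescape` are illustrative.

```python
# Escape sequences from the table above: two-character code -> original.
ESCAPES = {
    "@#": "\r",
    "@&": "\n",
    "@!": "<",
    "@*": ">",
    "@$": "@",
}


def unescape(text):
    """Replace each two-character escape with the character it stands for."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ESCAPES:
            out.append(ESCAPES[pair])
            i += 2  # an escape consumes two characters
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(repr(unescape("@#@&CCb@#@&")))  # '\r\nCCb\r\n'
```

Note that an escape takes two characters in the encoded text but only one in the original, so the position counter used for decoding has to follow the unescaped text.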

I've removed the @!, @* and @$ from the encoded text too and replaced them
with question marks, so the table will stay nice. This is what you get as a
hex dump:

look up which representation to use (the first, second or third): pick_encoding[i mod 64]

find the representations in the huge table: encoding[c * 3]

encoded character = encoding[c*3 + pick_encoding[i%64]];

Because the table starts at 9 and then goes to 32, you'll have to do some corrections.
But we'll get to that later, as we are not really interested in encoding after
all. We want to be able to do some decoding!
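For completeness, here is a minimal Python sketch of the encoding step including the index correction. It assumes the table layout described in the text (TAB first, then 32 through 126, three representations per character). The tables themselves are passed in as parameters; `toy_encoding` and `toy_pick` are placeholders for illustration, not the real tables, and only the row for 'A' (b, ), z) comes from the observations above.

```python
def table_row(c):
    """Row of character c in the encoding table.

    The table holds TAB (9) first and then 32..126, so TAB is row 0 and
    every other character is shifted down by 31.
    """
    code = ord(c)
    if code == 9:
        return 0
    if 32 <= code <= 126:
        return code - 31
    raise ValueError("character not covered by the table")


def encode_char(c, i, encoding, pick_encoding):
    """Encode character c at position i.

    `encoding` is a flat sequence with three representations per row;
    `pick_encoding` is the 64-entry table of 0/1/2 values.
    """
    return encoding[table_row(c) * 3 + pick_encoding[i % 64]]


# Placeholder tables: only the row for 'A' is filled in.
toy_encoding = ["?"] * (96 * 3)
row = table_row("A")
toy_encoding[row * 3:row * 3 + 3] = ["b", ")", "z"]
toy_pick = [0, 1, 2] + [0] * 61  # hypothetical, not the real sequence

print("".join(encode_char("A", i, toy_encoding, toy_pick) for i in range(3)))  # b)z
```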

The decoding tables

The pick_encoding table will stay the same. This is because each character
(except for the escaped ones, of course) will be in the same place as the original.
Then, we could just look up the encoded character in the table. For instance,
an 'A' in encoded text (hex 0x41) occurs in these places in the 'encoding'
table:

row 9, group 4, representation 1 = 'F'

row 10, group 3, representation 3 = 'I'

row 23, group 1, representation 2 = '{'

So an 'A' in the encoded text is an F, I or {, depending on its position. Where
there is a 0 in the pick_encoding table, it's an F, for 1 it's an I, and for 2
it's a {.

You don't want to go looking through the encoding table each time, trying
to find those numbers. By transforming the encoding table into another table,
you can just go to position 0x41 (actually, 0x41 - 31 to correct it skipping
everything below space except for TAB), and pick the correct representation.

With this matrix, it's very simple to look up the original character by simply
looking it up in our table. Assume i is the position of the character and c
is the character again. Then:

decoded = transformed[pick_encoding[i%64]][c];
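The transformation can be sketched in Python as a plain inversion of the encoding table. This is an illustration under the layout described above, not the code of the actual decoder: it uses one dictionary per representation instead of an index correction, and `toy_encoding` and `toy_pick` are made-up tables that only serve to show the round trip.

```python
# Row 0 is TAB (9); rows 1..95 are the characters 32..126.
ORIGINALS = [chr(9)] + [chr(code) for code in range(32, 127)]


def build_transformed(encoding):
    """Invert a flat encoding table (three representations per row).

    Returns three dicts, one per pick value, each mapping an encoded
    character back to the original character.
    """
    transformed = [{}, {}, {}]
    for row, original in enumerate(ORIGINALS):
        for pick in range(3):
            transformed[pick][encoding[row * 3 + pick]] = original
    return transformed


def decode_char(c, i, transformed, pick_encoding):
    """Decode encoded character c found at position i."""
    return transformed[pick_encoding[i % 64]][c]


# Made-up tables: a rotated alphabet per representation and an arbitrary
# 0/1/2 sequence. The real tables have to be recovered as described above.
toy_encoding = [ORIGINALS[(row + pick + 1) % 96]
                for row in range(96) for pick in range(3)]
toy_pick = [(i * 7) % 3 for i in range(64)]

plain = "Hai"
encoded = "".join(toy_encoding[ORIGINALS.index(ch) * 3 + toy_pick[i % 64]]
                  for i, ch in enumerate(plain))
transformed = build_transformed(toy_encoding)
decoded = "".join(decode_char(ch, i, transformed, toy_pick)
                  for i, ch in enumerate(encoded))
print(decoded)  # Hai
```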

The encoding of the length-field

So what's left is to find out how many characters there are to decode. If
we just keep decoding stuff, we will decode part of the HTML that's behind the
encoded script. This can be avoided by stopping when a '

I created a number of files of different sizes. By giving them a *.js extension
the entire file is encoded without the Script Encoder looking for a start marker.
The results are below (only the first 12 bytes are displayed).

Length

First 12 bytes

ASCII

1

23 40 7E 5E 41 51 41 41-41 41 3D 3D

#@^EQAAAA==

2

23 40 7E 5E 41 67 41 41-41 41 3D 3D

#@^EgAAAA==

3

23 40 7E 5E 41 77 41 41-41 41 3D 3D

#@^EwAAAA==

4

23 40 7E 5E 42 41 41 41-41 41 3D 3D

#@^FAAAAA==

5

23 40 7E 5E 42 51 41 41-41 41 3D 3D

#@^FQAAAA==

6

23 40 7E 5E 42 67 41 41-41 41 3D 3D

#@^FgAAAA==

7

23 40 7E 5E 42 77 41 41-41 41 3D 3D

#@^FwAAAA==

8

23 40 7E 5E 43 41 41 41-41 41 3D 3D

#@^GAAAAA==

9

23 40 7E 5E 43 51 41 41-41 41 3D 3D

#@^GQAAAA==

32

23 40 7E 5E 49 41 41 41-41 41 3D 3D

#@^IAAAAA==

48

23 40 7E 5E 4D 41 41 41-41 41 3D 3D

#@^MAAAAA==

80

23 40 7E 5E 55 41 41 41-41 41 3D 3D

#@^UAAAAA==

96

23 40 7E 5E 59 41 41 41-41 41 3D 3D

#@^YAAAAA==

103

23 40 7E 5E 5A 77 41 41-41 41 3D 3D

#@^ZwAAAA==

104

23 40 7E 5E 61 41 41 41-41 41 3D 3D

#@^aAAAAA==

111

23 40 7E 5E 62 77 41 41-41 41 3D 3D

#@^bwAAAA==

116

23 40 7E 5E 64 41 41 41-41 41 3D 3D

#@^dAAAAA==

166

23 40 7E 5E 70 67 41 41-41 41 3D 3D

#@^pgAAAA==

216

23 40 7E 5E 32 41 41 41-41 41 3D 3D

#@^2AAAAA==

265

23 40 7E 5E 43 51 45 41-41 41 3D 3D

#@^CQEAAA==

451

23 40 7E 5E 77 77 45 41-41 41 3D 3D

#@^wwEAAA==

The length seems to be encoded in the 5th to 10th byte, and 41 appears to
represent zero. The first byte of the length seems to increase by one when
the length increases by 4. Also, the second byte alternates between
41, 51, 67, and 77.

If you look at length 166, this value is 0x70, where it should be 0x41 + (166/4)
= 0x6a. So something goes wrong, and it can be narrowed down to length 104,
where it suddenly jumps from 0x5a to 0x61. This puzzled me for a long time,
until I realised that 0x5a = 'Z' and 0x61 = 'a'. And yes, the length turns out
to be Base64 encoded indeed :)
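With that insight, the length can be read with any Base64 decoder. Below is a minimal Python sketch; it assumes the eight characters after the '#@~^' marker (including the '==' padding) decode to a 4-byte little-endian integer, which is an inference that fits every row of the table above. The name `decode_length` is illustrative.

```python
import base64
import struct


def decode_length(header):
    """Return the script length stored after the '#@~^' start marker."""
    if not header.startswith("#@~^"):
        raise ValueError("missing '#@~^' start marker")
    field = header[4:12]           # e.g. 'aAAAAA=='
    raw = base64.b64decode(field)  # 8 Base64 characters -> 4 bytes
    # Assumption: little-endian unsigned 32-bit value.
    return struct.unpack("<I", raw)[0]


for header in ("#@~^AQAAAA==", "#@~^aAAAAA==", "#@~^wwEAAA=="):
    print(decode_length(header))  # 1, 104, 451
```

This also explains the jump at length 104: the Base64 alphabet continues with 'a' right after 'Z'.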

The checksum

At the end of the encoded data is apparently some kind of checksum. I did not
look into this any further.

The decoder program

The inner workings of the decoder program, which can be downloaded from the
scrdec home page, are left as an exercise to the reader. It's implemented as a
"Turing-like" state machine. The decoder will treat .js and .vbs files as fully
encoded, while .htm(l) and .asp files are seen as files that contain script
amongst other things, like HTML code.
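To give an idea of the first step, here is a rough Python sketch that picks the mode from the file extension and locates encoded blocks. It is a regex-based simplification, not the state machine of the actual decoder. The block layout it assumes ('#@~^', an 8-character length field, the body, an 8-character checksum field, '^#~@') is inferred from the samples earlier in this article; `is_fully_encoded` and `find_blocks` are hypothetical names.

```python
import os
import re

# Assumed layout: '#@~^' + length field + body + checksum field + '^#~@'.
BLOCK = re.compile(
    r"#@~\^(?P<length>[A-Za-z0-9+/]{6}==)"
    r"(?P<body>.*?)"
    r"(?P<checksum>[A-Za-z0-9+/]{6}==)\^#~@",
    re.DOTALL,
)


def is_fully_encoded(filename):
    """.js and .vbs files are treated as encoded from start to end."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in (".js", ".vbs")


def find_blocks(text):
    """Return the length, body and checksum fields of every encoded block."""
    return [m.groupdict() for m in BLOCK.finditer(text)]


page = ('<script language="JScript.Encode">'
        '#@~^FQAAAA==@#@&CGb@#@&zz O@*@#@&WwIAAA==^#~@</script>')
for block in find_blocks(page):
    print(block["length"], block["checksum"])  # FQAAAA== WwIAAA==
```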

There is one thing lacking in the decoder: the value of the <SCRIPT LANGUAGE="...">
attribute is not changed back into the original form. You'd better use
a tool like sed for that.
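If sed is not at hand, the same substitution is a one-liner in most languages. A minimal Python sketch follows; it handles "JScript.Encode" as shown earlier, and assumes by analogy that encoded VBScript is marked "VBScript.Encode". The name `restore_language` is illustrative.

```python
import re


def restore_language(html):
    """Turn language="JScript.Encode" back into language="JScript"."""
    return re.sub(r"\b(JScript|VBScript)\.Encode\b", r"\1", html,
                  flags=re.IGNORECASE)


print(restore_language('<script language="JScript.Encode">'))
# <script language="JScript">
```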

Conclusion

It's not just sad that Microsoft made a tool like this. They've probably asked
Bill Gates' little nephew to write this code. The really bad part is that Microsoft
actually recommends that people use this piece of crap, and because of that, people
will rely on it, even though the documentation hints that it's unsafe. (Nobody
reads the docs anyway...)

Security by obscurity is a bad, bad idea. Instead of encouraging that approach,
Microsoft should educate programmers to find other ways to store their passwords
and sensitive data, and tell them that an algorithm or any other piece of code
that needs to be 'hidden' is just bad design.