Introduction

This article presents a JavaScript compression tool that takes your JavaScript source code and compresses it by removing all comments, extraneous whitespace, and optionally, as many line feeds as possible, and by optionally shortening function parameters and variable names. This will reduce the script size, and may help your pages load faster and reduce bandwidth consumption. A minor side benefit when line feed removal and variable name compression are enabled is that they provide lightweight obfuscation of the code, making it harder for the casual user to read and/or play around with it. It won't stop a determined user from reformatting and reverse engineering it, but that is not the intent of this tool.

I developed this tool for use in my own ASP.NET projects. The code is written in C#, but as long as you have the .NET Framework installed, it can be used to compress JavaScript for any web project, .NET or otherwise. The supplied project file is for Visual Studio 2003, but it can be opened, converted, and successfully compiled under Visual Studio 2005 as well.

There are three levels of compression:

No Line Feed Removal

Line feeds are not removed from the script (except those deemed extraneous, such as on blank lines). Only comments and extraneous whitespace are removed. This mode provides good compression, and ensures that no code is broken.

Line Feeds Removed Wherever Possible

In this mode, line feeds are removed from the ends of statements in which it is determined safe to do so, usually resulting in an extra 2% to 5% compression. For example, lines ending in an operator such as *, /, +, -, etc., and those ending in a semi-colon will have any trailing line feeds removed. There are several other conditions that can be met, resulting in removal, and those are described below in the code description sections. Steps are also taken to prevent removal in instances such as missing semi-colons so as not to break code. However, I may not have caught all such conditions, so if the code is broken by this mode, you can fall back to the above mode. This mode achieves its best results when you are diligent about putting semi-colons after all statements that can use them to properly mark their endpoints.

Function Parameter and Variable Name Compression

This can be combined with one of the first two compression options to further reduce the script size. When enabled, as many function parameters and variable names as possible will be renamed and shortened. The naming scheme starts with the names a through z, then _a through _z, _aa through _az, _ba through _bz, etc. With this option enabled, script size can usually be reduced by an additional 10% to 15%. There may be a higher potential for broken code with this option, so it is not enabled by default. If enabled, it is recommended that you thoroughly test all compressed scripts before deploying them.

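Code blocks can also be surrounded by special // #pragma NoCompStart and // #pragma NoCompEnd comments to exclude sections from compression. This is useful for including copyright notices in the headers of compressed script files or skipping sections that you are testing. For example, placing a // #pragma NoCompStart line before a copyright comment block and a // #pragma NoCompEnd line after it leaves that block untouched in the compressed output.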

The #pragma comments should appear on lines by themselves, and will be removed from the final compressed script. Any trailing comment text on the same line as the #pragma is ignored, and will be removed as well. The compressor doesn't care about spacing or case on the #pragma statements either.

The Programs

Two versions of the program are provided. The first is an interactive version that you can use to test the different modes of compression. It is a Windows Forms application written in C#. After running it, simply paste your JavaScript code into the Original Script text box, turn the Line Feed Removal and Variable Name Compression options on or off, and click the Compress button. The compressed script is then shown in the Compressed Script text box, with some compression statistics displayed below it. The text can be copied to the clipboard from the Compressed Script text box.

Note that when using the Test only variable name compression option, the script code is not compressed. Only parameter and variable names are compressed. This may help locate a problem with the variable name compression code. Although the script code is not compressed, comments are removed so that the naming results match (i.e., it won't use different names due to matching a word that appears in a comment such as "a", "be", or "to").

The second and most useful tool is a console mode version of the compressor that can be used as the command for a pre-build step in ASP.NET projects to compress scripts in the project. It can also be used to compress scripts that are stored in custom web controls as embedded resources. The command line syntax is shown below. Options and file specs are case-insensitive, and are processed from left to right as encountered.

JSCompressCL [/options] filespec [[/options] filespec ...]

The available command line options are as follows:

Option      Description
/?          Show help.
/q          Quiet mode. Don't display compression statistics.
/debug      Debug build, compression is suppressed, and scripts are passed through to the output folder unmodified to make debugging easier. Compression can be forced using the /f option.
/release    Release build, compression enabled (the default if no build option is specified).
/k          (Keep) No line feeds are removed unless they are extraneous (i.e., blank lines).
/d          (Delete) Line feeds are removed wherever possible (the default if no line feed removal option is specified).
/v          Compress variable and parameter names.
/t          Variable name compression only (for testing it). This will strip comments as well, but all other compression options are ignored.
/r          Recurse sub-folders in the file spec too. The sub-folder structure will be duplicated in the output folder.
/o:<dir>    Specify output folder (current folder if not specified).
filespec    One or more files to compress, wildcards accepted.

The debug and release build options are spelled out to make it easy to specify them in a project's pre-build step using one of the IDE macros. This is described below.

At the minimum, you should specify an output folder other than the one in which the scripts to compress reside. For example, you may want to store the uncompressed scripts in a folder called ScriptsDev and tell the compressor to store the compressed scripts in a folder called Scripts that the application will use at runtime. The compressor will not overwrite the source scripts. On debug builds, it also checks for an existing copy of the script in the output folder and, if that copy's timestamp is later than or equal to that of the source script, it skips it. This saves recreating a script file that has not changed each time the project is built during debugging. An "up to date" message is displayed in such cases. The scripts are always processed in release builds, to ensure that they are up to date and are compressed.
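The debug-build timestamp check described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the tool's actual code: the class and method names (UpToDateSketch, IsUpToDate) are hypothetical, and UTC file times are used simply to keep the comparison independent of time zone.

```csharp
using System;
using System.IO;

public static class UpToDateSketch
{
    // Hypothetical helper: returns true when a copy of the script already
    // exists in the output folder and is at least as new as the source,
    // in which case a debug build could skip it.
    public static bool IsUpToDate(string sourcePath, string destPath)
    {
        return File.Exists(destPath) &&
            File.GetLastWriteTimeUtc(destPath) >= File.GetLastWriteTimeUtc(sourcePath);
    }
}
```

A release build would ignore this check and always reprocess the script.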

If a script is compressed, the tool displays the source and destination filenames along with the compression statistics. The /q command line option can be used to turn them off.

Using the Console Version as a Project's Pre-Build Step

Copy the console version of the application to a folder somewhere on your PC. To use the console version as the pre-build step of a web project, create a folder to contain the uncompressed scripts (ScriptsDev, for example), and another to contain the compressed scripts to be used at runtime by the application (Scripts, for example). To create a new folder in the project, right click on the project name, select Add..., select New Folder, and enter the folder name. Add a new script to the folder by right clicking on it and selecting Add... and then Add New Item... to create a new item, or Add Existing Item... if you copied an existing file to the new folder. Once added to the project folder, right click on the script, and select Properties. Change the Build Action property from Content to None for the scripts in the development (uncompressed) folder. You can add copies of the scripts in the compressed folder and leave their build action set to Content if you want to do so.

The next step is to right click on the project name, select Properties, expand the Common Properties folder, and select the Build Events sub-item. Click in the Pre-build Event Command Line option to enter the command line to run. You can click the "..." button to open a dialog with a larger editor and a list of available macros. Below is an example of a common command line that can be used. Replace the path to the tool with the path where you stored it on your PC.

<path to tool>\JSCompressCL /$(ConfigurationName) /o:$(ProjectDir)Scripts $(ProjectDir)ScriptsDev\*.js

The /$(ConfigurationName) option expands to the configuration name in effect at the time of the build. Assuming the defaults, this will equate to either /Debug or /Release, thus turning off compression for debug builds so that you can test your scripts and debug them and turn it on for release builds. Note that the command line processor will look for an entry starting with "Debug" or "Release", so you can use custom configuration names. As long as they start with either of those two keywords, it will select the appropriate build type. If the configuration name contains spaces, place quote marks around the option. As noted, in debug builds, scripts are passed through to the destination folder as-is, to make debugging easier. If you want the scripts compressed in debug builds, add the /f command line option to force compression to be used.

The /o:$(ProjectDir)Scripts option equates to the compressed script folder. For my projects, it is always a subfolder of the main project folder, thus the use of the $(ProjectDir) macro. Modify the path name accordingly, for your own projects.

The same applies for the $(ProjectDir)ScriptsDev\*.js option which tells the tool where to find the scripts that need to be compressed. As above, modify the path name accordingly for your own projects.

Compressing Scripts that are Embedded Resources

If you are developing a web control, for example, that uses scripts that are contained in the assembly as embedded resources, you can still compress them using the above steps. The only difference is that, when setting up the folders as described above, make an initial copy of the scripts, and place them in the compressed script folder. In the project manager, right click on the scripts in the compressed script folder, select Properties, and change the Build Action property to Embedded Resource. When you build the project, the pre-build command will compress the scripts, the project will then be built in the normal fashion, and the compressed scripts will be embedded as resources in the assembly.

How the Code Works

The code for the Windows Forms and the console applications is fairly straightforward, and there is nothing much to describe. The forms version takes data from the controls, and uses it with the JSCompressor class. The console mode version does the same thing, but using command line parameters. The class itself is where the action occurs, and is described below. The code for the class can be found in the JSCompressor.cs file.

Basic Information

The JSCompressor class is fairly simple, and consists of a couple of constructors, properties to modify the line feed removal mode and variable name compression settings, a public method to compress scripts, and several private data members and methods. The default constructor enables line feed removal, by default. A second version of the constructor takes a boolean parameter that lets you specify the initial state for line feed removal (true for enabled, false for disabled). The LineFeedRemoval property lets you modify the mode after construction. The third constructor takes two boolean parameters that let you specify the initial state for the line feed removal and the variable name compression options. The CompressVariableNames property can be used to modify the variable name compression setting after construction. Variable name compression is off, by default. In addition, the TestVariableNameCompression property can be set to true to test the variable name compression code. When set to true, script compression is disabled, and only parameter and variable names are compressed. As noted above, comments are removed though, so that you end up with an identical set of renamed variables and parameters.

The Compression Process

The Compress method of the JSCompressor class does all of the work. It is passed a copy of the uncompressed script, and returns the compressed version.

The first part initializes two string collections that will end up containing any "no compression" sections specified by the #pragma comments and any literal strings found during parsing. A set of regular expressions and match evaluators are also initialized to help with the parsing and compression process. Their use is described later.

// Extract sections that the user doesn't want compressed
// and replace them with a marker.
strCompressed = reExtNoComp.Replace(strScript, meExtNoComp);

// This is the match evaluator referenced by meExtNoComp:
// Extract the sections that the user doesn't want compressed
// and save them for reinsertion at the end without the #pragmas.
// They are replaced with a marker character.
private string OnNoCompFound(Match match)
{
    scNoComps.Add(reDelNoComp.Replace(match.Value, String.Empty));
    return "\xFE";
}

The next part extracts the sections, if any, that the user does not want compressed, as specified via the #pragma comments (e.g., copyright notices at the top of the file). To do this, a match evaluator is used that adds the found section to the string collection and replaces it in the script with a marker character (\xFE). The marker will be replaced with the uncompressed section at the end of the process. Replacing the section with a marker helps the remainder of the code to remove extraneous whitespace, by giving it less to look at. The #pragma comments are stripped from the sections before storing them in the collection.
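For readers who want to experiment with this step in isolation, here is a minimal, self-contained sketch. The two regular expressions are assumptions written for this illustration (the article does not list the actual reExtNoComp and reDelNoComp patterns), and the NoCompSketch class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NoCompSketch
{
    // Assumed pattern: everything from a "// #pragma NoCompStart" line
    // through the end of the matching "// #pragma NoCompEnd" line.
    static readonly Regex reSection = new Regex(
        @"^[ \t]*//\s*#pragma\s+NoCompStart.*?^[ \t]*//\s*#pragma\s+NoCompEnd[^\n]*\n?",
        RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline);

    // Assumed pattern: a single #pragma line, including any trailing text.
    static readonly Regex rePragmaLine = new Regex(
        @"^[ \t]*//\s*#pragma\s+NoComp(Start|End)[^\n]*\n?",
        RegexOptions.IgnoreCase | RegexOptions.Multiline);

    // Replaces each excluded section with the \xFE marker and stores the
    // section, minus its #pragma lines, in the supplied list.
    public static string Extract(string script, List<string> sections)
    {
        return reSection.Replace(script, m =>
        {
            sections.Add(rePragmaLine.Replace(m.Value, String.Empty));
            return "\xFE";
        });
    }
}
```

Passing a script whose first three lines are a NoCompStart pragma, a copyright comment, and a NoCompEnd pragma leaves a single marker character in their place and stores only the copyright comment in the list.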

After the "no compression" sections have been removed, the script is split into a character array to make parsing simpler. The array is passed to the CompressArray method which scans the script one character at a time, looking for block comments, line comments, literal strings, and JavaScript regular expressions enclosed in slashes (/ /). Block comments and line comments are removed by setting all characters within the comments to a null in the array. However, sections between /*@ and @*/ are left in the code, as they indicate a conditional compilation section. The code between the conditional section markers will still be compressed. Note that if you do use conditional compilation comments, it is important to end the line preceding the block with a semi-colon, as the browser will not process the conditional block unless it starts on a distinct line.

Literal strings and regular expressions are extracted and stored in a string collection, and are replaced by a marker character (\xFF) using a method similar to extracting and storing the "no compression" sections. Again, this helps the final steps remove extraneous whitespace, by giving it less to look at. During this process, carriage returns are converted to line feeds, which makes it easy to remove them later on as well.
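The character-array scan can be illustrated with a simplified sketch. It is an assumption-laden reduction of what the text describes, not the CompressArray method itself: it handles only line comments, block comments, and quoted strings, and deliberately leaves out regular expression literals and the /*@ ... @*/ conditional compilation case. The ScanSketch name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ScanSketch
{
    const char Marker = '\xFF';

    // Nulls out comments, swaps each quoted string for a marker, converts
    // carriage returns to line feeds, then drops the null characters.
    public static string StripCommentsAndLiterals(string script, List<string> literals)
    {
        char[] chars = script.ToCharArray();
        int i = 0;

        while(i < chars.Length)
        {
            char c = chars[i];

            if(c == '"' || c == '\'')
            {
                // Find the closing quote, stepping over backslash escapes
                int end = i + 1;
                while(end < chars.Length && chars[end] != c)
                    end += (chars[end] == '\\') ? 2 : 1;

                if(end >= chars.Length)
                    end = chars.Length - 1;     // Unterminated string

                literals.Add(new string(chars, i, end - i + 1));
                chars[i] = Marker;

                for(int j = i + 1; j <= end; j++)
                    chars[j] = '\0';

                i = end + 1;
            }
            else if(c == '/' && i + 1 < chars.Length && chars[i + 1] == '/')
            {
                // Line comment: null everything up to the end of the line
                while(i < chars.Length && chars[i] != '\n' && chars[i] != '\r')
                    chars[i++] = '\0';
            }
            else if(c == '/' && i + 1 < chars.Length && chars[i + 1] == '*')
            {
                // Block comment: null everything through the closing */
                int close = script.IndexOf("*/", i + 2, StringComparison.Ordinal);
                int end = (close < 0) ? chars.Length : close + 2;

                while(i < end)
                    chars[i++] = '\0';
            }
            else
            {
                if(c == '\r')
                    chars[i] = '\n';

                i++;
            }
        }

        return new string(chars).Replace("\0", String.Empty);
    }
}
```

Because the string is replaced by a marker before comments are looked for, a // sequence inside a quoted string is never mistaken for a comment.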

Once the array has been parsed, it is converted back into a string, and all null characters (representing removed sections) are deleted. After that, regular expressions are used to remove leading and trailing whitespace from all lines, and to condense all runs of two or more whitespace characters to just one. This part and the subsequent steps are skipped if only testing variable name compression.
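The whitespace cleanup could look something like the sketch below. The patterns are assumptions chosen to match the description (trim each line, collapse runs of spaces and tabs, drop blank lines); the article does not show the expressions the tool actually uses, and WhitespaceSketch is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceSketch
{
    public static string Condense(string script)
    {
        // Remove leading and trailing spaces and tabs from every line
        string s = Regex.Replace(script, @"^[ \t]+|[ \t]+$", String.Empty,
            RegexOptions.Multiline);

        // Collapse runs of two or more spaces or tabs into a single space
        s = Regex.Replace(s, @"[ \t]{2,}", " ");

        // Collapse runs of line feeds so that blank lines disappear
        return Regex.Replace(s, @"\n{2,}", "\n");
    }
}
```

This assumes carriage returns have already been converted to line feeds, as described for the array scan.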

// Line feed removal requested?
if(removeLineFeeds)
{
    // Remove line feeds when they appear near numbers with signs
    // or operators. A space is used between + and - occurrences
    // in case they are increment/decrement operators followed by
    // an add/subtract operation. In other cases, line feeds are
    // only removed following a + or - if it is not part of an
    // increment or decrement operation.
    strCompressed = Regex.Replace(strCompressed, @"([+-])\n\1",
        "$1 $1");
    strCompressed = Regex.Replace(strCompressed, @"([^+-][+-])\n",
        "$1");
    strCompressed = Regex.Replace(strCompressed,
        @"([\xFE{}([,<>/*%&|^!~?:=.;])\n", "$1");
    strCompressed = Regex.Replace(strCompressed,
        @"\n([{}()[\],<>/*%&|^!~?:=.;+-])", "$1");
}

The next step is to see if line feed removal has been requested. If so, all line feeds occurring near numbers with signs and near operators are removed. As noted in the comments, care is taken around the + and - characters so that whitespace and line feeds are left around increment and decrement operations (++ and --) where needed, to prevent breaking code.
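To see the effect of these four replacements, they can be wrapped in a small helper and run against a sample. The regular expressions below are copied from the listing above; the LineFeedSketch wrapper and the sample input are assumptions added for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineFeedSketch
{
    // Applies the four line feed replacements from the listing above
    public static string RemoveLineFeeds(string script)
    {
        string s = Regex.Replace(script, @"([+-])\n\1", "$1 $1");
        s = Regex.Replace(s, @"([^+-][+-])\n", "$1");
        s = Regex.Replace(s, @"([\xFE{}([,<>/*%&|^!~?:=.;])\n", "$1");
        return Regex.Replace(s, @"\n([{}()[\],<>/*%&|^!~?:=.;+-])", "$1");
    }
}
```

Given the input "a = b +\n++c;\nd = [1,\n2];\nfoo()\nbar();", the result is "a = b + ++c;d = [1,2];foo()\nbar();". The space keeps the addition apart from the increment, and the line feed after foo() survives because that statement has no semi-colon and the next line does not start with an operator.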

A final set of regular expressions is used to strip whitespace from around operators and the marker characters. Again, special care is taken with the + and - operators so as to correctly strip whitespace around occurrences of increment and decrement operations.

After removing all extraneous whitespace, if line feed removal has been requested, a few additional steps are taken to remove unnecessary line feeds from around if, while, and for statements. This helps remove line feeds from instances where those statements occur one after the other in any combination, with no intervening brace character. For example, the following would get condensed to a single line:

if(a == 1)
    for(b = 0; b < 10; b++)
        while(!c)
            c = DoSomething();

If the code contains semi-colons on all statements that need them to mark their endpoints, the above process can usually remove all line feeds from the script, reducing it to one long stream of characters, thus providing maximum code compression.

Variable name compression occurs next, if requested. This process will be described in the next section. The last step is to reinsert the uncompressed sections and literal strings. In a manner similar to extraction, a regular expression and a match evaluator are used. Two private counters are used to keep track of the progress through the string collections. As each marker character is found, the match evaluator is called and, depending on the marker found, it returns the next element from the appropriate collection, which then takes the place of the marker. The matching counter is also incremented ready for the next match. After the insertions have been made, the compressed script is returned to the caller.
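The reinsertion step can be sketched as shown below, assuming the \xFE and \xFF markers described earlier. The ReinsertSketch name and its parameters are hypothetical, and local counters stand in for the two private counters mentioned in the text.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ReinsertSketch
{
    // Replaces each marker with the next entry from the matching list:
    // \xFE for "no compression" sections, \xFF for literal strings.
    public static string Reinsert(string compressed, IList<string> noComps,
      IList<string> literals)
    {
        int noCompPos = 0, literalPos = 0;

        return Regex.Replace(compressed, "[\xFE\xFF]", m =>
            (m.Value == "\xFE") ? noComps[noCompPos++] : literals[literalPos++]);
    }
}
```

Using a match evaluator rather than a replacement string also means that any $ characters in the stored text are inserted literally instead of being treated as substitution patterns.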

Parameter and Variable Name Compression

The CompressVariables method handles the compression of function parameter and variable names. Since there is the potential to break code, the compression method takes a conservative approach to locating and renaming variables.

Function parameter names appearing within the parentheses on a function declaration are included for compression.

Variable names on the same line as a var statement are included for compression. However, if the var statement spans lines and extra line feed removal is disabled, some names may be missed. For example:

var string1, string2,
    num1, num2;

In the above example, string1 and string2 will always be included, but num1 and num2 will not be included if the LineFeedRemoval property is set to false as they will always appear on a line by themselves with no indication that they are variables.

On a similar note, variable names that appear in the code but that are not formally declared with a var statement will always be ignored (e.g., global variables declared in another module).

If you declare global variables that are referenced in other script files, you should wrap their declarations in a #pragma NoCompStart/NoCompEnd section so that they are not renamed within the file in which they are declared.

The first part searches for function parameters using a regular expression created earlier. The parameter list is split apart, and each unique parameter name is added to the variable name string collection.
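A minimal sketch of the parameter search follows. The regular expression is an assumption written for this example (the article does not show the one the tool creates), and ParamSketch is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParamSketch
{
    // Collects each unique parameter name found in function declarations
    public static List<string> FindParameters(string script)
    {
        List<string> names = new List<string>();

        // Assumed pattern: "function", an optional name, then the
        // parenthesized parameter list
        foreach(Match m in Regex.Matches(script, @"\bfunction\b\s*\w*\s*\(([^)]*)\)"))
            foreach(string part in m.Groups[1].Value.Split(','))
            {
                string name = part.Trim();

                if(name.Length != 0 && !names.Contains(name))
                    names.Add(name);
            }

        return names;
    }
}
```

For a script containing function Add(first, second) and an anonymous function(first, x), this returns first, second, and x, with the repeated name added only once.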

The next part searches for var statements that contain variable name declarations, using a regular expression created earlier. This step is slightly more complex as it must account for assignments that occur within the statement as well as possible references to array indices that might cause an incorrect split to occur. For example, in an illustrative statement such as var x = items[1, 2], y = Calc(a, b); the commas inside the brackets and parentheses do not separate declarations.

The var prefix is removed from the statement, followed by any parts of the expressions that contain brackets or parentheses containing commas (e.g., two-dimensional array indices, function call parameters, etc., as in the example above). Once they are removed, a final regular expression is used to remove any remaining assignment text from the equal sign to the next comma or end of the line. Once this is done, it is safe to split the string on each comma and add the unique names to the variable name string collection.
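The steps for a single var statement might be sketched like this. It is a simplification under stated assumptions: it removes every bracketed or parenthesized group rather than only those containing commas, it expects literal strings to have been replaced by markers already, and the patterns and the VarSketch name are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VarSketch
{
    // Returns the unique names declared by one var statement
    public static List<string> FindVarNames(string statement)
    {
        // Drop the terminating semi-colon and the var prefix
        string s = Regex.Replace(statement.Trim().TrimEnd(';'), @"^var\s+",
            String.Empty);
        string previous;

        // Remove innermost (...) and [...] groups until none remain so
        // that commas inside them cannot cause an incorrect split
        do
        {
            previous = s;
            s = Regex.Replace(s, @"\([^()\[\]]*\)|\[[^()\[\]]*\]", String.Empty);

        } while(s != previous);

        // Remove assignment text from the equal sign to the next comma
        s = Regex.Replace(s, @"=[^,]*", String.Empty);

        List<string> names = new List<string>();

        foreach(string part in s.Split(','))
        {
            string name = part.Trim();

            if(name.Length != 0 && !names.Contains(name))
                names.Add(name);
        }

        return names;
    }
}
```

For the statement var grid = cells[1, 2], total = Sum(a, Max(b, c)), name; this yields grid, total, and name.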

The final step loops through each unique variable name found, and substitutes a shorter name. Once done, the compressed script is returned. As noted in the comments, the naming scheme starts with a through z and, if they run out, it adds an underscore prefix and carries on (_a through _z). The underscore ensures that it will not accidentally create a name that could match a keyword once it gets past single letter variable names. Should those names be exhausted, it starts appending letters and runs through each set from _aa to _az, _ba to _bz, etc. The code is written such that it will expand the names further if needed, but it is more likely that the script will have fewer unique variables than the number of unique new names that can be generated by the compressor.

As each new name is created, a check is made to ensure that it does not already exist in the script. For example, common loop variable names such as i or j will cause it to skip those new names if they are used in the script already. Likewise, if the new name is longer than the existing name, it will not be replaced. However, as noted, you could remove that check in order to completely obfuscate the names if necessary.
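The naming scheme and the two checks can be sketched as follows. This is not the tool's code: ShortNameSketch, NameAt, and Assign are hypothetical names, and holding a candidate over for the next variable when it would be longer than the current name is an assumption made here for simplicity.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ShortNameSketch
{
    // Returns the candidate at a zero-based position in the sequence
    // a..z, _a.._z, _aa.._az, _ba.._bz, and so on
    public static string NameAt(int index)
    {
        if(index < 26)
            return ((char)('a' + index)).ToString();

        // Past the single letters, build a letter suffix after an underscore
        int n = index - 26 + 1;
        StringBuilder suffix = new StringBuilder();

        while(n > 0)
        {
            n--;
            suffix.Insert(0, (char)('a' + n % 26));
            n /= 26;
        }

        return "_" + suffix;
    }

    // Maps each variable name to a shorter one, skipping candidates that
    // already appear in the script and never lengthening a name
    public static Dictionary<string, string> Assign(IEnumerable<string> variableNames,
      ICollection<string> namesInScript)
    {
        Dictionary<string, string> map = new Dictionary<string, string>();
        string candidate = null;
        int next = 0;

        foreach(string oldName in variableNames)
        {
            if(candidate == null)
                do
                {
                    candidate = NameAt(next++);

                } while(namesInScript.Contains(candidate));

            // Assumption: keep the original name if the candidate is longer
            // and hold the candidate for the next variable
            if(candidate.Length > oldName.Length)
                continue;

            map[oldName] = candidate;
            candidate = null;
        }

        return map;
    }
}
```

With the variables count, i, and total in a script that already uses the name a, the first candidate is skipped and the mapping becomes count to b, i to c, and total to d.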

Conclusion

On average, my own scripts have been reduced in size by 50% to 60%. Adding in variable name compression increases the savings by an additional 10% to 15% in the average script. Naturally, the more you comment your JavaScript code, use indentation to make the code more readable, and use descriptive variable names, the better the compression rates, as there is more stuff to remove. Using semi-colons to mark statement endpoints can also increase the compression rates as it enables the code to remove most if not all of the line feed characters too.

History

06/26/2006

Modified the compression code to allow for conditional compilation blocks (/*@ @*/). Modified the command line compressor to scan and compress sub-folders if the /r option is specified.

03/05/2006

Added the option to compress function parameter and variable names. Tested the code under Visual Studio 2005 and .NET 2.0. The demo project is a Visual Studio 2003 project, but will convert and build without any problems under Visual Studio 2005.


About the Author

Eric Woodruff is an Analyst/Programmer for Spokane County, Washington where he helps develop and support various applications, mainly criminal justice systems, using Windows Forms (C#) and SQL Server as well as some ASP.NET applications.

He is also the author of various open source projects for .NET including:

The Sandcastle Help File Builder - A front end and project management system that lets you build help file projects using Microsoft's Sandcastle documentation tools. It includes a standalone GUI and a package for Visual Studio integration.

Visual Studio Spell Checker - A Visual Studio editor extension that checks the spelling of comments, strings, and plain text as you type or interactively with a tool window. This can be installed via the Visual Studio Gallery.

PDI Library - A complete set of classes that let you have access to all objects, properties, parameter types, and data types as defined by the vCard (RFC 2426), vCalendar, and iCalendar (RFC 2445) specifications. A recurrence engine is also provided that allows you to easily and reliably calculate occurrence dates and times for even the most complex recurrence patterns.

Windows Forms List Controls - A set of extended .NET Windows Forms list controls. The controls include an auto-complete combo box, a multi-column combo box, a user control dropdown combo box, a radio button list, a check box list, a data navigator control, and a data list control (similar in nature to a continuous details section in Microsoft Access or the DataRepeater from VB6).

Search for "TODO" in the JSCompressor.cs file. You'll find a line in the constructor that sets the variable name compression variable to true by default for testing that I forgot to remove. Delete or comment out that line and rebuild the project to fix it.

First, great tool; it works like a charm. However, I found a small bug in the compression of variable names: if a variable has the same name as an object's property, the property gets renamed too, which breaks the code. I was able to tweak your code to correct this behaviour very simply.

Here is an example where this problem occurs. The table.rows property gets renamed to the variable name.

var table=document.getElementById('tablename');
var rows=table.rows;

It is easy to correct this by checking if the variable is a property as indicated by a leading "." during the replacement.

I did that by augmenting line 500

from:

@"(\W)" + replaceName + @"(?=\W)", "$1" + name);

to

@"([^\w.])" + replaceName + @"(?=\W)", "$1" + name);

I'm not too good with regular expressions, so there might be a way to express this more elegantly, yet it already fixes the problem.

I don't do enough JavaScript and JSDoc probably works quite well so I can't see trying to compete with it. Also, for XML comments in Atlas libraries there's AjaxDoc (http://www.codeplex.com/AjaxDoc). For that, I did write a plug-in for the Sandcastle Help File Builder (http://www.codeplex.com/SHFB) which lets you create help files using it. It will be available in the next release (1.5.2.0) due out soon.


First of all, thanks for spending your time and sharing the result of your work with us. JSCompressCL is very useful and FREE.

I started to use JSCompress and I found a problem with variable compression in a for statement. Here is a sample:

for (var property in this)
{
var value = this[property];

[... any code ...]
}

The regex responsible for finding the var called "property" gets the variable group wrong. The regex is intended to extract "var property" and then extract the "property" name. But it first extracts "var property in this (var value = this[property]", and with these two var expressions the tool can't extract the "property" name successfully.

I found this problem with the regex "reFindVars": "(var\s+.*?)(;|$)".
I updated this regex to "(var\s+.{1,}?)(;|$|in)"; the compression works and I didn't find any side effects.

Thanks for the report. As you noted, I think the fix is to remove the case-sensitive flag from the regular expression. Also note that the compressor will fail if variables are declared inline in for() statements (i.e. for(var i=0;...)). I don't have a fix for that yet.

I have tried to compress javascript code using this tool on the highest level including "Compress variable names". I have been running the tool as application. I have tried to compress the file jquery-1.0.2.js that can be found on http://docs.jquery.com/. But the tool crashes because of unhandled exception with a message (translated into English): Insufficient amount of closing brackets. Lower compression level works.

It doesn't currently handle var statements that appear within for() statements. I looked at the web site you reference and they do provide a compressed version of the script already. So for now, you can use that or use the compression without variable name removal.