Introduction

WARNING: This code has known bugs. It doesn't deal with non-ASCII filenames correctly. It doesn't deal with passwords correctly. I don't have time to fix it, unfortunately. But I've marked it so that other gold members can edit it, in case anyone wants to make the fix.

This source code shows how to add zip/unzip functionality to your programs. Lots of people have written their own wrappers around Zip, and indeed there are several articles on CodeProject that are based on earlier versions of my own code. How is this version different?

Clean packaging. There's one pair of files zip.cpp, zip.h to add to your project if you want Zip. Another pair unzip.cpp, unzip.h if you want unzip (or both if you want both!). There are no additional libraries or DLLs to worry about.

Clean API. Most other APIs around zip/unzip are terrible. This one is the best. The API is short, clean, and in a familiar Win32 style. Most other APIs wrap things up in classes, which is ugly overkill for such a small problem and always turns out to be too inflexible. Mine doesn't. See the code snippets below.

Flexibility. With this code, you can unzip from a zip that's in a disk file, memory buffer, or pipe. You can unzip into a disk file, memory buffer, or pipe. The same goes for creating Zip files. This means that at last you don't need to write out your files to a temporary directory before using them! One noteworthy feature is that you can unzip directly from an embedded resource into a memory buffer or onto a disk file, which is great for installers. Another is the ability to create your Zip in dynamically growable memory backed by the system page file. Despite all this power, the API remains clean and simple. The power didn't come from just writing wrappers around other people's code. It came from restructuring the internals of the zlib and info-zip source code. My code is unique in what it does here.

Encryption. This version supports password-based Zip encryption. Passwords are absent from many other Zip libraries, including zlib.

Unicode. This version supports Unicode filenames.

Windows CE. This version works as it is under Windows CE. No need to alter makefiles or #defines, or worry about compatibility of any LIB/DLL.

Bug fixes. This code is based on zlib 1.1.4, which fixes a security vulnerability in 1.1.3. (An earlier version of my code used 1.1.3, and has crept into other CodeProject articles...).

At its core, my code uses zlib and info-zip. See article end for acknowledgements & license.

Using the code

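To add zip functionality to your code, add the file zip.cpp to your project and #include "zip.h" in your source code. Similarly for unzipping, add the file unzip.cpp to the project and #include "unzip.h" in your source code. Zip and unzip can co-exist happily in a single application. Or you can omit one or the other if you're trying to save space.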

The following code snippets show how to use zip/unzip. They are taken from one of the demo applications included in the download, which also has project files for Visual Studio .NET, Borland C++ Builder6 and Embedded Visual C++ 3. The code snippets here use ASCII, but the functions all take arguments of type TCHAR* rather than char*, so you can use them fine under Unicode.

Example 1 - create a Zip file from existing files

// We place the file "simple.bmp" inside, but inside
// the zipfile it will actually be called "znsimple.bmp".
// Similarly the textfile.
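HZIP hz = CreateZip("simple1.zip",0);     // second argument is the password; 0 means none
ZipAdd(hz,"znsimple.bmp", "simple.bmp");  // arguments: handle, name inside the zip, source file on disk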
ZipAdd(hz,"znsimple.txt", "simple.txt");
CloseZip(hz);

Common Questions

STRICT? I think you should always compile with STRICT (in project-settings/preprocessor/defines), and full warnings turned on. Without STRICT, the HZIP handle becomes interchangeable with all other handles.
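To make the STRICT point concrete, here is a minimal sketch of the technique the Windows headers use to declare handles. The macro DEMO_DECLARE_HANDLE and the names HZIP_DEMO, HFONT_DEMO, LOOSE_HZIP_DEMO and LOOSE_HFONT_DEMO are stand-ins invented for this illustration so that it compiles without windows.h; they are not copied from zip.h.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for how handles are declared when STRICT is defined:
// every handle becomes a pointer to its own distinct dummy struct.
#define DEMO_DECLARE_HANDLE(name) \
    struct name##__ { int unused; }; \
    typedef struct name##__ *name

DEMO_DECLARE_HANDLE(HZIP_DEMO);
DEMO_DECLARE_HANDLE(HFONT_DEMO);

// Without STRICT, handles collapse to one generic pointer type.
typedef void *LOOSE_HZIP_DEMO;
typedef void *LOOSE_HFONT_DEMO;

// Strict handles are different types, so passing the wrong one is a compile error.
inline bool strict_handles_are_distinct() {
    return !std::is_same<HZIP_DEMO, HFONT_DEMO>::value;
}

// Loose handles are the same type, so mix-ups compile silently.
inline bool loose_handles_are_interchangeable() {
    return std::is_same<LOOSE_HZIP_DEMO, LOOSE_HFONT_DEMO>::value;
}
```

With the strict form, a function declared to take an HZIP_DEMO refuses an HFONT_DEMO at compile time, which is the protection that is lost when STRICT is not defined.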

How to show a progress dialog? One of the included examples, "progress", shows how to do this.

How to add/remove files from an existing Zip file? The zip_utils currently only allow you to OpenZip() for unzipping, or CreateZip() for adding, but don't allow you to mix the two. To modify an existing Zip (e.g. adding or removing a file), you need to create a new Zip and copy all the existing items from the old into the new. One of the included examples, "modify", shows how to do this. It defines two functions:

Discussion

Efrat says: "I think the design is very bad", and so objects when I say that my API is clean and others are not. (Actually, he says my documentation is the most conceited he's seen and my design is the worst that he's seen!) I've reproduced his comments here, with my responses, so you can make a more informed decision whether to use my library.

[Response] I love the boost library. If people can figure out how to add it to their projects and zip/unzip with it, they should definitely use boost rather than my code. (I'm still trying to figure it out, though, and couldn't get it to compile under CE.)

you'll never inherit from an archive, nor invoke virtual methods from it: we only use encapsulation, not any of the other pillars of OOP. By using an opaque handle HZIP rather than a class, I indicate this clearly to the programmer. Also,

C++ classes don't work cleanly across DLLs. Handles like HZIPs do.

[Efrat] For instance, progress-notifications should be done by virtual functions in a derived class, not by callbacks.

[Response] To get progress, you invoke UnzipItem in a while loop, and each iteration unzips a little bit more of the file. This is clean, re-entrant, and has a simple API. I think this is an easier API than inheriting from a class. I think inheritance from library classes is bad, in general.
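The caller-driven loop described in that response might look roughly like the following. This is only a sketch of the pattern: ChunkedSource, PullResult, pull() and extract_with_progress() are hypothetical names made up for the illustration, and the sketch does not attempt to reproduce the real UnzipItem signature or its return codes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical status codes: "more data remains" versus "finished".
enum PullResult { PULL_MORE, PULL_DONE };

// Hypothetical stand-in for an archive item that is extracted a piece at a time.
class ChunkedSource {
public:
    explicit ChunkedSource(const std::string &data) : data_(data), pos_(0) {}

    // Copies up to `len` bytes into `dst`, reports how many were written,
    // and returns PULL_MORE while unread data remains.
    PullResult pull(char *dst, std::size_t len, std::size_t *written) {
        std::size_t n = std::min(len, data_.size() - pos_);
        std::memcpy(dst, data_.data() + pos_, n);
        pos_ += n;
        *written = n;
        return pos_ < data_.size() ? PULL_MORE : PULL_DONE;
    }
    std::size_t total() const { return data_.size(); }

private:
    std::string data_;
    std::size_t pos_;
};

// The caller owns the loop, so progress can be reported between chunks
// without callbacks or inheritance. Returns the percentage after each chunk.
std::vector<int> extract_with_progress(ChunkedSource &src, std::string *out,
                                       std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<int> progress;
    std::vector<char> buf(chunk_size);
    std::size_t done = 0;
    PullResult r = PULL_MORE;
    while (r == PULL_MORE) {
        std::size_t n = 0;
        r = src.pull(buf.data(), buf.size(), &n);
        out->append(buf.data(), n);
        done += n;
        progress.push_back(src.total() ? static_cast<int>(done * 100 / src.total()) : 100);
    }
    return progress;
}
```

In a real application the line that records the percentage is where a progress bar would be updated or a cancel button checked.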

[Efrat] Compression should go in a DLL.

[Response] I disagree. DLLs are always a pain, for developers as well as users. Unzip only adds 40K in any case.

[Efrat] The API doesn't use the type system to differentiate between an HZIP for zipping and an HZIP for unzipping.

[Response] This was intentional. The difference between zipping and unzipping is a current implementation drawback. I think an API should be clean, "aspirational", and you shouldn't encode current implementation limitations into the type system.

[Efrat] The API uses error-codes, rather than exceptions, but anyone who has graduated Programming 101 knows exceptions are better.

[Response] I think exceptions are nowhere near as widely welcomed as Efrat suggests. Also, they don't work cleanly across DLL boundaries, and they don't work on Pocket PC.

[Efrat] The API is inflexible; it should be coded for change, not just coded for all the options that were conceived while designing (handles, files, memory). Most users will think of sources and targets which this design can't support.

[Response] The original Zip uses FILE*s, which are roughly comparable to Windows pipes. I also provided memory buffers, which add an enormous amount of flexibility that's easy to use and requires no additional programming. Any user who needs sources and targets that can't be reached via a memory buffer shouldn't use these zip_utils.

[Efrat] The code is unnecessarily Windows-specific. The original zlib works great and is portable; zip_utils offers no advantages. Compression is memory-manipulation and IO and so should not be platform-specific.

[Response] In the olden days before STL, "cross-platform" code inevitably meant code that (1) was peppered with so many #ifdefs that you couldn't read it, and (2) didn't work straight away under Windows.

I started from an old code-base, and so Efrat's proposed bottom-up rewrite was not possible. The advantage this code offers over zlib is that it's just a single file to add to your project, it works first time under Windows, you can add it easily as a CPP module to your project (not just dll/lib), and the API is simpler.

In general, Efrat wants code to be a clean extensible framework. I don't; I want small compact code that works fine as it is. Furthermore, I think that "framework-isation" is the biggest source of bugs and code overruns in the industry.

Acknowledgements

This version of the article was updated on 28th July 2005. Many thanks to the readers at CodeProject who found bugs and contributed fixes to an earlier version. There was one terrible bug where, after a large file had been unzipped, the next one might not work. Alvin77 spotted this bug.

My work is a repackaged form of extracts from the zlib code available at www.gzip.org by Jean-Loup Gailly and Mark Adler and others, and from the info-zip source code at www.info-zip.org, plus a bunch of my own changes. The original source code can be found at those two websites. The original copyright notices and licenses can be found there too, and also inside the files zip.cpp and unzip.cpp of my code. As for licensing of my own contributions, I place them into the public domain.

About the Author

Lucian studied theoretical computer science in Cambridge and Bologna, and then moved into the computer industry. Since 2004 he's been paid to do what he loves -- designing and implementing programming languages! The articles he writes on CodeProject are entirely his own personal hobby work, and do not represent the position or guidance of the company he works for. (He's on the VB/C# language team at Microsoft).

Comments and Discussions

The code is written very elegantly, and that's why we have been using it for such a long time. But recently I have encountered an issue while decompressing a file whose actual (decompressed) size is > 4 GB. I see that you are using unsigned int to calculate the size, and I think that is where it is going wrong. Any suggestions? Is there also any limit on file size while zipping?
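As background to this report: the classic Zip headers store sizes in 32-bit fields, and the value 0xFFFFFFFF in such a field is reserved to mean "the real size is in the Zip64 extra field". The sketch below only illustrates that arithmetic. The names kMaxClassicZipSize, needs_zip64 and truncated_to_32_bits are invented for the example and do not come from zip.cpp or unzip.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Largest size a classic (non-Zip64) header can describe directly;
// 0xFFFFFFFF itself is reserved as the Zip64 marker.
const std::uint64_t kMaxClassicZipSize = 0xFFFFFFFEULL;

// True when a file of this size cannot be described without Zip64 fields.
inline bool needs_zip64(std::uint64_t size) {
    return size > kMaxClassicZipSize;
}

// What a 32-bit counter ends up holding: the size modulo 2^32.
inline std::uint32_t truncated_to_32_bits(std::uint64_t size) {
    return static_cast<std::uint32_t>(size);
}
```

For a 5 GB file, a 32-bit counter would hold only 1 GB, so code that tracks sizes in an unsigned int cannot report or extract the full length.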

I also find it useful to know how to convert std::string to TCHAR* to match your API. This is what I found; I hope it helps someone else:

A TCHAR is not a string. It's a macro that's defined as a char or wchar_t depending on whether UNICODE is defined. The solution is kinda ugly:

std::string str="something";
TCHAR *param=new TCHAR[str.size()+1];
param[str.size()]=0;
//As much as we'd love to, we can't use memcpy() because
//sizeof(TCHAR)==sizeof(char) may not be true:
std::copy(str.begin(),str.end(),param);

* If you're writing a component that might be used by other people, then use TCHAR together with a tstring typedef such as std::basic_string<TCHAR> (there is no std::tstring in the standard library). Both of these will boil down to char or wchar_t depending on whether UNICODE is defined.

* If you're writing your own app, and you want to commit to one or the other, then just use it. For instance, you might decide to stick to std::string throughout your code, and don't define UNICODE, and then TCHAR will always be the same as char. Or alternatively, you might decide to stick to std::wstring throughout your code, and you do define UNICODE, and then TCHAR will always be the same as wchar_t.

* There are very few circumstances where an app needs to convert between narrow and wide characters, such as you did with std::copy. You'd only do that if you had (for example) an API which was char-only, and you needed to pass it to a wchar_t-only API. Those situations are pretty rare.
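The tstring idea from this thread can be sketched as follows. To keep the example compilable without the Windows headers, demo_tchar stands in for TCHAR; demo_tchar, demo_tstring and to_tstring are all hypothetical names chosen for the illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for TCHAR so the sketch compiles without <tchar.h>.
#ifdef UNICODE
typedef wchar_t demo_tchar;
#else
typedef char demo_tchar;
#endif

// A "tstring" is a typedef you write yourself, not a standard library type.
typedef std::basic_string<demo_tchar> demo_tstring;

// Converts each char individually. This is only correct for 7-bit ASCII text;
// anything else needs a real code-page conversion.
inline demo_tstring to_tstring(const std::string &s) {
    return demo_tstring(s.begin(), s.end());
}
```

The c_str() of the result can be handed to a function that expects a const TCHAR*-style argument, and because the string owns its buffer there is no new[] to forget to delete.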

Hi, I rarely comment on anything, but I have been searching for 2 days for something simple and efficient to use, and I found it now with you after I decided to search on CodeProject. You saved me many hours of work... Thank you.

I agree with you that every piece of code should be very simple to use, no headache. The way you did it is perfect: one cpp file, one h file, that's all!

Hey, and thanks for this very useful and simple code to zip and unzip. I need to "zip" my files in tar format, meaning just aggregating files together without any compression. Can I do that with these zip utils? And how? Thanks a lot, Pascale

Hello everyone, today is a great day, for you and for me. I am French, and on my computer were some files with extended chars in their names, such as 'é' or 'è'... With unzip, when you unzip a zip file that contains files with extended chars in their names, you get commas instead, or other cryptic characters... It was really bad for me, as I told you. But today, I have THE solution! Indeed, you just have to use CStringT and the member OemToAnsi in your unzipping code. An example:

I am trying to use the unzip code to unzip a zip file that I have read into memory. The method GetZipItem is not working. After calling it, the ZIPENTRY does not show the correct entry file name, or size. If I proceed with my code, UnzipItem() returns 0x10000 indicating an error.

I have used the same zip file I'm trying to use here with your simple example and it worked fine. My zip file has one file entry in it.

The original code will always uncompress 12 bytes short of the original size if you use a password and the original file size is greater than 0x4000 in size.

I tracked down the source of the problem. In the function unzReadCurrentFile in Unzip.cpp, the author mistakenly subtracts the 12 bytes from the uncompressed size while adjusting the pointers for the header size difference for encrypted files. Simply comment out that line.
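The bookkeeping behind this fix can be sketched as below. In traditional Zip encryption a 12-byte encryption header precedes the compressed data and is counted in the entry's compressed size, while the uncompressed size is unaffected. The helper payload_sizes, the struct PayloadSizes and the constants are hypothetical names for the illustration and are not the code from unzReadCurrentFile.

```cpp
#include <cassert>
#include <cstdint>

// Bit 0 of the general-purpose flags marks an encrypted entry.
const std::uint16_t kFlagEncrypted = 0x0001;
// Traditional Zip encryption prepends a 12-byte header to the compressed data.
const std::uint32_t kEncryptionHeaderSize = 12;

struct PayloadSizes {
    std::uint32_t compressed;    // bytes to feed to the decompressor
    std::uint32_t uncompressed;  // bytes the decompressor should produce
};

// Hypothetical helper: only the compressed count shrinks by the header size;
// the uncompressed size is left untouched.
inline PayloadSizes payload_sizes(std::uint16_t flags,
                                  std::uint32_t compressed_size,
                                  std::uint32_t uncompressed_size) {
    PayloadSizes p;
    p.compressed = compressed_size;
    p.uncompressed = uncompressed_size;
    if ((flags & kFlagEncrypted) && compressed_size >= kEncryptionHeaderSize) {
        p.compressed -= kEncryptionHeaderSize;
    }
    return p;
}
```

Subtracting the 12 bytes from the uncompressed count as well would make the output come up 12 bytes short, which matches the symptom described above.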

I commented out that line, but when I use it to uncompress a file which contains four .xml files it doesn't work. It gives me a ZR_FLATE error, also when I deal with some compressed .txt files. Can you tell me how to fix this error?

@CodeHead: Awesome! Thanks. It indeed fixes a very nasty bug -- in my case the unzipping process would hang up in an indefinite loop. The original author would do all of us a huge favor if he applies this simple fix in the original source code.

Hi. I use your zip/unzip code in our application and we have a problem with non-English chars in filenames. I have read that ZIP doesn't care about filename encoding. We generate the archive with 7-zip, and it includes non-English chars in filenames. The local default charset is CP1250 (I am from the Czech Republic). I don't know why, but the filenames are encoded as Latin II (CP852). For correct unzipping I have to transform CP852 to CP1250...

And this is the question: is there any workaround for finding out the zip filename encoding (in this case CP852)?
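One partial answer, offered as a sketch rather than a full solution: the Zip format does not record which OEM code page was used, but each central-directory header does carry two hints. Bit 11 of the general-purpose flags says the name is UTF-8, and the high byte of "version made by" identifies the host system (0 means MS-DOS/FAT, where an OEM code page is the usual convention). The helper guess_name_encoding and the enum below are hypothetical names for this illustration; which specific OEM code page applies (CP852, CP437, ...) still has to come from knowledge of the machine that created the archive.

```cpp
#include <cassert>
#include <cstdint>

enum NameEncodingGuess { NAME_UTF8, NAME_OEM_CODEPAGE, NAME_UNKNOWN };

// Reads a little-endian 16-bit value.
inline std::uint16_t read_le16(const unsigned char *p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Hypothetical helper: `hdr` points at a central-directory file header
// (signature bytes 50 4b 01 02). Offset 4 holds "version made by" and
// offset 8 holds the general-purpose flags.
inline NameEncodingGuess guess_name_encoding(const unsigned char *hdr) {
    std::uint16_t made_by = read_le16(hdr + 4);
    std::uint16_t flags = read_le16(hdr + 8);
    if (flags & 0x0800) return NAME_UTF8;               // bit 11: name is UTF-8
    if ((made_by >> 8) == 0) return NAME_OEM_CODEPAGE;  // host 0 = MS-DOS/FAT
    return NAME_UNKNOWN;
}
```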

First of all - thank you very much for this really useful piece of code. It was super easy to just add the header and cpp to my project and have it working, without worrying about including the right libs/dlls and using the right version of the crt etc!

Second of all - please ignore the idiot/s who wanted to use exceptions or wanted it as a dll. They probably have never tried writing exception neutral code.

Hello, I'm using this code for creating a zip folder and adding log files to it automatically at the end of the day. But the code does not produce a zip folder at all. While debugging it and tracing the function calls, I saw that the call to CreateFile() from within Create() does not take me into the definition of CreateFile(), but instead just steps to the next line in the code. When I search for CreateFile's definition, I can't find one. I'm using this with Borland Builder 5.

Has anyone encountered this problem before? Is there corrected code for this already? Please let me know asap.

I find it odd that this library is implemented internally as C++ classes but exposes the API as C functions. I could understand it if you wanted to make the library callable from C programs but you have used overloaded functions in your API so that's not possible.

Why not just expose the C++ classes themselves, thus avoiding a layer of wrapping? Use public / private to indicate what is callable externally and what is not. Alternatively, do away with the function overloading and declare all API functions as extern "C". There are people out there who would appreciate that.

Instead, provide a proper API for each type of unzip (memory, handle, file) with a dedicated method signature for each. That way you can type your parameters. And lay your code out properly! It's far too dense.

Useful library though, for all that. Thanks. I appreciate it being packaged in so few files.

I strongly dislike object-oriented APIs... the pillars of OOD are encapsulation, inheritance, polymorphism, objects, and of these only encapsulation is good for an API and I think the others are actively bad. (In some cases like UI or streams, then inheritance is probably needed, but I've not yet seen a good inheritance architecture in *ANY* UI or stream library!) I figure that opaque handles were (1) the Win32 way, (2) the best language paradigm in which to express encapsulation but not the others. The header file should certainly not contain private methods, only public, which means that even if I went down that route then I'd still have needed something that looked/smelled/felt like the wrappers and probably took as many lines of code.

Your point about overloading is good. I've learned that since writing the code! And yes I should have put extern "C".

My dense code is because I believe in "one concept per line of code" because I reckon the gestalt is important -- being able to see more stuff on one screen-full. I reckon that fragmentation of code (into one class per file for goodness sake!) is a force for bad in the industry, and javadoc-style comments are an ugly symptom of the belief that there's no need to document "overall behavior" and can get away with merely documenting the constituent parts.

I found the GetGlobalComment function in unzip.cpp, line 3699: int unzGetGlobalComment (unzFile file, char *szComment, uLong uSizeBuf), but I don't know how to use it. And when I want to set the global comment of a zip file, how can I do that? Thanks a lot.

Hi, I would like to unzip from memory to memory a stream of bytes which compose a .zip archive. Is it possible to unzip the stream of bytes while receiving it from the network, so unzip a part of the archive, then another, and at the end compose the full archive?

I have a strange requirement where I am supposed to zip a zip file with a password and upload this data to, say, an FTP server. On the server side, a client will do the unzipping part. But when we tried to unzip the file using OpenZip, it didn't give any error, yet when we tried to open the file, WinZip threw an error. When we tried to unzip it manually, it unzipped successfully. I am just wondering whether we are supposed to apply OpenZip recursively. Please help!

I'm compiling this utility under wxDevC++ (unsure which version), and I came across the following errors:

zip.cpp (2152): Integer constant is too large for "long" type.
unzip.cpp (3728): Integer constant is too large for "long" type.

This was a fairly easy fix in both cases; modifying the source code, I took the 116444736000000000 (the big number causing problems), added a .0 to the end of it (making it a floating-point literal, which has a much larger range), and then cast it to the correct type (__int64 for zip.cpp, and LONGLONG for unzip.cpp). I'm planning on using this fix in my commercial projects... Is that okay? I've added comments in the source indicating these changes, and I'm planning on putting acknowledgements of my use of this utility in my program (if it's acceptable for me to use this modified version).
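For readers hitting the same compiler error, an alternative that stays in integer arithmetic is sketched below. The constant is the number of 100-nanosecond ticks between the FILETIME epoch (1601) and the Unix epoch (1970). The names kFiletimeUnixEpochOffset, epoch_offset_from_halves and unix_to_filetime_ticks are made up for this illustration and are not identifiers from zip.cpp or unzip.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Ticks (100 ns each) between 1601-01-01 and 1970-01-01.
// The LL suffix asks for a 64-bit integer literal instead of a "long".
const std::int64_t kFiletimeUnixEpochOffset = 116444736000000000LL;

// The same constant assembled from two 32-bit halves, for compilers
// that reject 64-bit literals altogether.
inline std::int64_t epoch_offset_from_halves() {
    return (static_cast<std::int64_t>(0x019DB1DEUL) << 32) |
           static_cast<std::int64_t>(0xD53E8000UL);
}

// Converts Unix seconds to FILETIME-style ticks.
inline std::int64_t unix_to_filetime_ticks(std::int64_t unix_seconds) {
    return unix_seconds * 10000000LL + kFiletimeUnixEpochOffset;
}
```

Either form avoids the round trip through floating point, so there is no need to reason about whether the value survives the conversion exactly.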

I like this code very much, because it is really simple to use and it is possible to zip/unzip with a password. But unzipping with a password has a bug, as already mentioned by forum member shynnuaa. Here is some more info on how to reproduce it very easily: you must have a file bigger than the 16384-byte buffer used in the Unzip function. When the subroutine unzReadCurrentFile() gets the next block, a wrong value is calculated for the remaining bytes in the inflate() function. But that bug is probably only easy to find for the creator of the code. Without a fix it is not safe to use this code for bigger files. :(

Hi. I don't know if it is actually a bug or a feature, but it is possible to add two or more files with equal names into one archive. I was really surprised trying to figure out why my favorite archiver (one of the famous ones) couldn't unzip one.
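Until the library itself rejects duplicates, a caller could guard against them before adding each entry. The following is a minimal sketch of that idea; EntryNameRegistry and claim() are hypothetical names, and the choice to compare names case-insensitively with backslashes treated as forward slashes is an assumption about what counts as "equal", not something the library defines.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper that remembers which entry names have been used.
class EntryNameRegistry {
public:
    // Returns true if the name is new (and records it); false for a duplicate.
    bool claim(const std::string &name) {
        return seen_.insert(normalise(name)).second;
    }

private:
    // Lower-cases the name and treats '\\' as '/' so that equivalent
    // spellings of the same path compare equal.
    static std::string normalise(std::string s) {
        for (std::string::size_type i = 0; i < s.size(); ++i) {
            unsigned char c = static_cast<unsigned char>(s[i]);
            s[i] = (c == '\\') ? '/' : static_cast<char>(std::tolower(c));
        }
        return s;
    }
    std::set<std::string> seen_;
};
```

A caller would invoke claim() with the in-archive name and skip or rename the entry whenever it returns false.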