Our journey from EXEs, LIBs, DLLs, COM to Assemblies

An article on the important milestones, events and potholes we witnessed as we moved from EXEs and DLLs to COM to assemblies.

Background

This article tries to mark the important milestones, events and potholes we witnessed as we moved from EXEs/DLLs to COM to assemblies. I will try to keep it concise so that we can avoid jargon and quickly cover the important points.

The Stone Age of EXEs

In the days of good old MS-DOS, we used to compile and link our nice little C code into an executable. Any additional code our program used (for example, the standard library functions provided by C) would get inserted into the EXE itself. This additional code was supplied in the form of object (.OBJ) files by the Turbo C/Borland C environment. Those EXEs were self-sufficient, and you could *install* one on any machine by merely copying it to a folder.

Pros:

Self-sufficiency.

Cons:

No way to share binaries. If you wanted to use code developed by someone else, you had to have the actual source code.

The era of LIBS (Static linking)

As software became bigger and more complex, there arose a need to consume what we call 'third-party code'. For example, you might need a sockets library to do some network programming inside your own program. Such libraries were available in the form of library (.LIB) files. A LIB file contained the object code of the library, and the linker's job was to merge that object code into the EXE. This is called static linking.

Pros:

Binary reuse without the need to share source code. All you need is a LIB file from the vendor, plus some documentation or the header file (.h) giving the function signatures.

Cons:

Bigger EXE size and wasted storage: the library code was replicated in every EXE that used it.

Wasted memory: in a multitasking environment, two copies of the same library function would sit in memory unnecessarily.

Any change in the library required relinking every EXE to pick up that change.

The progress to DLLs (Dynamic Linking)

The idea of code sharing remained the same, but the library code was separated from the executables into a DLL (dynamic link library) file. These libraries get linked with the calling application at *runtime*, and multiple applications calling the library functions share a single copy of them in memory. The operating system is responsible for doing this linking at runtime. For this scheme to work, a user just needed to place the DLL in the application's folder or in the system folder of the OS.

Pros:

Solved the problems associated with static linking.

Cons:

Difficult cross-language usage: DLLs and EXEs written in different languages were hard to interoperate because details such as exported-name decoration and calling conventions were language and compiler dependent.

DLL versioning problems: if your program has been tested with a particular version of a library, but some other program's installer updates that library to a newer version, your program may stop working correctly. This is the infamous DLL hell.

Lack of location transparency: the DLL had to sit in the application's folder or on the system search path.

Then came COM

COM provided an infrastructure so that client applications could bind to components at runtime. Unlike with plain DLLs, when your program consumed COM objects, it had very little compile-time dependency on the library.

Pros:

Language independent: every COM object has to follow the same binary memory layout laid down by the specification, so any language that can produce that layout can produce or consume COM objects.

Object oriented: unlike DLLs, which came from the procedural programming era, COM offered a design based on object-oriented programming.

Cons:

Complexity: COM was difficult to learn and understand.

Registry storage: a reference to a COM object is resolved through GUID-to-location mappings stored in the Windows registry, which makes the scheme (Windows) platform dependent and tedious to work with.

Deployment and maintenance were painful.

Have we reached utopia with Assemblies?

The compiled .NET code is stored as an assembly. The assembly stores:

IL code.

Assembly metadata (the manifest), which holds:

  Identity: name, version and culture info.
  Names of the files within the assembly.
  Type access data: private or otherwise.
  Security permissions.

Type metadata: details of the types, methods and properties within the assembly.

Resources.

An assembly is therefore self-describing: it does not depend on external things like registry entries or type library files for reuse. The ILDASM tool shipped with .NET gives an insight into the manifest and type metadata.

Pros:

Completely self-describing and language independent.

Two versions of the same assembly can be loaded side by side: this is possible because the manifest also stores version info. (This ends DLL hell.)

Easy installation: a consuming program can just have the assembly in its own folder or in the GAC (Global Assembly Cache) and start using the objects defined in it.

Cons:

Requires the .NET Framework to be installed (just as Java requires the JVM). For lightweight desktop utilities, users may complain about the huge .NET runtime download and install.

Since the assembly stores very detailed metadata, reverse engineering it back to source code is possible. Obfuscators are available which address this to some extent by garbling the assembly so that it is difficult for a human to understand.

Assemblies run in a managed environment, so the cons of that environment (such as no direct control over when resources are freed) apply to assemblies as well.

History

03 January 2005: Included assembly cons.

16 December 2004: First draft.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Comments and Discussions

It's interesting that as the road gets easier for developers, it gets harder for users. Users would like the Stone Ages of exe's much better than the Enlightened Age of assemblies. If I was selling a product and put .NET 1.1 Framework required, I bet less than 5% of the Windows users would know if they met this requirement or not. The installer would have to figure it out, but how many users are willing to download a huge installer to try out my program? Not many.

Yes nalenb, an irony for sure. I think, just as we got used to the MFC runtime over time and it eventually became part of the OS itself, the .NET runtime will also be less of an overhead in the future.

Right now I am very excited about the benefit it offers. My average time to convert an idea into a solution is reduced by a great margin. Also I am mainly into enterprise solutions so the .NET runtime installation is not much of a trouble.

Ouch. There are two targets I think of when I post an article on CP. One is to enlighten the community with something new I might have come up with. The second is to get my concepts/ideas/code exposed for discussion and to learn from others' points of view. I am aiming for the second one here.

- Any child can decompile your code (do you know the Reflector?)
- Your code is compiled while user is waiting. So, the JIT compiler can't optimize your code. So, your code is slower than a VB6 generated one.
- The runtime does a lot of security checks in your code. So the user needs to wait a little bit more. And the JIT need to do less optimization.
- You have a Garbage Collector running while your application is running, and you never know when it will run. You need to take care (more than in unmanaged programming) of your allocations, so the GC will reclaim your memory in the first generation when possible. Sometimes your app has 5 Forms opened and Task Manager will report you're using 30MB RAM. WinWord uses less than this.
- You need to call Dispose() to release resources. you don't have this problem using ASP or VB6. In C++, you would call IUnknown->Release() or use CComPtr anyway.

Rodrigo Strauss wrote: You have a Garbage Collector running while your application is running,
What does this have to do with the concept of an assembly?

Rodrigo Strauss wrote: Task Manager will report you’re using 30MB RAM
No, you're wrong: it'll report that the working set size is 30MB, and that means that you can be using something in the range of 4K RAM to 30MB RAM.
Just because the GC has reserved 30MB in RAM pages, this does not necessarily mean that Windows gave all that memory to the GC.
Besides that, the .NET framework references a lot of kernel and Windows libraries, which increases its working set, without spending a single byte of RAM.

Rodrigo Strauss wrote: - You need to call Dispose() to release resources. you don't have this problem using ASP or VB6. In C++, you would call IUnknown->Release() or use CComPtr anyway.
Reference counting or generational GC has nothing to do with the concepts of assemblies as a form of sharing code. Besides that, assemblies can expose CCW, which will behave as any COM component, with AddRef, Release.

Rodrigo Strauss wrote: So, your code is slower than a VB6 generated one.
Do you have some study to prove this?

Rodrigo Strauss wrote: - Any child can decompile your code (do you know the Reflector?)
Again, this has nothing to do with the concept of assemblies: .NET assemblies can contain native code (use C++) - System.Windows.Forms classes have plenty of private native methods that can't be decompiled with Reflector.

You missed some very important points:
* The .NET virtual machine is less powerful than a real CPU. Example: it does not support 'long double'.
* Assemblies aren't really 'language independent'. They only work for languages that are variants of C#.
* IMHO, the really big problem with GC is the way it interferes with RAII.

Adding version info to DLLs is great (but somewhat obvious).
But they could have done it without tying it to .NET.
IMHO, the cure is worse than the disease. (Unless you were already using VB, in which case even amputation of all limbs would be better than the disease).

I agree that there are cons to assemblies; however, the severity you are projecting them with is not right, in my opinion. I will update my write-up with the valid points soon.

Rodrigo Strauss wrote: - Any child can decompile your code (do you know the Reflector?)
There are obfuscators which will garble your assembly metadata so that it would not be understandable to a human.

Rodrigo Strauss wrote: - Your code is compiled while user is waiting. So, the JIT compiler can't optimize your code. So, your code is slower than a VB6 generated one.
- The runtime does a lot of security checks in your code. So the user needs to wait a little bit more. And the JIT need to do less optimization.
I agree, the performance of managed code will be somewhat slower compared to native machine code. But think about the facilities the .NET environment provides you with. It is also a lot more efficient than we might assume; do have a look at the performance comparisons published on the net.

Salil Khedkar wrote: Rodrigo Strauss wrote:
- Any child can decompile your code (do you know the Reflector?)
There are obfuscators which will garble your assembly metadata so that it would not be understandable to a human.

It may make it not understandable to a human, but it is still understandable enough to copy and paste. The main question for me when thinking about this issue is: why do I need to protect the code? If it's because you don't want the registration feature hacked, then it's probably best to use another language altogether; hackers will still figure out how to crack it, just not as quickly. If the reason is that you don't want people to be able to replicate what you are doing, then take a step back: no matter what you are coding, most likely someone else could do it as well, and probably faster and better. Maybe you have passwords and other sensitive data embedded in the source code; if that's the case, you need to take a serious look at your design, because that is not a good idea at all. If it can be hacked from the source code, it can be hacked from the binaries.

Rodrigo Strauss wrote:- The runtime does a lot of security checks in your code. So the user needs to wait a little bit more. And the JIT need to do less optimization.

Something that concerns me about this, too, is that developers may get a false sense of security when they hear that the JIT compiler does security checks on the code while compiling and running it.

C/C++ (and other "unmanaged" languages) gets a bad rap when it comes to security just because buffer overruns are a possible problem that sloppy code can create. However, bad logic (not encrypting data, login holes, poor data verification, not deleting temporary data) is a big security threat too that often gets overlooked. I think once developers hear about security checking in .NET, they will probably focus even less on security because of that false sense of security--a very bad situation.