This tutorial is designed to cover the basics required to control Microsoft Office programs when automated from a VB.NET application. The examples utilized here involve Microsoft Excel, but the principles are the same when automating other MS Office programs such as Access or Word.

Using .NET affords a number of advantages:

The main advantage for the MS Office developer is the strong typing that the language affords. The .NET languages such as VB.NET and C# enforce strong-typing and dissuade the use of weak typed Object or Variant data types. Similarly, the use of late-binding is not encouraged, although is natively enabled by VB.NET. (C# can also perform late binding, but does so through reflection code; it is not a C# language feature.) By enforcing strong typing and discuraging the use of late binding, many more errors can be caught at compile-time, instead of at run-time, when it is often too late.

Managed memory (garbage collection) afforded by the .NET Framework is another key reason why a programmer should consider using .NET languages, although in the case of automating Microsoft Office applications, this can actually cause some complexities. Helping the developer overcome the issues encountered when releasing COM objects from .NET is a key goal of this tutorial.

The .NET Framework also supports advanced programming concepts such as inheritance, delegates, generics, collection classes and reflection coding. Although perhaps not of immediate need by the typical MS Office developer, these features become of greater value as the programmer's skills improve and as a project's complexity increases.

For these reasons and more the VBA or Visual Basic 6.0 developer should consider the .NET language platform.

Do be aware, however, that execution speed when using .NET Automation will be similar to, or slightly slower than, it was under Visual Basic 6.0 Automation, which is considerably slower than VBA executes natively. We discuss some ways of dealing with this within the "Execution Speed" section of the tutorial. Another issue to consider is deployment; VB.NET can be tricky to implement for MS Office versions below Office 2002 (version 10.0). We discuss some deployment options within the "Deployment Issues" section later in the tutorial.

Ok, with the background and caveats behind us, let's get started...

Adding The Project Reference

The first step in Automating a Microsoft Office Program with VB.NET is to have the required Project Library referenced by your VB.NET project.

Here we will be Automating Excel, so one would need to add a reference to the Microsoft Excel Object Library. The procedure to do this is almost the same as it was in legacy VB 6.0:

(1) Within the Visual Studio .NET IDE, choose the Project > Add Reference... menu item. Next you will want to click on the COM tab at the top because the Excel object model is a Component Object Model (COM) library. Now scroll down until you find the "Microsoft Excel 11.0 Object Library" (which could be 10.0, 9.0, 8.0 or even 12.0, depending on your system).

(2) Next, click the [Select] button and then click [OK]. The reference to the Excel Object Library is now loaded.

Basic Shell

The next step is to write your code. The basic shell for Automating Excel could look something like this:

Code:

' Include the following at the top of the Module:
Imports Excel = Microsoft.Office.Interop.Excel

The code above serves as a basic outline for creating and then safely disposing of your Excel Application object. What one might place in the "Your Code Goes Here" block is up to you, but is also discussed in the "Your Code Goes Here" section later on in this tutorial. But before we move on to that, let's discuss the 'cleanup' section first...

Cleanup

Cleanup, as it pertains to .NET Automation, can be a tricky business. The difficulty derives from the fact that .NET is calling the Excel object model through the COM Interop, which is an interconnection between two technically incompatible worlds: (1) the Component Object Model (COM) and (2) the .NET Framework. COM project libraries are created by programs such as C++ or Visual Basic 6.0 and rely on an object reference counting scheme for maintaining and releasing object references. When an object's reference count reaches zero, it is immediately released. The .NET Framework however, uses a managed memory scheme whereby live references are periodically traced from the roots, to other referenced objects, in a tree like structure; when completed, all unreachable objects are released. While the .NET managed memory approach is generally superior and prevents memory leaks caused by circularity-references (with which the COM reference counting approach can have great difficulties), the fact that .NET uses non-deterministic finalization adds complexity for us when we are trying to get our MS Office application to release.

There are many complex techniques for releasing a COM Object in .NET, but the example shell, above, is showing what we believe is the cleanest arrangement. Let's repeat the cleanup section here:

Note that the Marshal.FinalReleaseComObject() method is available when using Framework 2.0 and above, and therefore is accessible by Visual Studio versions 2005 and higher. If using the .NET 1.0 or 1.1 Framework (for example, VB.NET or C# 2002/2003) then you have to make use of the Marshal.ReleaseComObject() method instead, calling it in a repeated loop until the reference count reaches zero.

Admittedly, this all looks a bit complicated, for this much effort was not required in VB 6.0 and is certainly not necessary when using VBA. But the issue is that .NET uses non-deterministic Garbage Collection, which is incompatible with COM's reference-counting approach. In short, we need to have .NET tell COM explicitly that we are done with the COM object and there are basically only two ways to do this:

(1) Set the variable = Nothing and then call GC.Collect(). This works 100% fine, but calling GC.Collect() is a time consuming process. You would think, then, that you can set all your variables = Nothing and then call GC.Collect() at the end, and this does work sometimes. However, Microsoft Office applications are sensitive to the order in which the objects are released and, unfortunately, setting your variables = Nothing and then calling GC.Collect() does not guarantee the release order of the objects. (That is, unless you made a separate GC.Collect() call after every single = Nothing setting -- which would be very expensive.)

(2) Call Marshal.FinalReleaseComObject() or Marshal.ReleaseComObject(). This allows us to explicitly control the order in which our COM objects are released. However, this approach does not allow us to release objects to which we do not have a named variable. For example, when programming in Excel, there are usually loops where one iterates through the cells of a Range. It might not be obvious, but the result of this is that many Range objects are created and discarded rapidly. In VBA or VB 6.0, these objects are garbage collected immediately; however, in .NET, they sit, awaiting Garbage Collection.

So the result is that for proper cleanup we must call GC.Collect() and GC.WaitForPendingFinalizers() first in order to release minor objects that we are not referencing by variables we control. In fact need to call GC.Collect() and GC.WaitForPendingFinalizers() twice because the runtime callable wrapper ("RCW") for each COM object has a finalizer that does the actual work of releasing the COM object from .NET. And .NET objects that have finalizers (the RCW in this case) need a second round of garbage collection to be fully removed from memory.

The last step is to go through our named Excel object variables in reverse-order of importance calling Marshal.FinalReleaseComObject() or Marshal.ReleaseComObject(). It is important that the Excel.Application variable gets released last, or else the application's process is likely to hang.

One last caveat is that if you are using event handling via 'WithEvents' variables, things get even more complicated. The reason is that Marshal.ReleaseComObject() and Marshal.FinalReleaseComObject() will not release a COM object that is being referenced via a 'WithEvents' variable. A procedure to handle this is as follows:

Within your Cleanup section, use a temporary variable and set it to the WithEvents variable.

Then set the WithEvents variable to Nothing.

Lastly, call Marshal.ReleaseComObject() or Marshal.FinalReleaseComObject() on the temporary variable.

We can combine these approaches into a single generic routine as follows:

The above looks pretty much the same, except that now we are calling our own custom 'FinalReleaseAnyComObject()' method, which can release COM objects declared 'WithEvents' as well as standard COM objects. Another approach would be to avoid using 'WithEvents' entirely and to instead make use of VB.NET's AddHandler and RemoveHandler keywords for event handling, in which case you should call RemoveHandler before releasing your object.

As you can see, releasing COM objects from .NET is a fairly complex subject. However, you should be able to use the shell provided above, simply filling in the 'Your Code Goes Here' section and possibly tweaking the Cleanup section as needed to accommodate other variables that you'll likely be using.

If you have any questions on this topic or are having trouble getting your MS Office application to release, feel free to post on the .NET Office Automation forum and we'll be glad to help you out.

CType() and Option Strict On

Note that under 'Option Strict On', one may have to make use of the CType() function to cast an object reference to the precise type required. Note the 3rd line in the shell within the first post, which reads:

This line is declaring the oWS variable to be an Excel.Worksheet and then setting it to reference the first Worksheet in oWB. If 'Option Strict' is not being used, then the use of CType() is not necessary. However, the use of 'Option Strict' is very highly recommended, in which case an occasional CType() will be needed in order to tell the compiler the intended return type.

"Your Code Goes Here"

In the example shown in the first post, the block that reads 'Your Code Goes Here' is where you would execute your commands. Typically one is reading and writing values into a spreadsheet, so a very simple example might look something like the following:

The code above will place the words "Hello World" in Cell A1 and then copy it to the Range B2:E5. It then saves the workbook as "MyWorkbook.xls" in the "My Documents" folder. Lastly, it makes certain to release the range references by setting the oRng1 and oRng2 variables = Nothing. Remember that GC.Collect() is called only at the end of the routine in the Cleanup section. Alternatively, one could call Marshal.FinalReleaseComObject() or Marshal.ReleaseComObject() on the oRng1 and oRng2 variables.

Note that the above is also careful to call '.Value' explicitly when needing to get or change the value that is held in a Range. If using VBA or VB 6.0, this would not be required, for the default property of the Range object is in fact the '.Value' property. However, .NET does not recognize any default property that does not take a parameter. For example:

The second line in the above would generate a compile-time error; you cannot set a Range = "Hello". This is because "Hello" is a String (not a Range). The correct approach is to explicitly use the '.Value' property, as in:

Code:

oRng.Value = "Hello"

Contrast this with the Worksheets collection, which does take a parameter for its default .Item() property:

The last line calling oWS = oWB.Worksheets("Sheet1") is valid because the default property of the Worksheets Collection is the .Item() property, which does take a parameter.

To summarize, when using VB.NET, if the property does not require a parameter (as is the case with Range.Value) then the property must be explicitly stated, even if it is the default property of the COM object in question. But if the default member does does have a required parameter -- as is the case with the Worksheets.Item() property -- then that property can be utilized as the default.

Execution Speed

Execution speed is rarely an issue when automating Microsoft Office programs such as Word, Outlook or PowerPoint, but can be for Excel, which is often doing numerous calculations before posting results.

If one does care about execution speed, then be aware that Automation via the COM Interop can be slow. As with VB 6.0 Automation, your .NET Application will be running out-of-process with respect to the Excel Application. That is, your code and the Excel program are not sharing the same address space. This slows down execution speed with respect to native VBA by a factor of about 50x or so in my testing. In addition, the COM Interop adds overhead, so your Automation code may run a bit slower even than it did with VB 6.0 Automation.

With VB 6.0, execution speed can be addressed by compiling to a COM DLL, which runs within the MS Office Application - not within a separate EXE. This allows your code to run in process and therefore run at native VBA's speed, or even slightly faster. Using .NET can only partially take advantage of this technique. One can create a Managed COM DLL, as described here in Creating Office Managed COM Add-Ins with Visual Studio .NET. However your project is still within the .NET Framework -- although the COM Application thinks that your addin is a true COM DLL.The COM application does not know the difference because the .NET Interop is calling (and/or providing callbacks to) methods that utilize the correct COM interfaces of a standard COM DLL. While this approach by Microsoft is impressive, there is still some degradation in execution speed, although it is not noticeable in most circumstances.

Whether using Automation or a Managed COM DLL, one of the key "tricks" to improving execution speed when calling Excel is, instead of looping through the cells of a Range one-by-one, to instead manipulate an entire array set to (or returned by) the Range.Value property in one shot. Here are a couple of threads discussing this approach:

Overall, when automating MS Office programs from a VB.NET application, then your execution speeds should be similar to or perhaps slightly slower than was your speed under VB 6.0 Automation. However, in most cases there will be no noticeable difference and the programmer can benefit greatly by taking advantage of .NET's improved language features.

Deployment Issues

Another issue to consider is deployment. VB.NET can be tricky to implement for MS Office versions below version 11.0 (Office 2003).

For versions 9.0 and below, there is no official support from Microsoft, and it might be best to simply avoid using Office versions below 10.0 when using .NET. However, the intrepid can create a local Interop assembly by using TlbImp.exe and allow for multiple-version compatibility using early binding. There is a good discussion of this here: COM Automation across multiple COM versions (DevCity).

While the examples in this tutorial utilize the Excel object model, the concepts are parallel when automating other MS Office programs such as Word or Access. Even if the methods and properties within those object models are different, the structure for creating your application instance, referencing it, and cleaning up is the same.

I hope this tutorial helps you get started automating your MS Office Applications when using .NET. As always, you can ask any questions on the .NET Office Automation forum.

Note: The following discusses how to improve your Automation code through the use of error handling. While the code below is using Microsoft Excel, the same approach would be valid for other Microsoft Office applications such as Access, Word, etc.

Using Error handling when Automating is a good idea, for without it, any error would cause your program to shut down and leave your MS Office Application instance hanging, running in the background.

This is not the place for a full discussion on Error Handling in .NET, although these links are good primers:

Note that if using the .NET Framework 1.0 or 1.1, then Marshal.FinalReleaseComObject() will not be available. Instead, you would have to use Marshal.ReleaseComObject() within a 'While Loop', as described in the first post.

The 'Catch' section, above, is very simple, merely reporting the error to the user via the command MessageBox.Show(ex.Message). This can be made more sophisticated if you wish, but for the most part any error that occurs in this outer block is likely to be fatal, so it's probably best to merely report the error and then shut Excel down. But of course, you could choose to suppress the message, simply by not passing it along, or "soft reporting" by logging the error to a text file, or even creating your own custom exception class and throwing that along with the inner exception. It's all up to you.

The 'Finally' block is where Excel is closed down properly. The Finally section is always called regardless of whether an error was thrown. That's the beauty of it. That way your code can shut down Excel (preventing a hanging instance) even if an exception was thrown.

Closing

The above routine should handle the majority of your needs and issues when Automating Excel. The same structure can also be adopted for automating other MS Office Applications such as Word or Access. If you have any specific questions, feel free to ask us in the .NET Office Automation Forum.

It's advised that you not give up too soon, for most often there is a reason that can be spotted in your code. If you need help finding what may be causing your MS Office Application to hang, please post your question in the .NET Office Automation forum and we'll be glad help you out...

But if all conventional methods have failed, sometimes there may be no other way than using Process.Kill() to get the job done. However, there is a problem in using this approach: although we can enumerate all the currently running Microsoft Office application processes, if there is more than one running, how do we know which one is ours? Terminating all of them would be a terrible solution, as this would be crashing other versions that the user is currently working on.

The key to knowing "which one" is to make note of the Process.ID when your MS Office application is first created. Then when you are ready to Quit, you can make use of this ID to terminate the correct process.

As a reminder, using Process.Kill() should be a last resort, for if you follow the cleanup procedures properly, Process.Kill() should almost never be necessary. However, just in case ...

Another approach to using Process.Kill(), should you need it, is to use the Application.Hwnd of the Excel.Application instance in order to get the process ID for the running excel application. Then Process.Kill can be called using that process ID. This approach is outlined on StackOverflow here: Killing Excel.EXE on server.

The key to achieving this is by using the GetWindowThreadProcessId Windows API, which can be declared within a class or module as follows:

Then to call it, you could close down your Excel.Application as follows:

Code:

If xlApp IsNot Nothing Then
Dim excelProcessId As Integer
GetWindowThreadProcessId(New IntPtr(xlApp.Hwnd), excelProcessId)
If excelProcessId > 0 Then
Dim ExcelProc As Process = Process.GetProcessById(excelProcessId)
If ExcelProc IsNot Nothing Then ExcelProc.Kill()
End If
End If

Putting this altogether, the following code for a Form1 class could be used to create an Excel Application instance via the btnStartExcel button and then close it via the btnKillExcel button, which utilizes Process.Kill:

(See the attached Visual Studio 2008 project, if you'd like to give it a try.)

But remember, using Process.Kill() should be a last resort. If you follow the cleanup procedures properly, using Process.Kill() should almost never be necessary. If you need assistance cleaning up / shutting down your application instance, feel free to post on the .NET Office Automation forum, where we'll be glad to help you out.