Browser Helper Objects: The Browser the Way You Want It

As of December 2011, this topic has been archived. As a result, it is no longer actively maintained. For more information, see Archived Content. For information, recommendations, and guidance regarding the current version of Internet Explorer, see IE Developer Center.

Introduction

There are sometimes circumstances in which you need a more or less specialized version of the browser. Sometimes you work around this by developing a completely custom module built on top of the WebBrowser control, complete with buttons, labels, and whatever else the user interface requires. In this case, you're free to add to that browser any new, nonstandard feature you want. But what you actually have is just a new, nonstandard browser. The WebBrowser control is just the parsing engine of the browser. This means there still remain a number of UI-related tasks for you to do: adding an address bar, toolbar, history, status bar, channels, and favorites, just to name a few. So, to create a custom browser you have to write two types of code: the code that transforms the WebBrowser control into a full-fledged browser like Microsoft Internet Explorer, and the code that implements the new features you want it to support. Wouldn't it be nice if there were a straightforward way to customize Internet Explorer instead? Browser Helper Objects (BHOs) do just that.

Program Customization

Historically speaking, the first way to customize the behavior of a program was through subclassing. By this means, you could change the way a given window in a program processed messages and actually obtain a different behavior. Although considered a brute-force approach, because the victim is largely unaware of what happens, it's been the only choice for a long time.

With the advent of the Microsoft Win32 API, interprocess subclassing was discouraged and made a bit harder to code. If you're brave-hearted, however, and pointers have never scared you, and above all if you're used to living in symbiosis with system-wide hooks, you might even find it too simple. But this is not always the case. However clever the programming, the point is that each Win32 process runs in its own address space, and breaking the process boundaries is questionable practice. On the other hand, there might be circumstances that require you to do this with the best of intentions. More often, customization might be a specific feature the program itself allows by design.

In the latter case, the program searches well-known, predefined locations for additional modules, loads and initializes them, and then leaves them free to do the job they were designed to do. This is exactly what happens with the Internet Explorer browser and its helper objects.

What Are Browser Helper Objects?

From this point of view, Internet Explorer is just like any other Win32-based program with its own memory space to preserve. With Browser Helper Objects you can write components—specifically, in-process Component Object Model (COM) components—that Internet Explorer will load each time it starts up. Such objects run in the same memory context as the browser and can perform any action on the available windows and modules. For example, a BHO could detect the browser's typical events, such as GoBack, GoForward, and DocumentComplete; access the browser's menu and toolbar and make changes; create windows to display additional information on the currently viewed page; and install hooks to monitor messages and actions.

Before going any further with the nitty-gritty details of BHO, there are a couple of points I need to illuminate further. First, the BHO is tied to the browser's main window. In practice, this means a new instance of the object is created as soon as a new browser window is created. Any instance of the BHO lives and dies with the browser's instance. Second, BHOs only exist in Internet Explorer, version 4.0 and later.

If you're running Microsoft Windows 98 or Windows 2000, or Windows 95 or Windows NT version 4.0 with the Active Desktop Shell Update (shell version 4.71), BHOs are also supported by Windows Explorer. This has some implications that I'll talk more about later when making performance considerations and evaluating the impact of BHOs.

In its simplest form, a BHO is a COM in-process server registered under a certain registry key. Upon startup, Internet Explorer looks up that key and loads all the objects whose CLSID is stored there. The browser initializes the object and asks it for a certain interface. If that interface is found, Internet Explorer uses the methods provided to pass its IUnknown pointer down to the helper object. This process is illustrated in Figure 1.

Figure 1. How Internet Explorer loads and initializes helper objects. The BHO site is the COM interface used to establish a communication.

The browser may find a list of CLSIDs in the registry and create an in-process instance of each. As a result, such objects are loaded in the browser's context and can operate as if they were native components. Due to the COM-based nature of Internet Explorer, however, being loaded inside the process space doesn't help that much. Put another way, it's true that the BHO can do a number of potentially useful things, like subclassing constituent windows or installing thread-local hooks, but it is definitely left out from the browser's core activity. To hook on the browser's events or to automate it, the helper object needs to establish a privileged and COM-based channel of communication. For this reason, the BHO should implement an interface called IObjectWithSite. By means of IObjectWithSite, in fact, Internet Explorer will pass a pointer to its IUnknown interface. The BHO can, in turn, store it and query for more specific interfaces, such as IWebBrowser2, IDispatch, and IConnectionPointContainer.

Another way to look at BHOs is in terms of Internet Explorer shell extensions. As you know, a Windows shell extension is a COM in-process server that Windows Explorer loads when it is about to perform a certain action on a document—for example, displaying its context menu. By writing a COM module that implements a few COM interfaces, you're given a chance to add new items to the context menu and then handle them properly. A shell extension must also be registered in such a way that Windows Explorer can find it. A Browser Helper Object follows the same pattern—the only changes are the interfaces to implement. Slightly different is the trigger that causes a BHO to be loaded. Despite the implementation differences, however, shell extensions and BHOs share a common nature, as the following table demonstrates.

Table 1. How Shell Extensions and Browser Helper Objects Implement Common Features

| Feature | Shell extension | Browser Helper Object |
| --- | --- | --- |
| Triggered by | User's action on a document of a certain class (that is, right-click). | Opening of the browser's window. |
| Unloaded when | A few seconds later the reference count goes to 0. | The browser window that caused it to load gets closed. |
| Implemented as | COM in-process DLL. | COM in-process DLL. |
| Registration requirements | Usual entries for a COM server plus other entries, depending on the type of shell extension and the document type that it will apply to. | Usual entries for a COM server plus one entry to qualify it as a BHO. |
| Interfaces needed | Depends on the type of the shell extension. | IObjectWithSite. |

If you're interested in shell extensions, see the MSDN Library Online or CD documentation for a primer. For deeper coverage, check out my recently published book, Professional Shell Programming for Windows (Wrox Press, 1-861001-84-3).

The Lifecycle of Helper Objects

As I mentioned earlier, BHOs aren't just supported by Internet Explorer. Provided you're running at least shell version 4.71, your BHOs will also be loaded by Windows Explorer, meaning that a single browser can navigate both the Web and local disks with a similar user experience. The next table provides a product-oriented view of the various shell versions available today. The shell version number depends on the version information stored in shell32.dll.

Table 2. Browser Helper Objects Support for the Various Shell Versions

| Shell version | Installed products | BHOs supported by |
| --- | --- | --- |
| 4.00 | Windows 95 and Windows NT 4.0 with or without Internet Explorer 4.0 or earlier. | |

A Browser Helper Object is loaded when the main window of the browser is about to be displayed and is unloaded when that window is destroyed. If you open more copies of the browser window, more instances of the BHO will be created. The BHO is loaded regardless of the command line that launches the browser. For example, it gets loaded even if you simply want to see a specific HTML page or a given folder. In general, the BHO is taken into account whenever either explorer.exe or iexplore.exe executes. If you set the "Open each folder in its own window" folder setting, the BHO will load each time you open a folder.

Figure 2. With this setting, each time you open a folder, a separate instance of explorer.exe executes and loads the registered BHOs.

Notice, however, that this applies only when you open folders starting from the My Computer icon on the desktop. In this case, the shell calls explorer.exe each time you move to another folder. The same won't occur if you start browsing from a two-paned view. In fact, when you change the folder the shell doesn't launch a new instance of the browser but simply creates another instance of the embedded view object. Curiously, if you change the folder by typing a new name in the Address bar, the browsing always takes place in the same window whether Windows Explorer's view is single or two-paned.

Things are far simpler with Internet Explorer. You have multiple copies of it only if you explicitly run iexplore.exe multiple times. When you open new windows from Internet Explorer, each window is created in a new thread of the same process, so the DLL that hosts the BHO is not loaded again, although each new window still receives its own instance of the helper object.

Above all, the most interesting feature of BHOs is that they are extremely dynamic. Each time Windows Explorer's or Internet Explorer's window is opened, the loader reads the CLSID of the installed helper objects from the registry and deals with them. You can have different BHOs loaded by different copies of the browser if you edited the registry between instances of opening the browser. This means that now you have an excellent alternative to writing a new browser from scratch: you can embed WebBrowser in a Microsoft Visual Basic or Microsoft Foundation Classes (MFC) frame window. At the same time, you're given a great opportunity to arrange very extensible browsing applications. You can rely on the full power of Internet Explorer and add as many add-ons as you want when it suits your needs.

The IObjectWithSite Interface

From this high-level overview of Browser Helper Objects one concept emerges clearly: A BHO is a dynamic-link library (DLL) capable of attaching itself to any new instance of Internet Explorer and, under certain circumstances, also Windows Explorer. Such a module can get in touch with the browser through the container's site.

In general, a site is an intermediate object placed between the container and each contained object. Through it, the container manages the content of the contained object and, in return, makes the object's internal functionality available. The site-based relationship between containers and objects involves the implementation of interfaces like IOleClientSite on the container side, and IOleObject on the object side. By calling methods on IOleObject, the container makes the object aware of its host environment.

When the container is Internet Explorer (or the Web-enabled version of Windows Explorer), performance issues reduce this communication pattern to the essential. The object is now required to implement a simpler and lighter interface called IObjectWithSite. It provides just two methods.

Table 3. The IObjectWithSite Interface Definition

| Method | Description |
| --- | --- |
| HRESULT SetSite(IUnknown* pUnkSite) | Receives the IUnknown pointer of the browser. The typical implementation will simply store such a pointer for further use. |
| HRESULT GetSite(REFIID riid, void** ppvSite) | Retrieves and returns the specified interface from the last site set through SetSite(). The typical implementation will query the previously stored pUnkSite pointer for the specified interface. |

The only strict requirement for a BHO is implementing this interface. Notice that you should avoid returning E_NOTIMPL from any of the preceding functions. Either you don't implement the interface or you should be able to code its methods properly.
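The contract behind the two methods can be sketched in a few lines. What follows is only an illustration under stated assumptions: `Unknown`, `Result`, `FakeSite`, and `HelperObject` are portable stand-ins invented for this sketch so that it compiles without the COM headers. Real code would use IUnknown, HRESULT, and IObjectWithSite, and interface IDs would be GUIDs rather than strings.

```cpp
#include <cstring>

// Portable stand-ins for COM types, introduced only for this sketch.
enum Result { kOk, kFail, kNoInterface };  // roles of S_OK, E_FAIL, E_NOINTERFACE

struct Unknown {
    virtual Result QueryInterface(const char* iid, void** out) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
    virtual ~Unknown() = default;
};

// A fake browser site that answers two interface names and counts references.
struct FakeSite : Unknown {
    unsigned long refs = 1;
    Result QueryInterface(const char* iid, void** out) override {
        if (std::strcmp(iid, "IUnknown") == 0 || std::strcmp(iid, "IWebBrowser2") == 0) {
            *out = this;
            AddRef();              // every successful query hands out a reference
            return kOk;
        }
        *out = nullptr;
        return kNoInterface;
    }
    unsigned long AddRef() override { return ++refs; }
    unsigned long Release() override { return --refs; }
};

// The helper object's side of the contract.
class HelperObject {
public:
    ~HelperObject() { SetSite(nullptr); }

    // Store the new site (taking a reference) and drop the previous one.
    Result SetSite(Unknown* site) {
        if (site) site->AddRef();
        if (site_) site_->Release();
        site_ = site;
        return kOk;
    }

    // Query the stored site for the requested interface; fail if none is set.
    Result GetSite(const char* iid, void** out) const {
        if (!out) return kFail;
        *out = nullptr;
        if (!site_) return kFail;
        return site_->QueryInterface(iid, out);
    }

private:
    Unknown* site_ = nullptr;
};
```

The point of the sketch is the reference discipline: the object keeps one reference for as long as it holds the site, and every pointer returned by GetSite() carries a reference that the caller must release.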

Writing a Browser Helper Object

A Browser Helper Object is a COM in-process server, so what's better than the Active Template Library (ATL) to build one? Another reason for choosing ATL is that it already provides a default and good enough implementation of the IObjectWithSite interface. Plus, among the predefined types of objects that the ATL COM Wizard natively supports, there's one, the Internet Explorer Object, that is just the type of object a BHO should be. An ATL Internet Explorer Object, in fact, is a simple object—that is, a COM server that supports IUnknown and self-registration—plus IObjectWithSite. If you add such an object to your ATL project, and call the corresponding class CViewSource, you get the following code from the wizard:

As you can see, the wizard already makes the class inherit from IObjectWithSiteImpl, which is an ATL template class that provides a basic implementation of IObjectWithSite. (See atlcom.h in the ATL\INCLUDE directory of Microsoft Visual Studio 98.) Usually there's no need to override the GetSite() member function. Instead, the coded behavior of SetSite() often, if not always, needs customization. ATL, in fact, simply stores the IUnknown pointer to a member variable called m_spUnkSite.

Throughout the remainder of the article I'll discuss a quite complex and rich example of BHO. The object will attach itself to Internet Explorer only, and show a text box with the source code of the page being viewed. This code window will be automatically updated when you change the page and grayed out if the document that Internet Explorer is displaying is not an HTML page. Any change you apply to the raw HTML code is immediately reflected in the browser. This kind of magic is made possible by dynamic HTML (DHTML). Such a code window can be hidden and then recalled through a hot key. When visible, it shares the whole desktop work area with Internet Explorer, resizing properly as shown in Figure 3.

Figure 3. The Browser Helper Object in action. It attaches to Internet Explorer and shows the source code of the page being viewed. It also allows you to enter (but not save) changes.

The key point with this example is accessing Internet Explorer's browsing machinery, which is nothing more than an instance of the WebBrowser control. This example can be broken out into five main steps:

Detecting who's loading the object, be it Internet Explorer or Windows Explorer.

Getting the IWebBrowser2 interface that renders the WebBrowser object.

Catching the WebBrowser's specific events.

Accessing the document being viewed, making sure it is an HTML document.

Managing the dialog box window with the HTML source code.

The first step is accomplished in the DllMain() code. SetSite(), instead, is the right place to get the pointer to the WebBrowser object. Let's look at all these steps in a bit more detail. For information on what isn't covered here you can refer to the source code available on the MSDN Online Web site.

Detecting Who's Calling

As mentioned earlier, a BHO can be called either by Internet Explorer or Windows Explorer if you're running at least shell version 4.71. In this case, I'm designing a helper object specifically targeted to work with HTML pages, so it will have nothing to do with Windows Explorer. A DLL that doesn't want to be loaded by a certain caller can simply return FALSE from its DllMain() function once it detects who's calling. The first argument of the GetModuleFileName() API function is the handle of the module whose name you want to know; if you pass NULL, you get the full path of the executable of the calling process.

Once you know the name of the process, you can quit loading if it is Windows Explorer. Notice that a more selective choice might be dangerous. In fact, other processes could try to load the DLL for legitimate reasons and be rejected. The first victim of this situation is regsvr32.exe, the program used to automatically register the object. If you make a different test, say, only against the Internet Explorer executable:

if (!_tcsstr(pszLoader, _T("iexplore.exe")))

you won't be able to register the DLL any longer. In fact, when regsvr32.exe attempts to load the DLL to invoke the DllRegisterServer() function, the call will be rejected.
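The check described in this section might look like the sketch below. The helper name `IsWindowsExplorer` is an assumption made for illustration. The Windows-only part is guarded so that the string test can be tried anywhere, and in an ATL project the wizard already generates a DllMain(), so the test would be merged into that function rather than added as a second one.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns true when the path of the loading executable names Windows Explorer.
// "iexplore.exe" does not contain "explorer.exe", so Internet Explorer passes,
// and so does regsvr32.exe.
bool IsWindowsExplorer(std::string loaderPath)
{
    std::transform(loaderPath.begin(), loaderPath.end(), loaderPath.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return loaderPath.find("explorer.exe") != std::string::npos;
}

#ifdef _WIN32
#include <windows.h>

extern "C" BOOL WINAPI DllMain(HINSTANCE, DWORD reason, LPVOID)
{
    if (reason == DLL_PROCESS_ATTACH) {
        char loader[MAX_PATH] = "";
        // A NULL module handle asks for the path of the process executable.
        GetModuleFileNameA(NULL, loader, MAX_PATH);
        if (IsWindowsExplorer(loader))
            return FALSE;   // refuse to load into Windows Explorer
    }
    return TRUE;
}
#endif
```

Rejecting one known host, instead of accepting only one, is what keeps regsvr32.exe and other legitimate loaders working.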

Get in Touch with WebBrowser

The SetSite() method is where the BHO is initialized and where you would perform all the tasks that happen only once. When you navigate to a URL with Internet Explorer, you should wait for a couple of events to make sure the required document has been completely downloaded and then initialized. Only at this point can you safely access its content through the exposed object model, if any. This means you need to acquire a couple of pointers. The first one is the pointer to IWebBrowser2, the interface that renders the WebBrowser object. The second pointer relates to events. This module must register as an event listener with the browser in order to receive the notification of downloads and document-specific events. By making use of ATL smart pointers:

To get a pointer to the IWebBrowser2 interface, you simply query it. The same occurs for IConnectionPointContainer, the first step for event handling. The code for SetSite() also retrieves the HWND of the browser and installs a keyboard hook on the current thread. The HWND will be used later to move and resize the Internet Explorer window. The hook, instead, serves the purpose of providing a hot key to make the code window appear and disappear at the user's leisure.

Getting Events from the Browser

When you navigate to a new URL, the browser needs to primarily accomplish two things: download the referred document and prepare the host environment for it. In other words, it must initialize and make externally available an object model for it. Depending on the type of document, this means either loading a Microsoft ActiveX server application registered to handle that document (for example, Microsoft Word for .doc files) or initializing some internal components that analyze the document content and fill the elements of the object model that renders it. This is what happens with HTML pages whose content is made available through the DHTML object model. When the document has been completely downloaded, a DownloadComplete event is fired. This does not necessarily mean that it's safe to manage the document's content through its object model. Instead, a DocumentComplete event indicates that everything has been done and the document is ready. (Notice that DocumentComplete arrives only the first time you access the URL. Subsequently, if you press F5 or click the Refresh button, you'll receive only a DownloadComplete event.)

To intercept the events fired by the browser, the BHO needs to connect to it via an IConnectionPoint interface and pass the IDispatch table of the functions that will handle the various events. The pointer to IConnectionPointContainer obtained previously is used to call the FindConnectionPoint method that returns a pointer to the connection point object for the required outgoing interface: in this case, DIID_DWebBrowserEvents2. The following code shows how the connection takes place:

By calling the IConnectionPoint's Advise() method, the BHO lets the browser know that it is interested in receiving notifications about events. Due to the COM event-handling mechanism, all this actually means that the BHO provides the browser with a pointer to its IDispatch interface. The browser will then call back the IDispatch's Invoke() method, passing the ID of the event as the first argument.

It's important to remember to disconnect from the browser when events are no longer needed. If you forget to do this, the BHO will remain locked even after you close the browser's window. (Among other things, this means you can't recompile or delete the object.) A good time to disconnect is when you receive the OnQuit event.
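As a rough sketch of the connection and of the decision taken inside Invoke(), consider the following. The names `ActionForEvent`, `EventAction`, `ConnectToBrowser`, and `DisconnectFromBrowser` are hypothetical. The numeric event IDs are the values I believe exdispid.h assigns to DISPID_ONQUIT and DISPID_DOCUMENTCOMPLETE. The COM calls are guarded because they need the Windows SDK headers and uuid.lib.

```cpp
// IDs of the two DWebBrowserEvents2 events this sketch reacts to.
enum BrowserEventId : long {
    kOnQuit           = 253,   // DISPID_ONQUIT
    kDocumentComplete = 259    // DISPID_DOCUMENTCOMPLETE
};

enum class EventAction { Ignore, ReadDocument, Disconnect };

// What the BHO's IDispatch::Invoke() could decide from its first argument.
EventAction ActionForEvent(long dispId)
{
    switch (dispId) {
    case kDocumentComplete: return EventAction::ReadDocument;
    case kOnQuit:           return EventAction::Disconnect;
    default:                return EventAction::Ignore;
    }
}

#ifdef _WIN32
#include <windows.h>
#include <ocidl.h>    // IConnectionPointContainer, IConnectionPoint
#include <exdisp.h>   // DIID_DWebBrowserEvents2

// Subscribes `sink` (the BHO's IDispatch) to the browser's events.
// The caller keeps *outPoint and *outCookie for the later disconnect.
HRESULT ConnectToBrowser(IConnectionPointContainer* container, IUnknown* sink,
                         IConnectionPoint** outPoint, DWORD* outCookie)
{
    HRESULT hr = container->FindConnectionPoint(DIID_DWebBrowserEvents2, outPoint);
    if (FAILED(hr))
        return hr;
    return (*outPoint)->Advise(sink, outCookie);
}

// Ends the subscription, for example when OnQuit arrives.
HRESULT DisconnectFromBrowser(IConnectionPoint* point, DWORD cookie)
{
    return point->Unadvise(cookie);
}
#endif
```

Keeping the cookie returned by Advise() is essential, because Unadvise() needs it to identify the subscription being removed.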

Accessing the Document Object

At this point the BHO has a reference to Internet Explorer's WebBrowser control and is connected to the browser for receiving all the events it generates. When the Web page is completely downloaded and properly initialized, it's finally possible to access it through the DHTML document object model. The Document property of WebBrowser returns a pointer to the IDispatch interface of the document object:

What the get_Document() method provides is just a pointer to an interface. We need to make sure that behind that IDispatch pointer there's really an HTML document object. If I were using Visual Basic, the following would have been equivalent code:

Dim doc As Object
Set doc = WebBrowser1.Document
If TypeName(doc) = "HTMLDocument" Then
    ' Get the document content and display
Else
    ' Disable the display dialog
End If

What's needed is a way to learn about the nature of the IDispatch pointer returned by get_Document(). Internet Explorer is more than an HTML browser and is capable of hosting any ActiveX document—that is, any document for which an application exists that acts as an ActiveX document server. Given this, there's no guarantee the document viewed is really an HTML page.

One solution is to look at the location URL and check the URL's extension. But what about Active Server Pages (ASP) or a URL where the HTML page is implicit? And what if you're using custom protocols like about or res? (For more information about custom protocols, check out my Cutting Edge column in the January 1999 issue of MIND magazine.)

I decided to take another approach, much more akin to the Visual Basic code just shown. The idea is, if the IDispatch pointer actually refers to an HTML document, querying for the IHTMLDocument2 interface would be successful. IHTMLDocument2 is the interface that wraps up all the functionality that the DHTML object model exposes for an HTML page. The following code snippet shows how to proceed:

The spHTML pointer is NULL if the query interface for IHTMLDocument2 failed. Otherwise, we're fine with the methods and properties of the DHTML object model.
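To make the contrast concrete, here is a sketch of both approaches. `HasHtmlExtension` and `GetHtmlDocument` are names invented for this illustration. The first function is the naive extension test, and the second uses plain COM calls rather than ATL smart pointers and is guarded because it needs the Windows SDK headers.

```cpp
#include <string>

// Naive test: does the URL end in .htm or .html?
// It misses ASP pages, implicit default pages, and protocols such as about:.
bool HasHtmlExtension(const std::string& url)
{
    auto endsWith = [&url](const std::string& suffix) {
        return url.size() >= suffix.size() &&
               url.compare(url.size() - suffix.size(), suffix.size(), suffix) == 0;
    };
    return endsWith(".htm") || endsWith(".html");
}

#ifdef _WIN32
#include <windows.h>
#include <exdisp.h>   // IWebBrowser2
#include <mshtml.h>   // IHTMLDocument2

// Returns the document as IHTMLDocument2, or NULL when the browser is showing
// something other than an HTML page. The caller releases the result.
IHTMLDocument2* GetHtmlDocument(IWebBrowser2* browser)
{
    IDispatch* disp = NULL;
    if (FAILED(browser->get_Document(&disp)) || disp == NULL)
        return NULL;

    IHTMLDocument2* html = NULL;
    disp->QueryInterface(IID_IHTMLDocument2, reinterpret_cast<void**>(&html));
    disp->Release();
    return html;   // stays NULL if the query failed
}
#endif
```

All the URLs that the naive test rejects below can still be HTML pages, which is why asking the document object itself is the more reliable test.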

Now the problem becomes how to get the source code of the displayed page. Fortunately, to work around this a rudimentary knowledge of DHTML will suffice. Just as an HTML page encloses all its content into a <BODY> tag, the DHTML object model requires you to get a pointer to the Body object as the first step:

CComPtr<IHTMLElement> m_pBody;
hr = spHTML->get_body(&m_pBody);

Curiously, the DHTML object model doesn't let you know about the raw content of the tags that precede <BODY>, such as <HEAD>. Their content is processed and then stored in a number of properties, but you still don't have one returning the raw text contained in the original HTML file. What the body can tell, however, will suffice here. To get the HTML code included in the <BODY>…</BODY> tags I need to read the content of the outerHTML property into a BSTR variable:

BSTR bstrHTMLText = NULL;
hr = m_pBody->get_outerHTML(&bstrHTMLText);
// Release the string with SysFreeString(bstrHTMLText) once it has been used

At this point, displaying the text into the code window is a matter of creating the window, converting the string from Unicode to ANSI, and setting the edit box, as shown in Figure 3. The following shows the full code for this:
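The conversion step alone can be sketched as below. `ToNarrow` is a hypothetical helper, not part of any library. On Windows it relies on WideCharToMultiByte() with the ANSI code page. The other branch is only a portable fallback based on the current C locale, so that the sketch can be tried outside Windows.

```cpp
#include <cstdlib>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Converts wide text (for example the BSTR from get_outerHTML) to a narrow string.
std::string ToNarrow(const std::wstring& wide)
{
    if (wide.empty())
        return std::string();
#ifdef _WIN32
    const int length = static_cast<int>(wide.size());
    // First call: ask how many bytes the converted text needs.
    int bytes = WideCharToMultiByte(CP_ACP, 0, wide.c_str(), length, NULL, 0, NULL, NULL);
    if (bytes <= 0)
        return std::string();
    std::string narrow(static_cast<std::size_t>(bytes), '\0');
    WideCharToMultiByte(CP_ACP, 0, wide.c_str(), length, &narrow[0], bytes, NULL, NULL);
    return narrow;
#else
    // Portable fallback: conversion according to the current C locale.
    std::size_t bytes = std::wcstombs(nullptr, wide.c_str(), 0);
    if (bytes == static_cast<std::size_t>(-1))
        return std::string();
    std::string narrow(bytes, '\0');
    std::wcstombs(&narrow[0], wide.c_str(), bytes);
    return narrow;
#endif
}
```

On Windows, a BSTR could be wrapped as `std::wstring(bstr, SysStringLen(bstr))` before the call, and the result passed to SetDlgItemTextA() to fill the edit box. Both steps are assumptions about how the pieces would be wired together.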

Because I run this code in response to the DocumentComplete notification, each new page is automatically and promptly processed. The DHTML object model lets you modify on the fly the structure of the page, but all the changes will be lost as soon as you refresh the view by hitting F5 or clicking the browser's Refresh button. By also handling the DownloadComplete event you can refresh the code window as well. (Pay attention to the fact that the DownloadComplete event comes before DocumentComplete.) So, you should ignore the DownloadComplete generated by the first download of the page and consider it only when it originates from a refresh. A simple Boolean member, for example m_bDocumentCompleted, is of great help in distinguishing between the situations.
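The role of the Boolean flag can be sketched as a tiny state tracker. `RefreshTracker` and its method names are hypothetical. Resetting the flag when a new navigation begins is an assumption of this sketch, since the text does not say when the flag is cleared.

```cpp
// Decides whether a DownloadComplete belongs to a first load or to a refresh.
class RefreshTracker {
public:
    // A new navigation starts (assumed here to be signaled by BeforeNavigate2):
    // the next DownloadComplete is part of a first load and must be ignored.
    void OnBeforeNavigate() { documentCompleted_ = false; }

    // Returns true only when the page had already completed once, that is,
    // when this DownloadComplete comes from a refresh.
    bool OnDownloadComplete() const { return documentCompleted_; }

    // The document is ready: always reload the code window and remember it.
    bool OnDocumentComplete()
    {
        documentCompleted_ = true;
        return true;
    }

private:
    bool documentCompleted_ = false;   // plays the role of m_bDocumentCompleted
};
```

Each method returns whether the code window should be reloaded, so the event handler only has to act on the return value.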

Managing the Code Window

The code window used to show the HTML source code of the current page is another ATL basic element—a dialog box window that you find in the Miscellaneous page of the ATL Object Wizard. I resize this window in response to the WM_INITDIALOG message and make it occupy the lowest portion of the desktop work area—that is, the available screen minus the taskbar, wherever it is docked.
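The layout arithmetic can be sketched as follows. `Box`, `SplitWorkArea`, and `ArrangeWindows` are names made up for this illustration, and the height given to the code window is a free parameter. The Windows part is guarded and uses SystemParametersInfo() with SPI_GETWORKAREA, which reports the screen area not covered by the taskbar.

```cpp
struct Box { long left, top, right, bottom; };   // same idea as a Win32 RECT

// Gives the bottom `codeHeight` pixels of the work area to the code window
// and whatever remains above it to the browser.
void SplitWorkArea(const Box& work, long codeHeight, Box& browser, Box& code)
{
    const long available = work.bottom - work.top;
    if (codeHeight < 0) codeHeight = 0;
    if (codeHeight > available) codeHeight = available;
    browser = Box{ work.left, work.top, work.right, work.bottom - codeHeight };
    code    = Box{ work.left, work.bottom - codeHeight, work.right, work.bottom };
}

#ifdef _WIN32
#include <windows.h>

// Applies the split to the real windows.
void ArrangeWindows(HWND browserWnd, HWND codeWnd, long codeHeight)
{
    RECT rc;
    if (!SystemParametersInfo(SPI_GETWORKAREA, 0, &rc, 0))
        return;
    Box work = { rc.left, rc.top, rc.right, rc.bottom };
    Box b = {}, c = {};
    SplitWorkArea(work, codeHeight, b, c);
    MoveWindow(browserWnd, b.left, b.top, b.right - b.left, b.bottom - b.top, TRUE);
    MoveWindow(codeWnd, c.left, c.top, c.right - c.left, c.bottom - c.top, TRUE);
}
#endif
```

Keeping the arithmetic separate from the window calls makes it easy to check the layout without a running browser.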

This window may or may not appear at the browser startup. By default it does, but this can be prevented by clearing the Show window at startup check box. You can also close the window if you like. By pressing F12, however, you can bring it back at any time. F12 is caught by the keyboard hook I installed in SetSite().
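A thread-local keyboard hook for the hot key might be sketched like this. `IsF12Pressed`, `InstallHotKeyHook`, and `RemoveHotKeyHook` are hypothetical names. The single global hook handle is a simplification, since a real BHO would need one per browser thread. The Windows part is guarded.

```cpp
#include <cstdint>

const unsigned kVkF12 = 0x7B;   // value of VK_F12

// Decodes the arguments of a WH_KEYBOARD hook: wParam is the virtual-key
// code, and bit 31 of lParam is 0 while the key is being pressed.
bool IsF12Pressed(std::uintptr_t wParam, std::intptr_t lParam)
{
    const bool beingReleased = ((static_cast<std::uintptr_t>(lParam) >> 31) & 1u) != 0;
    return wParam == kVkF12 && !beingReleased;
}

#ifdef _WIN32
#include <windows.h>

static HHOOK g_hook = NULL;

LRESULT CALLBACK KeyboardProc(int code, WPARAM wParam, LPARAM lParam)
{
    if (code == HC_ACTION && IsF12Pressed(wParam, lParam)) {
        // Show or hide the code window here (application-specific).
    }
    return CallNextHookEx(g_hook, code, wParam, lParam);
}

// Hooks only the calling thread, so no module handle is required.
void InstallHotKeyHook()
{
    g_hook = SetWindowsHookEx(WH_KEYBOARD, KeyboardProc, NULL, GetCurrentThreadId());
}

void RemoveHotKeyHook()
{
    if (g_hook) {
        UnhookWindowsHookEx(g_hook);
        g_hook = NULL;
    }
}
#endif
```

Testing bit 31 matters because the hook is called for both the press and the release of the key, and without the test the window would toggle twice per keystroke.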

The startup setting is saved to the registry in full accordance with Microsoft guidelines. To read and write the registry I employed the new Shell Lightweight API (shlwapi.dll) instead of the Win32 functions, saving the hassle of opening and closing the involved keys:

This DLL was introduced with Internet Explorer 4.0 and Active Desktop, and is a standard system component beginning with Windows 98. Its functions are more direct than the corresponding Win32 functions and are preferable for one-off reads and writes.
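Reading and writing the startup setting with the shlwapi functions could look like the sketch below. The key path `Software\Example\ViewSourceBHO`, the value name `ShowWindowAtStartup`, and both function names are assumptions made for illustration. On platforms without a registry an in-memory map stands in, purely so that the sketch can be exercised.

```cpp
#include <map>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <shlwapi.h>   // SHGetValueA, SHSetValueA; link with shlwapi.lib
#endif

// Hypothetical key and value names, chosen only for this sketch.
static const char kSettingsKey[] = "Software\\Example\\ViewSourceBHO";
static const char kShowAtStartup[] = "ShowWindowAtStartup";

#ifndef _WIN32
// Stand-in store used where the registry does not exist.
static std::map<std::string, unsigned long> g_fakeRegistry;
static std::string FakeName() { return std::string(kSettingsKey) + "\\" + kShowAtStartup; }
#endif

void SaveShowAtStartup(bool show)
{
#ifdef _WIN32
    DWORD data = show ? 1 : 0;
    // One call creates or opens the key, writes the value, and closes the key.
    SHSetValueA(HKEY_CURRENT_USER, kSettingsKey, kShowAtStartup,
                REG_DWORD, &data, sizeof(data));
#else
    g_fakeRegistry[FakeName()] = show ? 1 : 0;
#endif
}

// Returns true, the default behavior, when nothing has been saved yet.
bool LoadShowAtStartup()
{
#ifdef _WIN32
    DWORD type = 0, data = 1, size = sizeof(data);
    if (SHGetValueA(HKEY_CURRENT_USER, kSettingsKey, kShowAtStartup,
                    &type, &data, &size) != ERROR_SUCCESS || type != REG_DWORD)
        return true;
    return data != 0;
#else
    auto it = g_fakeRegistry.find(FakeName());
    return it == g_fakeRegistry.end() ? true : it->second != 0;
#endif
}
```

Compared with RegOpenKeyEx(), RegQueryValueEx(), and RegCloseKey(), each setting costs a single call here, which is the convenience the text refers to.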

Registration of Helper Objects

A BHO is a COM server and should be registered both as a COM server and as a BHO. The ATL Wizard provides you with the necessary registrar script code (RGS) that accomplishes the first task. What follows is the RGS code that properly installs a helper object. (The CLSID comes from the example.)

Note the ForceRemove clause, which deletes any existing key of that name before re-creating it at registration time; the key is removed again when you unregister the object.

Under the Browser Helper Objects key fall all the installed helper objects. Such a list is never cached by the browser, so installing and testing BHOs is really a quick matter.
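For readers who prefer to see the registry layout in code rather than in a registrar script, the sketch below performs the BHO-specific part of the registration directly. It is an alternative illustration, not the RGS script itself: `BhoSubKey`, `RegisterAsBho`, and `UnregisterAsBho` are hypothetical names, and the CLSID is supplied by the caller in registry format, braces included.

```cpp
#include <string>

// Builds the HKEY_LOCAL_MACHINE subkey that qualifies `clsid` as a BHO.
std::string BhoSubKey(const std::string& clsid)
{
    return "SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Explorer\\"
           "Browser Helper Objects\\" + clsid;
}

#ifdef _WIN32
#include <windows.h>

// Creates the (empty) key; its mere presence is what the browser looks for.
bool RegisterAsBho(const std::string& clsid)
{
    HKEY key = NULL;
    LONG rc = RegCreateKeyExA(HKEY_LOCAL_MACHINE, BhoSubKey(clsid).c_str(), 0, NULL,
                              REG_OPTION_NON_VOLATILE, KEY_WRITE, NULL, &key, NULL);
    if (rc != ERROR_SUCCESS)
        return false;
    RegCloseKey(key);
    return true;
}

// Removes the key again at unregistration time.
bool UnregisterAsBho(const std::string& clsid)
{
    return RegDeleteKeyA(HKEY_LOCAL_MACHINE, BhoSubKey(clsid).c_str()) == ERROR_SUCCESS;
}
#endif
```

The usual COM server entries under HKEY_CLASSES_ROOT\CLSID are still required and are not covered by this sketch. Writing under HKEY_LOCAL_MACHINE typically needs administrative rights.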

Summary

In this article, I presented Browser Helper Objects, a relatively new and powerful way of injecting your code directly inside the browser's address space. What you have to do is write a COM server that supports the IObjectWithSite interface. At this point, your module is to all intents and purposes a component of the browser machinery. The sample I've built throughout the article also touched on topics such as COM events, the dynamic HTML object model, and the WebBrowser programming interface, which may appear to be a little off the topic. Instead, I think this demonstrates the power of BHOs, and at the same time provides a real-world platform on which to build your own objects. If you need to know what the browser is displaying, you absolutely need to sink events and become familiar with WebBrowser. Now you know: forewarned is forearmed. To conclude, let me also remind you that BHOs are useful with Windows Explorer as well and, thanks to WebBrowser, they can be driven from your code.