Introduction

When you add the class CHtmlEditor to your project, the users of your software will be able to enter HTML content into an HTML Editor GUI control. Using this class, you get several weeks of coding and testing for free!

Requirements

You have to study the MSDN documentation if you want to expand the existing functionality.

Why Such a Complex Class?

You might ask, "With MFC7, Microsoft introduced the CHtmlEditView class. Isn't that all I need?"

No, definitely not!

CHtmlEditView provides only the bare foundation. It does not offer any functionality to work with tables. It does not allow you to modify the background color of the document or its STYLE definitions. It does not offer a source editor, and it does not allow you to directly access any HTML element inside the page, etc.

Tables

I don't know of any open source HTML editor that supports tables. So I wrote the table support on my own, which is more complex than you might think, especially if you want to support adding new columns or splitting one cell into two, etc. To keep the work within reasonable limits, this editor only supports cells that span multiple columns. Cells spanning multiple rows are not needed that often, and implementing them thoroughly would be a lot of additional work!

Fonts

The Internet Explorer Editor only supports seven font sizes, as you can see in Outlook Express. The command IDM_FONTSIZE sets <FONT size=1...7>. I decided that this is too coarse and found a way to let the user set unlimited and precise font sizes using <FONT style="font-size:17px"> (using px is much more precise than using pt).

Internet Explorer COM Interface Relationship

Internet Explorer offers a really huge number of COM interfaces, which allow the control of ANYthing. Your application can let Internet Explorer browse to any URL, access any HTML element in the document and read or modify its content. Everything you can do with JavaScript you can also do directly from your application via COM. You can do that not only if your application hosts the Internet Explorer control, but also if Internet Explorer runs as an independent application! However, remote automation is not a subject of this project.

There is a very long list of COM interfaces described in the MSDN. I will explain here only the basics. The following image tries to illustrate the relationship of the most important interfaces.

The IWebBrowser interface is the browser / editor itself. It contains, for example, the command Navigate(), which browses to a URL. You can ask the IWebBrowser interface for the IHTMLDocument interface that represents all the visible content of the HTML document. If the document contains a frameset, then each frame in turn contains its own IHTMLDocument interface.

You can ask the IHTMLDocument interface for the IHTMLBodyElement which, for example, contains a command to set the background color (<BODY bgcolor=red>). The document can also retrieve ANY other IHTMLElement interface. For example, it can do this by using IHTMLDocument3.getElementById() or by using IHTMLDocument.get_all(), which retrieves a collection of ALL elements in the document that can later be filtered.

You will notice that some interfaces exist in several numbered versions. They are all implemented by the same object, and you can obtain one from another via QueryInterface(), for example using CComQIPtr<...>. However, the commands they provide are different. The reason is that these interfaces were introduced at different times:

IHTMLDocument requires Internet Explorer 4.0

IHTMLDocument2 requires Internet Explorer 4.0

IHTMLDocument3 requires Internet Explorer 5.0

IHTMLDocument4 requires Internet Explorer 5.5

IHTMLDocument5 requires Internet Explorer 6.0

With CHtmlEditor::GetMsieVersion(), you can retrieve the version of Internet Explorer on the current machine. If you need any of the newer functionality, you must check the MSIE version first. Otherwise, your application will crash on older Internet Explorer versions! Currently the project does not use any commands that require more than Microsoft Internet Explorer 5.0.
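
The version check can be illustrated with a small, portable sketch. It is not code from the project: the struct InterfaceRequirement, the table kRequirements and the function IsInterfaceAvailable() are hypothetical names, and the encoding of a version as "major * 100 + minor" is an assumption, because the return type of CHtmlEditor::GetMsieVersion() is not shown in this article. The minimum versions are the ones from the list above.

```cpp
#include <cstddef>
#include <string>

// Versions are stored as "major * 100 + minor", e.g. 550 for MSIE 5.5.
// This encoding is an assumption made for this sketch.
struct InterfaceRequirement
{
    const char* name;        // COM interface name
    int         minVersion;  // minimum MSIE version, encoded as above
};

// Minimum versions as given in the list above.
static const InterfaceRequirement kRequirements[] =
{
    { "IHTMLDocument",  400 },
    { "IHTMLDocument2", 400 },
    { "IHTMLDocument3", 500 },
    { "IHTMLDocument4", 550 },
    { "IHTMLDocument5", 600 },
};

// Returns true if the named interface may be used with the installed
// MSIE version. Unknown interface names are rejected to stay on the safe side.
inline bool IsInterfaceAvailable(const std::string& name, int installedVersion)
{
    const std::size_t count = sizeof(kRequirements) / sizeof(kRequirements[0]);
    for (std::size_t i = 0; i < count; ++i)
    {
        if (name == kRequirements[i].name)
            return installedVersion >= kRequirements[i].minVersion;
    }
    return false;
}
```

A caller would run such a check once before requesting one of the newer interfaces and fall back to the older functionality if it returns false.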

CHtmlEditor Class Hierarchy

The following image illustrates the class hierarchy:

The hierarchy of classes in CHtmlEditor is equivalent to this tree. cHtmlDomNode is the base of all. It contains commands like those in the JavaScript DOM model: you can navigate from one element to its parent or sibling (cHtmlDomNode::NextSibling()); you can remove an element from the document (cHtmlDomNode::Remove()) or you can even create a new element.

Derived from cHtmlDomNode are cHtmlDocument and cHtmlElement. This means that these inherit the functionality of cHtmlDomNode. cHtmlElement allows, for example, retrieval of the inner HTML code of an element or the modification of its attributes. For example, cHtmlElement::SetAttribute("Align", "Right") would result in <SPAN align=right> when executed on a SPAN element.

Derived from cHtmlElement are the other elements like cHtmlTable, cHtmlImg, etc., which in turn inherit the functionality of cHtmlElement and add their own specialized functionality. For example, cHtmlTableCell::Split() would split a table cell into two cells.

Using CHtmlEditor

The fine thing is that, using CHtmlEditor, you don't have to care about all of the COM interfaces. They are all nicely wrapped in C++ classes. First of all, to integrate the editor into your project, simply place a static control into the dialog where you want the editor to be. The following code creates the Internet Explorer editor and the Richedit Source editor:

Working with Styles

CHtmlEditor allows access (read and write) to any style attribute of any HTML element in the document. It is as easy as writing:

i_TableCell.GetStyle().SetProperty(E_FontSize, "18px");

...which would set the font size of a table cell to <TD style="font-size:18px">...</TD>. You can also modify the general style definitions for the whole document. The following sets <HEAD><STYLE> Body { FONT-SIZE: 18px; } </STYLE></HEAD>.

Working with the Selection / Cursor Position

Suppose the user has selected some text and clicks a toolbar button to perform an action on it. There are several default functions provided by Internet Explorer that work with the current selection. For example, if you call:

pi_Editor->ExecSetCommand(IDM_FORECOLOR, "red")

...the foreground color of the selected text will be set to red. You can find all the IDM_XYZ commands in the file MsHtmcid.h of Visual Studio 7, but most of them are not implemented. It seems that Microsoft initially had many more plans with Internet Explorer than they finally realized.

But what if you want to implement your own not-yet-existing functionality? You can call cHtmlDocument::GetSelection(), which will return the HTML element containing the cursor or the selection.

Example 1

You want to retrieve the URL of the image which is currently selected by the user:

Example 2

Visual Studio 6 versus 7

There are several reasons why I prefer working with Visual Studio 6 instead of upgrading to Visual Studio 7, but I don't want to explain this here. The problem is that MFC 6 does not yet support CHtmlEditView, the MFC wrapper for the HTML Editor. However, MFC 6 already supports CHtmlView, the Internet Explorer Browser.

If you look into the source code of CHtmlEditView (VisualStudio7\Vc7\AtlMfc\Include\AfxHtml.h), you will notice that it would be extremely awkward to convert all that stuff to make it run on Visual Studio 6. There are several classes required, like CHtmlEditCtrlBase, etc and you cannot simply take Microsoft's Visual Studio 7 code and put it into your Visual Studio 6 project. This is because there are several dependencies on classes which do not yet exist in Visual Studio 6 (CStringA, CStringW) or which have less functionality. You would end up rewriting all of it.

I found an easier way of expanding CHtmlView to get all the functionality I need by adding only a very few lines of code. For Visual Studio 6, it is required to #include some Visual Studio 7 header files and a *.Lib file, which you find in the folder Vs7 of the Visual Studio 6 project. This is the reason why the Visual Studio 6 project download is much bigger. The Visual Studio 7 project obviously uses CHtmlEditView and is much smaller, as all the includes are not required.

Why MFC?

If you are not familiar with MFC and if you are wondering what _T("Red") is good for or what the compiler options UNICODE and MBCS mean, I recommend the VERY good book "Professional MFC," which you can download for free from my homepage. There are some people who don't like MFC, mostly beginners who never understood it. However, this project is a very good example that demonstrates how MFC makes your life much easier! Let's say you have the HTML Editor and want to retrieve the title of the document:

<HTML><HEAD><TITLE>Title of document</TITLE></HEAD><BODY></BODY></HTML>

Version 1

Programming the COM interface of Internet Explorer without MFC would result in this code:

Version 3

Internet Explorer / MFC Bugs

After working for nearly 2 years with the Internet Explorer COM interface, I have to say that it is of very good quality: it is almost free of bugs! This is very unusual for Microsoft products; the only Internet Explorer bug I found is in IHTMLDocument2.get_readyState. Do not use this command! This function worked fine until Microsoft broke it with Windows XP SP2. However, this does not matter, as you can easily replace it with CHtmlEditor::GetBusy().

There is also a bug in the MFC function CHtml(Edit)View::GetDocumentHTML() that uses CStreamOnCString, which is buggy. I wrote my own class, cStreamReader, which replaces the buggy function.

Security

Everybody knows that Internet Explorer is full of security holes. However, if you use it in your application just as an editor and do NOT browse with it to any malicious webpage, you don't have to worry about anything!

IMPORTANT: I recommend NOT allowing the user to switch into browse mode. This sample project allows everything.

If you don't want to be that strict, you can use the function CHtmlEditor::OnBeforeNavigate2() to forbid browsing to the internet. There you can put a filter which allows only files on the local hard disk and embedded resources. Each time the user clicks a link that points anywhere else, he gets an error message.
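
A filter of this kind could look roughly like the following sketch. It is a portable illustration, not the project's code: the function name IsNavigationAllowed() and the list of allowed prefixes are assumptions. In particular, "about:blank" is included only because an empty editor document is commonly loaded from it, and note that a file:// URL can also point to a network share, so a stricter filter may be needed. In the MFC handler CHtmlView::OnBeforeNavigate2(), a rejected URL would be blocked by setting *pbCancel = TRUE.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns true if the URL may be opened: local files, embedded resources
// and the blank page (assumed list). Everything else (http, https, ftp, ...)
// is rejected. The comparison is case-insensitive.
inline bool IsNavigationAllowed(const std::string& url)
{
    std::string lower(url);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    static const char* const kAllowedPrefixes[] = { "file://", "res://", "about:blank" };
    for (const char* prefix : kAllowedPrefixes)
    {
        const std::string p(prefix);
        // compare() checks whether 'lower' starts with the prefix
        if (lower.compare(0, p.size(), p) == 0)
            return true;
    }
    return false;
}
```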

<P> versus <DIV>

If you switch Internet Explorer into Design Mode, you will find that by default hitting the Enter key inserts TWO new lines. This is because Internet Explorer inserts a <P> tag. If you want a <BR> (a single new line), you have to press Shift + Enter.

This is quite stupid because you will have to tell all the users of your software to change their habits and to always hold the Shift key down when they want to go to the next line. There is no way to change this behaviour in Internet Explorer. However, there is a work-around. After loading a clean document, you have to insert an empty <DIV> tag like this: <BODY><DIV></DIV></BODY>. Now each Enter inserts ONE new line, which will look like this: <DIV>Text of line 1</DIV><DIV>Text of line 2</DIV>. Also, each empty table cell has to be filled: <TD><DIV></DIV></TD>. CHtmlEditor takes care to do all this automatically.

GUI

This sample project uses a very primitive GUI consisting of buttons and button-style check boxes to keep it simple. You will have to create your own nice toolbar with tooltips. The advantage of a toolbar over check boxes and buttons is that a toolbar cannot steal the focus from the HTML editor when the user clicks a toolbar button. The result could look like this in my program ElmüSoft Desktop Organizer:

I used the top row of toolbar buttons for table editing, the bottom row for text editing and the middle row for all the rest of the functionality.

Automatic GUI Update

Every time you move the cursor in the HTML document, CHtmlEditor posts a notification to its parent so the GUI can be updated. This means that if the cursor moved from a bold text with font size 11 to an underlined text with font size 15, the combo boxes and toolbars are updated. In MFC7, CHtmlEditor gets this event from CHtmlEditView::OnUpdateUI(), but this is not yet available in MFC6. However, after studying the MFC source code, I found that this event is triggered by a timer. I subclass the "Internet Explorer_Server" window, catch this timer and so get the event also using MFC6.

After CHtmlEditor receives this event, it posts a message to its parent. I use WM_IDLEUPDATECMDUI for that, but you can use any other message that is not currently used for other purposes.

MSIE Context Menu

Internet Explorer displays a context menu when you right-click inside it. This menu is different in Browse mode and in Design mode. Here on The Code Project, you can find an article about how to modify or turn off this context menu, but that approach is extremely complicated.

As I wrote above, I subclass the "Internet Explorer_Server" window and there I catch WM_CONTEXTMENU. There you can easily implement your own context menu via TrackPopupMenuEx() or turn it completely off.

Keyboard Shortcuts

In the same class (CMsieWnd), you can also adapt the way Internet Explorer reacts to keyboard shortcuts. You can either modify the default behaviour (e.g. CTRL-P for printing, CTRL-N to open a new window) or add your own additional shortcuts (e.g. CTRL-R to insert a new table row).

The Cleanup Function

CHtmlEditor contains a very complex HTML cleanup functionality. Why is this? As I already wrote, I made this editor for my program ElmüSoft Desktop Organizer. There the user can enter any HTML content which will later be displayed on the desktop in a note. These desktop notes are displayed by an Internet Explorer Window that is positioned invisibly over the desktop, as you see above.

As this desktop organizer uses JavaScript to move and open/close these notes, I cannot allow the user to enter his own JavaScript that might disturb the correct functioning. So there is a cleanup function in CHtmlEditor that removes any <SCRIPT>, <IFRAME>, <OBJECT>, etc. blocks the user might have entered.
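
To illustrate the idea of stripping such blocks, here is a deliberately simple sketch that works on the HTML source as a plain string. This is a stand-in technique and not how CHtmlEditor does it: the project's cleanup works on the element tree (see cHtmlDocument::RecursiveCleanUpChilds()). The function names ToLowerCopy(), RemoveBlocks() and RemoveForbiddenBlocks() are hypothetical, and the sketch ignores edge cases such as "</script >" with whitespace before the closing bracket.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <initializer_list>
#include <string>

// Lowercase copy used only for searching; offsets match the original string.
inline std::string ToLowerCopy(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Removes every <tag ...>...</tag> block, ignoring case.
// An unclosed block is removed up to the end of the input.
inline std::string RemoveBlocks(const std::string& html, const std::string& tag)
{
    const std::string open  = "<"  + ToLowerCopy(tag);
    const std::string close = "</" + ToLowerCopy(tag) + ">";
    const std::string lower = ToLowerCopy(html);

    std::string result;
    std::size_t pos = 0;
    for (;;)
    {
        std::size_t start = lower.find(open, pos);
        // Accept only a complete tag name: "<script>" or "<script src=..>",
        // but not "<scripted>".
        while (start != std::string::npos)
        {
            const std::size_t after = start + open.size();
            if (after >= lower.size() || lower[after] == '>' || lower[after] == '/' ||
                std::isspace(static_cast<unsigned char>(lower[after])))
                break;
            start = lower.find(open, start + 1);
        }
        if (start == std::string::npos)
        {
            result.append(html, pos, std::string::npos);  // keep the rest
            break;
        }
        result.append(html, pos, start - pos);            // keep text before the block
        const std::size_t end = lower.find(close, start);
        if (end == std::string::npos)
            break;                                        // unclosed block: drop the rest
        pos = end + close.size();
    }
    return result;
}

// Applies RemoveBlocks() for the tags named in the text above.
inline std::string RemoveForbiddenBlocks(std::string html)
{
    for (const char* tag : { "script", "iframe", "object" })
        html = RemoveBlocks(html, tag);
    return html;
}
```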

Additionally, there are HTML tags which are not supported by the Internet Explorer editor. For example, <CENTER> centers a whole block of HTML code. However, the editor normally centers only line-by-line by using <DIV align=center>Text</DIV>. If you enter a <CENTER> tag, the editor will display the HTML content correctly, but the GUI button for centered text will not appear pressed, which will confuse your users. Illegal code can come from ONLY two sources:

The user entered it in the source editor.

The user copied it from a web page and pasted it into the HTML editor.

However, even if you do NOT paste or source edit, the cleanup function will remove useless empty tags like <u></u>. These may appear after typing, copying & pasting and deleting around for a while.
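
The removal of empty tags can be sketched in the same simplified, string-based way. Again, this is only an illustration under assumptions and not the project's DOM-based implementation: the names LowerAscii(), RemoveOneEmptyPair() and RemoveEmptyInlineTags() are hypothetical, and the list of tags (u, b, i) is an arbitrary choice. Empty <DIV></DIV> pairs are deliberately left alone, because the editor inserts them on purpose (see "<P> versus <DIV>" above).

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Lowercase copy used only for searching; offsets match the original string.
inline std::string LowerAscii(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Removes the first empty <name ...></name> pair, if any. Returns true if
// something was removed. 'name' must be lowercase.
inline bool RemoveOneEmptyPair(std::string& html, const std::string& name)
{
    const std::string lower = LowerAscii(html);
    const std::string open  = "<"  + name;
    const std::string close = "</" + name + ">";

    for (std::size_t start = lower.find(open); start != std::string::npos;
         start = lower.find(open, start + 1))
    {
        const std::size_t after = start + open.size();
        if (after >= lower.size())
            return false;
        // Accept only a complete tag name: "<b>" or "<b class=x>", not "<br>".
        if (lower[after] != '>' && !std::isspace(static_cast<unsigned char>(lower[after])))
            continue;
        const std::size_t gt = lower.find('>', after);
        if (gt == std::string::npos)
            return false;
        // The closing tag must follow the opening tag immediately.
        if (lower.compare(gt + 1, close.size(), close) == 0)
        {
            html.erase(start, gt + 1 + close.size() - start);
            return true;
        }
    }
    return false;
}

// Repeats the removal until nothing changes, so that nested empty pairs
// such as <b><u></u></b> disappear completely.
inline std::string RemoveEmptyInlineTags(std::string html)
{
    static const char* const kNames[] = { "u", "b", "i" };  // assumed tag list
    bool changed = true;
    while (changed)
    {
        changed = false;
        for (const char* name : kNames)
            while (RemoveOneEmptyPair(html, name))
                changed = true;
    }
    return html;
}
```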

IMPORTANT: you will have to adapt the function cHtmlDocument::RecursiveCleanUpChilds() to your needs, but do NOT modify anything before you COMPLETELY understand how it works!!!

You could also completely prohibit source editing and pasting, but this is not a good idea. The user will become unable to copy HTML content that he has written on his own and paste it to another location in his document or duplicate it.

Finally

There are lots of other things I should explain here, but you will find all you need to know by studying the source code and its plentiful comments!

IMPORTANT: before you start, study the class DHtmlEditDemoDlg thoroughly to understand how to integrate CHtmlEditor into your project!

Did you compile as Unicode?
What did you do to see this line of code?

Do you have Internet Explorer 7? I never tested on MSIE 7.
On Internet Explorer 6 this does not happen.

What you describe is very strange because Internet Explorer internally ALWAYS works with Unicode.
Even on an old Windows 98 it can display Chinese characters.
I don't understand why there should be a problem entering Unicode characters!

Since you have changed the html code generated by IE, you will never see this line in your application.

As you say, Internet Explorer internally ALWAYS works with Unicode. Of course I agree with you, because COM always uses Unicode. But if that is true, how do you explain that in your method CHtmlEditor::cStreamReader::SetData() you transform Unicode chars to ANSI chars and pass those ANSI chars to CHtmlDocument instead of passing the Unicode chars directly?

Here is some information that may be useful: I find that the HTML editor can open an HTML file in any charset, such as Unicode, UTF-8, GB2312 and so on, even when it differs from the charset defined in the HTML file.

As I already explained a few posts ago, this does NOT happen if you compile the project with the compiler setting UNICODE. You have to compile DHtmlEditDemo_UNICODE.dsw!
Then there is NO conversion.
Unicode characters stay Unicode characters forever.
Even in the source editor you see Japanese or Chinese text!
Internet Explorer internally always works with Unicode and the demo application does the same.
The conversion is switched off by the compiler switch.

Everything works fine and I don't understand what your problem is.

> Yes, I use VS2005 and set charset to unicode

I doubt that you have the correct settings in your Visual Studio.
I am not talking about a charset!!
UNICODE is a COMPILER SWITCH!

Please look at the code I pasted a few postings ago: #ifdef _UNICODE

> that I find html editor can open html file in any charset, such as unicode, utf-8, gb2312

This META tag only matters if Internet Explorer opens a FILE from the internet or disk.
If you set the HTML code directly via SetHtml(), it shouldn't make a difference, as you always set Unicode HTML code.

Just for your information:
Once you have installed Internet Explorer 7 on any Windows, you will NEVER be able to return to Internet Explorer 6!!!
Microsoft does not want you to uninstall their beloved MSIE 7.
Only some files are removed from the system, but the "engine" of MSIE 7 (the important DLLs) is NOT removed. So what remains is an MSIE 6 GUI with the MSIE 7 "engine".

But I doubt that your problem has anything to do with the MSIE version.

"I store the Html content into a string, switch the browse/edit mode and re-write the Html content into Internet Explorer. This is to avoid that it forgets everything. (a "workaround" for Alzheimer)
Each time the user switches the mode I have to save the HTML content into a string before switching and restore it after switching."

Can you tell me how I should change my code?
I have the same question for Elmü: when I change from source mode to browse mode, it is blank! Thanks a lot in advance!

Great job. Thanks for the editor. But is it possible to add support for table column selection? Or just selection of multiple non-adjacent cells, as in other editors (VS 2005, FrontPage, etc.)? Thanks in advance.

You are using the Internet Explorer Save command.
This behaves absolutely correctly in asking the user where to save the file.

If you don't want the file dialog to ask the user where to save the file,
simply get the HTML code with GetHtml() and save the content of the string,
for example by using the Windows API CreateFile() and WriteFile().

Very nice work, thank you!
I only have one problem, but as I read somewhere, it is a problem of the built-in WebBrowser control: when the content of the browser is loaded dynamically (e.g. from a string) and the user is not in design mode (it's called browse mode, if I'm not mistaken), any link like "file://some_file" stops working. The code doesn't even step into the OnBeforeNavigate2 method. Http, mailto, etc. links work fine.

Strangely enough, if the path is incorrectly formatted, the browser shows a message box which complains about not finding the file (a correctly formatted path pointing to an invalid file doesn't produce the message box).

Do you by any chance have a solution for this problem?
Thank you!
Chris

I need to load HTML pages in this editor asynchronously so users don't need to wait while the HTML is loading.
I create a thread and execute the following method there: CHTMLEditor::GetDocument()::SetHtml("some html")
but I get an access violation in the method SetHtml()!!
How do I do this right? And is it possible to do such a thing with HtmlEditor?

Are your pages so huge that loading takes a long time?
Otherwise a thread is nonsense!

I recommend not using another thread.
Instead use SetTimer(..) in the GUI thread and in the handler of WM_TIMER load the HTML data.
Set the timer to 100 ms in OnInitDialog() and the user is not blocked if you load very very huge pages.

Hi,
Thanks for the great article.
Can you help me do something like your app in C#?
I want to customize the tables that I inserted (bgcolor, border color...), add or delete rows and columns... like you do.
But I couldn't find the position of the element in which the cursor is blinking (GetSelection ...).
I work with Visual Studio 2005.

I want to save HTML without the Save As dialog.
I searched the net all the way through, but I didn't find any solution.
I found that your program "ElmüSoft Desktop Organizer" solves the problem, as in the "Desktop Notes Settings" tab window.

I am using this HTML editor and it generates memory leaks. It seems that something is executed in an endless loop, and each time it allocates a memory block of about the same size as the viewed HTML document which, of course, is never freed. I have reviewed the code several times and this issue isn't related to any timer event or any Windows message. It seems to be something related to MSHTML.dll, because if I do a memory leak check with the code from: http://www.codeproject.com/tools/leakfinder.asp
then the following block is repeating over and over:

MsHtml.dll is part of Internet Explorer.
If you say that this DLL has a memory leak, you say that Internet Explorer has a memory leak.
Neither is true.

I have been using my editor for many years and have had no problems.
I also never saw Internet Explorer use an increasing amount of memory while using it.

Most probably your code has a bug or your tool measures something strange.

Did you check your application in Task Manager?
If you REALLY have a memory leak, you will see in Task Manager that the memory consumption increases up to an amount of, for example, 500 MB and the computer becomes slower and slower.
Do you really have this effect? (I don't believe it!)

Does it consume more and more memory while you are using it?
How much?
Up to which mem usage does it increase?

When you close the Internet Explorer Editor's parent window, is the memory released?

Did you ever consider that Internet Explorer might use something like a garbage collector which automatically frees the memory after a while, or at least after closing the window?

If you have ever observed modern software (written in Java or C#), you will have noticed the memory usage increasing up to, for example, 50 MB; then the collector starts freeing memory and the usage decreases to, for example, 20 MB, then increases up to 50 MB and the garbage collector becomes active again.

The information of your leakfinder does not automatically mean that there is really a memory leak!
Observe your application in Taskmanager and you will notice that there is no memory leak!