MSHTML Hosting – IDocHostUIHandler

We can’t discuss embedding WebBrowser in an application without also discussing IDocHostUIHandler. The IDocHostUIHandler (and IDocHostShowUI) interface is the standard method provided by Microsoft for customizing how WebBrowser works when hosted in an application. Some of the things you can use IDocHostUIHandler to do include:

  • Disable standard right-click, context menu (or provide your own version)
  • Turn off the 3D border and scrollbars
  • Give the WebBrowser scripting engine access to your special purpose COM methods
  • Handle accelerator keys and URLs before MSHTML gets them

IDocHostUIHandler is an interface that you need to implement. How you implement it depends on your development language and environment. For most languages, it means deriving a class from IDocHostUIHandler, adding code to the methods you want and returning safe, default values from those you do not want.

The next step is giving MSHTML your implementation of IDocHostUIHandler. Again, this depends on your development language and environment. The easy way is the “ICustomDoc” method. The best way is the “IOleClientSite” method. There is a nice C# article on CodeProject that discusses both methods (as well as some other good WebBrowser information).

ICustomDoc

This method requires only IDocHostUIHandler to be implemented, which keeps things simpler. However, you have to give the interface to MSHTML after each page is loaded. The best place to do it is in the OnDocumentComplete or OnNavigateComplete2 event handler. This is the method I currently use in my applications. The code looks something like this:


IHTMLDocument2* pDoc = ...;
ICustomDoc* pCustom = 0;
HRESULT hr = pDoc->QueryInterface(IID_ICustomDoc, (void**)&pCustom);
if (SUCCEEDED(hr)) {
  pCustom->SetUIHandler(pMyDocHostUIHandler);
  pCustom->Release();
}

IOleClientSite

This method requires that you implement IOleClientSite, as well as IDocHostUIHandler. The benefit is that you only need to give the interfaces to MSHTML once, before any HTML is loaded. I have not used this method in my applications, so I have no code to show you. I am planning on using it soon. I’ll update this post when I am finished.

I am switching from ICustomDoc to IOleClientSite mainly because of a transient problem. When the WebBrowser control loads my custom HTML, it waits until it is completely loaded before getting my IDocHostUIHandler, so there is a split second where the borders and scrollbars are not displaying correctly. With IOleClientSite, WebBrowser asks for my IDocHostUIHandler before it loads the HTML.

MSHTML Hosting – Odds & Ends

In this post I wanted to cover some miscellaneous things you may want to do with your embedded WebBrowser. On its own, the IWebBrowser2 interface does not support doing much more than we already covered in previous posts. However, if you start using the MSHTML DOM interfaces, much more functionality is available. Here is a list of simple things you can implement without too much difficulty:

  • Retrieving HTML from the WebBrowser.
  • Retrieving the HTML of the current selection.
  • Finding text in the HTML and selecting it.
  • Creating an image of the current HTML.

Retrieving HTML from the WebBrowser

There are times when you might want to get the currently loaded HTML from the control. You may want to save it to a file or parse it for information. For this functionality, you have to use the IPersistXxx interfaces. These are the same we used to load HTML into the WebBrowser from memory. The same works in reverse:


IHTMLDocument2* pDoc = ...;
IStream* pMyStream = ...;

IPersistStreamInit* pPersist = 0;
HRESULT hr = pDoc->QueryInterface(IID_IPersistStreamInit, (void**)&pPersist);
if (SUCCEEDED(hr) && pPersist) {
    hr = pPersist->Save(pMyStream, TRUE);  // TRUE clears the document's dirty flag
    pPersist->Release();
}

Retrieving the HTML of the current selection

If you want to limit the HTML to just what a user has selected, instead of the entire document, you can use the IHTMLXxx COM interfaces. The first thing you need to do is get access to the IHTMLDocument2 interface for the current document. IWebBrowser2 gives you access using its Document property. The Document property returns an IDispatch interface, so we need to QueryInterface the IDispatch interface for an IHTMLDocument2 interface, like so (raw C++):


IDispatch* pDocDisp = 0;
HRESULT hr = pWebBrowser->get_Document(&pDocDisp);
// pDocDisp may be null if the call fails or no document is loaded yet
if (SUCCEEDED(hr) && pDocDisp) {
    IHTMLDocument2* pDoc = 0;
    hr = pDocDisp->QueryInterface(IID_IHTMLDocument2, (void**)&pDoc);
    if (SUCCEEDED(hr)) {

        //...

        pDoc->Release();
    }

    pDocDisp->Release();
}

The IHTMLXxx interfaces follow the W3C DOM specification used for JavaScript very closely. If you're familiar with those objects, the IHTMLXxx interfaces will be easy to grasp. In fact, if you know how to do something using JavaScript, you can duplicate it in your compiled code using the IHTMLXxx interfaces.

That said, you can get the current selection as an IHTMLTxtRange from the document's selection object. Once you have a text range, you can retrieve the plain text or HTML text as shown below:


IHTMLDocument2* pDoc = ...;

IHTMLSelectionObject* pSelection = 0;
HRESULT hr = pDoc->get_selection(&pSelection);
if (SUCCEEDED(hr)) {
   IDispatch* pDispRange = 0;
   hr = pSelection->createRange(&pDispRange);
   if (SUCCEEDED(hr)) {
      IHTMLTxtRange* pTextRange = 0;
      hr = pDispRange->QueryInterface(IID_IHTMLTxtRange, (void**)&pTextRange);
      if (SUCCEEDED(hr)) {
         CComBSTR sText;
         pTextRange->get_text(&sText);
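         // or, instead of get_text (pick one; calling both into the
         // same CComBSTR would overwrite and leak the first string)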
         pTextRange->get_htmlText(&sText);
         //...
         pTextRange->Release();
      }
      pDispRange->Release();
   }
   pSelection->Release();
}

pDoc->Release();

Finding text in the HTML and selecting it

The Google toolbar in IE does this to make it easy to spot keywords found in the page. We are using body and text range objects. This time we are creating an IHTMLTxtRange object, not getting the current selection. IHTMLTxtRange has findText and select methods that make this task easy. Be sure to check out the parameters for IHTMLTxtRange::findText as they can be used to modify how the text is searched:


IHTMLDocument2* pDoc = ...;
IHTMLElement* pBodyElem = 0;
HRESULT hr = pDoc->get_body(&pBodyElem);
if (SUCCEEDED(hr)) {
   IHTMLBodyElement* pBody = 0;
   hr = pBodyElem->QueryInterface(IID_IHTMLBodyElement, (void**)&pBody);
   if (SUCCEEDED(hr)) {
      IHTMLTxtRange* pTextRange = 0;
      hr = pBody->createTextRange(&pTextRange);
      if (SUCCEEDED(hr)) {
         CComBSTR sText = "findme";
         VARIANT_BOOL bSuccess;
         hr = pTextRange->findText(sText, 0, 0, &bSuccess);
         if (SUCCEEDED(hr) && bSuccess == VARIANT_TRUE)
            pTextRange->select();
         pTextRange->Release();
      }
      pBody->Release();
   }
   pBodyElem->Release();
}

pDoc->Release();

Creating an image of the current HTML

Turning the contents of the WebBrowser into an image is not as straightforward as you may expect. Looking at the IHTMLXxx interfaces does turn up an IHTMLElementRender interface. IHTMLElementRender contains:

IHTMLElementRender::DrawToDC(HDC hDC);

You can try to use this method, but I have found that it is not very reliable and reacts inconsistently depending on the type of HDC you give it. A more reliable approach uses an older OLE interface. IViewObject supports the ability to render to an HDC. The IWebBrowser2::Document property can be QueryInterfaced for IViewObject. Two things to note while using this method: (1) you will probably want to turn off the scrollbars and 3D border since they will show up in the image and (2) you will want to resize the WebBrowser to the size of the contained HTML if you want to capture the entire content in the image. You may want to make these changes only temporarily and change them back after the image is captured:


IHTMLDocument2* pDoc = ...;
IHTMLElement* pBodyElem = 0;
HRESULT hr = pDoc->get_body(&pBodyElem);
if (SUCCEEDED(hr)) {
   IHTMLBodyElement* pBody = 0;
   hr = pBodyElem->QueryInterface(IID_IHTMLBodyElement, (void**)&pBody);
   if (SUCCEEDED(hr)) {
      // hide 3D border
      IHTMLStyle* pStyle;
      hr = pBodyElem->get_style(&pStyle);
      if (SUCCEEDED(hr)) {
         pStyle->put_borderStyle(CComBSTR("none"));
         pStyle->Release();
      }

      // hide scrollbars
      pBody->put_scroll(CComBSTR("no"));

      // resize the browser component to the size of the HTML content
      IHTMLElement2* pBodyElement2 = 0;
      hr = pBody->QueryInterface(IID_IHTMLElement2, (void**)&pBodyElement2);
      if (SUCCEEDED(hr)) {
         long iScrollWidth = 0;
         pBodyElement2->get_scrollWidth(&iScrollWidth);

         long iScrollHeight = 0;
         pBodyElement2->get_scrollHeight(&iScrollHeight);

         // these lines depend on your WebBrowser wrapper
         pWebBrowser->SetWidth(iScrollWidth);
         pWebBrowser->SetHeight(iScrollHeight);

         pBodyElement2->Release();

         IViewObject* pViewObject = 0;
         hr = pDoc->QueryInterface(IID_IViewObject, (void**)&pViewObject);
         if (SUCCEEDED(hr) && pViewObject) {
            /* however you want to make your image HDC.
               You can size it using iScrollHeight & iScrollWidth */
            HDC hImageDC = ...; // could be bitmap or enhanced metafile
            HDC hScreenDC = ::GetDC(0);
            // Draw takes a pointer to a RECTL, not a RECT by value
            RECTL rcSource = {0, 0, iScrollWidth, iScrollHeight};
            hr = pViewObject->Draw(DVASPECT_CONTENT, 1, NULL, NULL,
                                   hScreenDC, hImageDC, &rcSource,
                                   NULL, NULL, 0);
            ::ReleaseDC(0, hScreenDC);
            pViewObject->Release();
         }
      }
      pBody->Release();
   }
   pBodyElem->Release();
}

pDoc->Release();

As you can see, there are a lot of things you can do using the MSHTML object model. Some of it can be tricky. Other things just aren’t supported as well as they should be for an application developer. I guess you could say that application developers have their own list of issues for IE.

MSHTML Hosting – Mozilla

Before moving ahead with the MSHTML hosting posts, I wanted to take a moment to talk about alternatives to WebBrowser. I am sure you are familiar with the Mozilla web browser. Started as an open source project by Netscape, Mozilla and its suite of companion projects are quite an achievement. One of the Mozilla projects is an ActiveX wrapper around the rendering engine which conforms to the IWebBrowser2 and DWebBrowserEvents2 COM interfaces. Created and maintained by Adam Lock, the project has come a long way. There is even minimal support for the IHTMLDocument DOM interfaces.

Everything we have covered in previous posts regarding WebBrowser functionality can be implemented using the Mozilla control as well. With no code changes! Just embed the Mozilla control and your application will not be dependent on Microsoft. Some of us like having choices.

My biggest problem with the Mozilla control is that the project is not moving fast enough to implement more functionality. Honestly, if Mozilla wants to grab more market share, it should be putting more resources on the project. One or two guys is not enough. I know that some people would tell me to just use the native Mozilla C++ classes and interfaces to embed the rendering engine. I am sorry, but there is too much to learn. A very large number of people know, and are comfortable using, ActiveX controls and COM data types. Frameworks have been built to make it easy to use such controls. Why would I want to learn a niche framework? That’s the main reason I have not been able to contribute to the project myself. It’s a large investment.

That said, I still love the idea of having choices. The Mozilla control is a great start and if I were writing an application that only needed basic features, I would seriously consider using it. I hope very much that the project keeps progressing.

More…

Nick Bradbury (author of FeedDemon and TopStyle) on Mozilla control.

Joel Spolsky (of Joel On Software) on why Mozilla should be actively building an ActiveX wrapper.

MSHTML Hosting – Building UI’s

My last posts have dealt with using the WebBrowser component to display HTML pages inside your application. We have seen how it does not take much to embed the control to create your own little mini web browser. In this post I want to go a little further than building a web browser. Lots of applications are using a web-like UI. Applications like Intuit Quicken, Microsoft Outlook and Money actually use the WebBrowser control to achieve their web-like UIs. Microsoft Office task panes and other inductive UIs can be developed using the WebBrowser control.

When using the WebBrowser control for building a UI there are a few issues to consider:

  1. Loading the HTML into the control. You most likely will not be using Navigate to load the HTML. In most cases you are dynamically generating the HTML from scratch or from a template loaded from a resource.
  2. Handling mouse and keyboard events. Since you are building a UI, it is very likely that the content will be interactive. There may be edit boxes, push buttons and hyperlinks in the content. You will need to handle those events and respond accordingly.
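
For the first point, generating HTML from a template can be as simple as string substitution before the result is handed to the control. Here is a minimal sketch, assuming a made-up `{{name}}` placeholder syntax and a hypothetical helper called FillTemplate (neither comes from MSHTML or any library). It does no HTML escaping, so values that can contain `<` or `&` need to be escaped by the caller first:

```cpp
#include <map>
#include <string>

// Hypothetical helper: replaces every {{key}} in tmpl with values[key].
// Unknown keys and unterminated placeholders are copied through unchanged
// so that mistakes in the template stay visible in the rendered page.
std::string FillTemplate(const std::string& tmpl,
                         const std::map<std::string, std::string>& values) {
    std::string out;
    std::string::size_type pos = 0;
    while (pos < tmpl.size()) {
        const std::string::size_type open = tmpl.find("{{", pos);
        if (open == std::string::npos) {
            out.append(tmpl, pos, std::string::npos);   // no more placeholders
            break;
        }
        const std::string::size_type close = tmpl.find("}}", open + 2);
        if (close == std::string::npos) {
            out.append(tmpl, pos, std::string::npos);   // unterminated "{{"
            break;
        }
        out.append(tmpl, pos, open - pos);              // text before "{{"
        const std::string key = tmpl.substr(open + 2, close - open - 2);
        const std::map<std::string, std::string>::const_iterator it = values.find(key);
        if (it != values.end())
            out += it->second;                          // substitute the value
        else
            out.append(tmpl, open, close + 2 - open);   // keep "{{key}}" as-is
        pos = close + 2;
    }
    return out;
}
```

The returned string is what you would wrap in a stream and load into the control.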

Loading HTML

The most common (but not the only) way to load HTML from a buffer to the WebBrowser is via streams. Microsoft has a nice example in the WebBrowser reference. A quick Google search will turn up a way to do it in your favorite tool or language. The key points here are:

  1. Navigate to “about:blank”
  2. Wait for the blank document to finish loading
  3. Load your HTML using the IPersistStreamInit method

Step #3 looks like this:


IHTMLDocument2* pDoc = ...;
IStream* pMyStream = ...;

IPersistStreamInit* pPersist = 0;
HRESULT hr = pDoc->QueryInterface(IID_IPersistStreamInit, (void**)&pPersist);
if (SUCCEEDED(hr) && pPersist) {
    hr = pPersist->InitNew();
    if (SUCCEEDED(hr)) {
        hr = pPersist->Load(pMyStream);
    }
    pPersist->Release();
}

Note: I would strongly recommend that any external links you have in your generated HTML (stylesheets or JavaScript) are referenced using absolute paths. Do not use relative paths. The IPersistStreamInit method does not update the control’s base URL. Navigate does update the base URL. Therefore, any relative path will have “about:blank” prepended to it and the control will not likely find your external link.
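
One way to follow that advice is to build the absolute URL yourself while generating the HTML. The sketch below assumes your stylesheets and scripts live in a known folder on disk; MakeAbsoluteFileUrl and StylesheetTag are hypothetical helper names, and the code does not percent-encode spaces or other special characters in the path:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: turns a Windows folder (e.g. C:\MyApp\html) plus a
// file name into an absolute file:// URL. No percent-encoding is done.
std::string MakeAbsoluteFileUrl(std::string baseDir, const std::string& fileName) {
    std::replace(baseDir.begin(), baseDir.end(), '\\', '/');  // URLs use '/'
    if (!baseDir.empty() && baseDir[baseDir.size() - 1] != '/')
        baseDir += '/';                                       // one separator
    return "file:///" + baseDir + fileName;
}

// Hypothetical helper: emits a <link> tag whose href is already absolute,
// so the control never has to resolve it against "about:blank".
std::string StylesheetTag(const std::string& baseDir, const std::string& cssFile) {
    return "<link rel=\"stylesheet\" href=\"" +
           MakeAbsoluteFileUrl(baseDir, cssFile) + "\">";
}
```

For example, MakeAbsoluteFileUrl("C:\\MyApp\\html", "style.css") yields file:///C:/MyApp/html/style.css.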

Handling Events

The ability to handle mouse and keyboard events is critical when creating an interactive UI. There are 2 basic methods to do this with WebBrowser:

  1. Use <A> tags to create hyperlinks with bogus href URLs. Then use OnBeforeNavigate2 to intercept the bogus URL, cancel the navigation and respond to the mouse click.
  2. Hook your native code to the onclick, onkeypress, or any of the many other JavaScript events.

Method #1 is cheap and easy, but really only works with hyperlinks. In my applications, I create bogus hyperlinks that are easy to parse inside OnBeforeNavigate2 and contain breadcrumbs for me to use when responding to the click. Here is an example:

myapp://edititem/12345

I can look for a constant substring to indicate it’s my bogus HREF (myapp://). I can figure out the type of action (edititem). I also know which item to edit (12345). The last part could be a database ID or a pointer to an object cast to a long integer. The whole thing even looks like a real URL too.
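
The parsing step can be sketched in plain C++ with no COM involved. This is only an illustration, not the code from my applications: AppLink and ParseAppLink are hypothetical names, and the sketch assumes the URL arrives exactly as written in the href and that the ID is a decimal number:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical result type: the pieces of a "myapp://action/id" link.
struct AppLink {
    bool valid;           // true only if the URL matched the expected shape
    std::string action;   // e.g. "edititem"
    long itemId;          // e.g. 12345
    AppLink() : valid(false), itemId(0) {}
};

// Parses URLs of the form myapp://<action>/<decimal id>.
// Anything else comes back with valid == false, which the caller can
// treat as "not ours, let the WebBrowser navigate normally".
AppLink ParseAppLink(const std::string& url) {
    static const std::string kScheme = "myapp://";
    AppLink link;
    if (url.compare(0, kScheme.size(), kScheme) != 0)
        return link;                         // not one of our bogus links

    const std::string::size_type actionStart = kScheme.size();
    const std::string::size_type slash = url.find('/', actionStart);
    if (slash == std::string::npos || slash == actionStart)
        return link;                         // missing action or missing id

    const std::string idText = url.substr(slash + 1);
    if (idText.empty())
        return link;
    char* end = 0;
    const long id = std::strtol(idText.c_str(), &end, 10);
    if (end == idText.c_str() || *end != '\0')
        return link;                         // id is not a plain number

    link.valid = true;
    link.action = url.substr(actionStart, slash - actionStart);
    link.itemId = id;
    return link;
}
```

Inside an OnBeforeNavigate2 handler you would convert the URL argument to a string, call ParseAppLink, and if valid is true, cancel the navigation and dispatch on action.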

Method #2 is much more robust, but a little more complicated to implement. There are more steps involved. You will also be working with the IHTMLDocument system and creating IDispatch wrappers. We’ll cover those topics in future posts.

MSHTML Hosting – The Basics

Hosting IE in your application is a relatively straightforward process, provided your development environment supports the use of ActiveX controls. Each language/framework has its own way of doing it: VB works directly with the WebBrowser control, MFC has its CHtmlView wrapper classes, Delphi has the TWebBrowser wrapper and C++Builder uses TCppWebBrowser. Create one of these somewhere in your application and you’re on your way to displaying HTML pages.

Before I go any further, I want to point you to the MSDN documentation of reusing the WebBrowser control. It will be an invaluable reference to you.

The WebBrowser control is really made up of a command interface (IWebBrowser2) and event interfaces (DWebBrowserEvents and DWebBrowserEvents2). Unless you are writing code against the raw control (please don’t), your wrapper component will expose both of these sides to you automatically. The event names may be slightly different between components. Here are the most useful methods and events:

  • Document – This property is your means to gain access to the IHTMLDocument2 MSHTML interface. More on this in later posts, I just wanted to point it out now.
  • Navigate / Navigate2 – Provides a simple way to tell the WebBrowser to display a page from a given file or URL. Remember to specify the full URL (including http://). Navigate is the simpler method. Both support functionality such as passing in flags to keep the page from displaying in IE’s cache list.
  • GoHome / GoBack / GoForward / Refresh – Allow you to mimic the IE functionality with the respective names.
  • ExecWB – Provides a way to get the WebBrowser to execute commands (listed here), such as Print, Print Preview, Save As, Copy and Find.
  • OnBeforeNavigate2 – Event that is called before the WebBrowser actually navigates to a given page. This event allows you to cancel or redirect the navigation. Many embedded browser applications use this event to implement “custom protocols” where clicking on a link will display your dialog, for example.
  • OnDocumentComplete – Event that is called when a document has finished loading. On a page with frames it fires for each frame and finally for the top-level page, so compare the event’s pDisp argument with the control’s own IDispatch if you only care about the last one. Use this event as a trigger for hooking up other functionality that can only be done after a page is completely in the browser.
  • OnNavigateComplete2 – Event that is fired once navigation to a document has succeeded, which can be before the document has finished loading. Many people assume this event will only be called once per page load. Not true: it is called once for each frame and then for the page. It is usually safer to use OnDocumentComplete when you need the finished document.

Using these methods and events, it is very easy to create a nicely featured web browser. Next time we can look at ways to make WebBrowser seem less like a web browser and more like a custom HTML display control you can use inside your application.

Working With MSHTML Hosting

On the surface it seems like a great deal. You can actually embed MSHTML, the IE HTML rendering engine, in your own application. There are a lot of cool, simple features you get out-of-the-box. As soon as you get more advanced in your features, you find things are not so simple.

First, let’s clear up some terminology:

  • WebBrowser – is an ActiveX control that you can embed in your applications to create a mini webbrowser. It will display HTML pages just as well as IE itself.
  • MSHTML – is a set of COM interfaces that you can use to programmatically access the elements of an HTML page. The interfaces also allow you to take part in Dynamic HTML events as well as behind the scenes operations like editing, custom rendering and behaviors, and selection.

WebBrowser depends on MSHTML. In fact there is not much beyond navigating to an HTML page that you can do with WebBrowser alone.

My team and I have become quite familiar with the ins and outs of MSHTML hosting. Never have I seen a clearer case of the 80/20 rule. MSHTML will get you 80% of your features very quickly and with relative ease. That last 20% will break most of you.

By no means do I consider myself an expert on MSHTML hosting, but I have implemented some tough features. One of the hardest things about moving past the beginner level stuff is the lack of real examples. I thought I would collect some links to stuff I found useful and post some code examples as well.

More to come

Mini-Milestones Can’t Slip

I am a big proponent of the staged delivery concept of software development. McConnell’s treatment really brought it home for me when I first read Software Project Survival Guide. The recent Agile methods also preach the same ideas. It’s just good sense: Always have a buildable, releasable product.

The reason for this post is not to act as a cheerleader for staged delivery. It’s to vent some steam. At work, we have created a system of mini-milestones we are using to implement staged delivery. The other day we decided to slip a mini-milestone! Feature-creep/scope-creep caused us to miss a milestone!

I lost it. These milestones aren’t for creating releases (we are far from that). The milestones act as checkpoints during the development process. The fact that we slipped a mini-milestone tells me that our process needs to be examined. The fact that we slipped is a red flag to me. If we can’t contain scope-creep now, at this early point in the process, there is no way we can contain it later.

More on Task-based UI’s

Microsoft published the first article of a two-part series on Inductive UI (IUI) design, their buzzword for Task-based UI. This one covers a couple things:

  • How IUI can help users get frequent tasks completed faster.
  • What is a frequent task.
  • How you can implement an IUI design using a .NET library.

If you have ever seen an IUI design (think Task Panes in MS Office 2003), you will almost immediately see Web-style similarities. The article discusses this, referring to Dialog-style versus Web-style UIs. The author does note that in many cases experienced users will prefer Dialog-style UIs over Web-style. While I can agree with the sentiment, it usually happens in cases where the Web-style UI is designed to perform a long, drawn out wizard process. In most cases, such as Task Panes, Web-style UIs are just as unobtrusive and straightforward as their Dialog-style counterparts.

Since I am working on a Task Pane infrastructure for an application at work, I was also interested in the details of the Web-style navigation library used to build Task Panes in .NET applications. The library allows programmers to create pages, which appear to be frame-like surfaces you can drop controls onto. The library manages a stack of those pages. This is different than our approach, which uses the MSHTML web browser component to host a stack of HTML pages.

Update: Part 2 is available.

Getting Started With P2P

I finally have a reason to try to implement P2P in a development project. I have used TCP sockets on several occasions and I have played around with UDP broadcasts as well. What I am trying to do with P2P needs to be more reliable than the UDP broadcasts.

The first thing I did was Google for “P2P framework library”. Unfortunately, not many relevant hits came back. The most popular was the Java-based JXTA toolkit. I work in C++, so it does not help me. Microsoft has a Peer-to-Peer toolkit, but it requires WinXP SP1 with Advanced Networking. I need to support OSes older than that.

However, both toolkits appear to have exactly what I would want in a P2P toolkit:

  • Discovery
  • Peer Groups
  • Messaging (including multicast)

Working in C++ on Win32 (Win9x/WinNT/Win2K/WinXP) keeps me from using these toolkits.

I would really like to use an existing toolkit to abstract the P2P implementation details. It’s going to be hard enough to get my own data sharing protocol working on top of P2P.

Other toolkits I have looked at include HOWL (an implementation of Zeroconf/Rendezvous) and BEEP. HOWL seems to only enable discovery. BEEP seems to only provide the messaging. I might look into merging the two together. I am also looking into wrappers for Multicast over Winsock.

Update: I have decided to work with Multicast for now. In the end, I’ll need to create some kind of message protocol at which point I’ll look at the RFC for BEEP.

Selling The Dream

It’s bound to happen sooner or later. Every so often Scoble (Microsoft blogging wunderkind) posts something that makes me question his sensibilities. One of his latest posts stopped me dead in my tracks. The post was a reply to a Jon Udell post on UI technologies in Longhorn. I thought Udell’s post asked questions any professional should be asking. Scoble tries to bring it into focus by saying:

Ahh, but Jon, the real play here is one of programmer productivity

Programmer productivity. As if every new technology/language introduced in the last few decades failed to deliver on that same promise. As if Longhorn and .NET are the productivity “tipping point.” As if being fluent in multiple technologies/languages is a bad thing. As if being fluent in multiple technologies/languages will go away.

Robert, you’re overselling the reality. I will use .NET to create commercial, shrinkwrap and enterprise software someday. I am becoming fluent in .NET technologies. But you should take a break from the Kool-Aid (although I hear the grape flavor is hard to put down). I see an advantage in understanding multiple technologies/languages and being able to choose the best for a particular situation. If anything, .NET will make the technology soup developers work with worse, not better. Just like every other “productivity improvement” before it.