In this post I want to cover some miscellaneous things you may want to do with your embedded WebBrowser. On its own, the IWebBrowser2 interface does not support much more than what we already covered in previous posts. However, if you start using the MSHTML DOM interfaces, much more functionality is available. Here is a list of simple things you can implement without too much difficulty:
- Retrieving HTML from the WebBrowser.
- Retrieving the HTML of the current selection.
- Finding text in the HTML and selecting it.
- Creating an image of the current HTML.
Retrieving HTML from the WebBrowser
There are times when you might want to get the currently loaded HTML from the control. You may want to save it to a file or parse it for information. For this functionality, you have to use the IPersistXxx interfaces. These are the same interfaces we used to load HTML into the WebBrowser from memory. The same approach works in reverse:
    IHTMLDocument2* pDoc = ...;
    IStream* pMyStream = ...;
    IPersistStreamInit* pPersist = 0;
    HRESULT hr = pDoc->QueryInterface(IID_IPersistStreamInit, (void**)&pPersist);
    if (SUCCEEDED(hr) && pPersist) {
        // second argument is a BOOL: TRUE clears the dirty flag after saving
        hr = pPersist->Save(pMyStream, TRUE);
        pPersist->Release();
    }
Retrieving the HTML of the current selection
If you want to limit the HTML to just what the user has selected, instead of the entire document, you can use the IHTMLXxx COM interfaces. The first thing you need to do is get access to the IHTMLDocument2 interface for the current document. IWebBrowser2 gives you access through its Document property. The Document property returns an IDispatch interface, so we need to QueryInterface it for IHTMLDocument2, like so (raw C++):
    IDispatch* pDocDisp = 0;
    HRESULT hr = pWebBrowser->get_Document(&pDocDisp);
    // guard against a null document (for example, nothing loaded yet)
    if (SUCCEEDED(hr) && pDocDisp) {
        IHTMLDocument2* pDoc = 0;
        hr = pDocDisp->QueryInterface(IID_IHTMLDocument2, (void**)&pDoc);
        if (SUCCEEDED(hr)) {
            //...
            pDoc->Release();
        }
        pDocDisp->Release();
    }
The IHTMLXxx interfaces follow the W3C DOM specification used for JavaScript very closely. If you're familiar with those objects, the IHTMLXxx interfaces will be easy to grasp. In fact, if you know how to do something using JavaScript, you can duplicate it in your compiled code using the IHTMLXxx interfaces.
That said, you can get the current selection as an IHTMLTxtRange from the document's selection object. Once you have a text range, you can retrieve the plain text or HTML text as shown below:
    IHTMLDocument2* pDoc = ...;
    IHTMLSelectionObject* pSelection = 0;
    HRESULT hr = pDoc->get_selection(&pSelection);
    if (SUCCEEDED(hr)) {
        IDispatch* pDispRange = 0;
        hr = pSelection->createRange(&pDispRange);
        if (SUCCEEDED(hr)) {
            IHTMLTxtRange* pTextRange = 0;
            hr = pDispRange->QueryInterface(IID_IHTMLTxtRange, (void**)&pTextRange);
            if (SUCCEEDED(hr)) {
                CComBSTR sText;
                // plain text of the selection
                pTextRange->get_text(&sText);
                // or the HTML markup (call sText.Empty() first if you reuse sText)
                pTextRange->get_htmlText(&sText);
                //...
                pTextRange->Release();
            }
            pDispRange->Release();
        }
        pSelection->Release();
    }
    pDoc->Release();
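The nested Release calls in these snippets are easy to get wrong once early returns creep in. Since the snippets already use ATL's CComBSTR, ATL's CComPtr is the obvious real-world option. To show the idea in isolation, here is a minimal sketch of an owning pointer that calls Release when it goes out of scope. Everything in it is hypothetical and for illustration only: ScopedComPtr, MockUnknown (a stand-in for a COM interface so the sketch compiles without the Windows SDK), getObject, and demoEarlyReturn are not part of MSHTML or ATL.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM interface: only reference counting.
struct MockUnknown {
    int refs = 1;       // a pointer handed out by a getter owns one reference
    int released = 0;   // how many times Release() was called
    unsigned long AddRef()  { return static_cast<unsigned long>(++refs); }
    unsigned long Release() { ++released; return static_cast<unsigned long>(--refs); }
};

// Minimal owning pointer: calls Release() once when it leaves scope.
template <typename T>
class ScopedComPtr {
public:
    ScopedComPtr() = default;
    ~ScopedComPtr() { reset(); }
    ScopedComPtr(const ScopedComPtr&) = delete;
    ScopedComPtr& operator=(const ScopedComPtr&) = delete;

    // Address to pass as an out-parameter, e.g. to a get_xxx() or QueryInterface call.
    T** put() { reset(); return &p_; }
    T* get() const { return p_; }
    T* operator->() const { return p_; }

    void reset() {
        if (p_) { p_->Release(); p_ = nullptr; }
    }

private:
    T* p_ = nullptr;
};

// Mimics a getter such as get_selection(): the caller must release the result.
inline bool getObject(MockUnknown& source, MockUnknown** out) {
    *out = &source;
    return true;
}

// Two acquisitions with an optional early exit in between.
int demoEarlyReturn(MockUnknown& a, MockUnknown& b, bool failMidway) {
    ScopedComPtr<MockUnknown> first;
    if (!getObject(a, first.put())) return -1;
    if (failMidway) return -2;   // 'first' is still released on this path
    ScopedComPtr<MockUnknown> second;
    if (!getObject(b, second.put())) return -3;
    return 0;                    // 'second' is released, then 'first'
}
```

With this pattern each acquisition is released on every exit path, so the code can return early instead of nesting one if block per interface.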
Finding text in the HTML and selecting it
The Google toolbar in IE does this to make it easy to spot keywords found in the page. We will use the body and text range objects. This time we are creating an IHTMLTxtRange object from the body element, not getting the current selection. IHTMLTxtRange has findText and select methods that make this task easy. Be sure to check out the parameters for IHTMLTxtRange::findText, as they can be used to modify how the text is searched:
    IHTMLDocument2* pDoc = ...;
    IHTMLElement* pBodyElem = 0;
    HRESULT hr = pDoc->get_body(&pBodyElem);
    if (SUCCEEDED(hr) && pBodyElem) {
        IHTMLBodyElement* pBody = 0;
        hr = pBodyElem->QueryInterface(IID_IHTMLBodyElement, (void**)&pBody);
        if (SUCCEEDED(hr)) {
            IHTMLTxtRange* pTextRange = 0;
            hr = pBody->createTextRange(&pTextRange);
            if (SUCCEEDED(hr)) {
                CComBSTR sText = "findme";
                VARIANT_BOOL bSuccess = VARIANT_FALSE;
                // count = 0 and flags = 0: default search
                hr = pTextRange->findText(sText, 0, 0, &bSuccess);
                if (SUCCEEDED(hr) && bSuccess == VARIANT_TRUE)
                    pTextRange->select();
                pTextRange->Release();
            }
            pBody->Release();
        }
        pBodyElem->Release();
    }
    pDoc->Release();
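To give a feel for what the flags parameter of findText changes, here is a plain std::string analogue. It is not the MSHTML implementation and does not touch COM. The two flag values (2 for whole words only, 4 for match case) are what I recall from the findText documentation, so treat them as assumptions and verify them there. The names kFindWholeWord, kFindMatchCase, and findTextPlain are made up for this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Assumed flag values for IHTMLTxtRange::findText; check the documentation.
const long kFindWholeWord = 2;
const long kFindMatchCase = 4;

static bool isWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

static char toLower(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Returns the index of the first match, or std::string::npos if none.
// With flags == 0 the search is case-insensitive and matches partial words.
std::size_t findTextPlain(const std::string& text, const std::string& needle, long flags) {
    if (needle.empty() || needle.size() > text.size()) return std::string::npos;
    const bool matchCase = (flags & kFindMatchCase) != 0;
    const bool wholeWord = (flags & kFindWholeWord) != 0;

    for (std::size_t i = 0; i + needle.size() <= text.size(); ++i) {
        bool same = true;
        for (std::size_t j = 0; j < needle.size() && same; ++j) {
            const char a = text[i + j];
            const char b = needle[j];
            same = matchCase ? (a == b) : (toLower(a) == toLower(b));
        }
        if (!same) continue;

        if (wholeWord) {
            // Reject matches that are glued to a letter or digit on either side.
            const std::size_t end = i + needle.size();
            const bool leftOk  = (i == 0) || !isWordChar(text[i - 1]);
            const bool rightOk = (end == text.size()) || !isWordChar(text[end]);
            if (!(leftOk && rightOk)) continue;
        }
        return i;
    }
    return std::string::npos;
}
```

In the real call the flags are combined the same way, with a bitwise OR, and passed as the third argument to findText.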
Creating an image of the current HTML
Turning the contents of the WebBrowser into an image is not as straightforward as you may expect. Looking at the IHTMLXxx interfaces does turn up an IHTMLElementRender interface. IHTMLElementRender contains:
    IHTMLElementRender::DrawToDC(HDC hDC);
You can try to use this method, but I have found that it is not very reliable and reacts inconsistently depending on the type of HDC you give it. A more reliable approach uses an older OLE interface: IViewObject supports rendering to an HDC, and the document object returned by the IWebBrowser2::Document property can be QueryInterfaced for IViewObject. Two things to note when using this method: (1) you will probably want to turn off the scrollbars and 3D border, since they will otherwise show up in the image, and (2) you will want to resize the WebBrowser to the size of the contained HTML if you want to capture the entire content in the image. You may want to make these changes only temporarily and change them back after the image is captured:
    IHTMLDocument2* pDoc = ...;
    IHTMLElement* pBodyElem = 0;
    HRESULT hr = pDoc->get_body(&pBodyElem);
    if (SUCCEEDED(hr) && pBodyElem) {
        IHTMLBodyElement* pBody = 0;
        hr = pBodyElem->QueryInterface(IID_IHTMLBodyElement, (void**)&pBody);
        if (SUCCEEDED(hr)) {
            // hide 3D border
            IHTMLStyle* pStyle = 0;
            hr = pBodyElem->get_style(&pStyle);
            if (SUCCEEDED(hr)) {
                pStyle->put_borderStyle(CComBSTR("none"));
                pStyle->Release();
            }
            // hide scrollbars
            pBody->put_scroll(CComBSTR("no"));
            // resize the browser component to the size of the HTML content
            IHTMLElement2* pBodyElement2 = 0;
            hr = pBody->QueryInterface(IID_IHTMLElement2, (void**)&pBodyElement2);
            if (SUCCEEDED(hr)) {
                long iScrollWidth = 0;
                pBodyElement2->get_scrollWidth(&iScrollWidth);
                long iScrollHeight = 0;
                pBodyElement2->get_scrollHeight(&iScrollHeight);
                // these lines depend on your WebBrowser wrapper
                pWebBrowser->SetWidth(iScrollWidth);
                pWebBrowser->SetHeight(iScrollHeight);
                pBodyElement2->Release();
                IViewObject* pViewObject = 0;
                pDoc->QueryInterface(IID_IViewObject, (void**)&pViewObject);
                if (pViewObject) {
                    /* however you want to make your image HDC.
                       You can size it using iScrollHeight & iScrollWidth */
                    HDC hImageDC = ...; // could be bitmap or enhanced metafile
                    HDC hScreenDC = ::GetDC(0);
                    // Draw expects a pointer to a RECTL for the bounds
                    RECTL rcSource = {0, 0, iScrollWidth, iScrollHeight};
                    hr = pViewObject->Draw(DVASPECT_CONTENT, 1, NULL, NULL,
                                           hScreenDC, hImageDC, &rcSource,
                                           NULL, NULL, 0);
                    ::ReleaseDC(0, hScreenDC);
                    pViewObject->Release();
                }
            }
            pBody->Release();
        }
        pBodyElem->Release();
    }
    pDoc->Release();
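The snippet above leaves the border, scrollbars, and size changed. One way to make those changes temporary is a small guard object that puts the old state back when it goes out of scope. The sketch below shows only that pattern: ScopedRestore, FakeBrowser, and captureWithTemporarySettings are hypothetical names, and FakeBrowser is a plain struct standing in for your WebBrowser wrapper and the body settings, so no MSHTML calls appear here.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Runs a stored "undo" action when it goes out of scope.
class ScopedRestore {
public:
    explicit ScopedRestore(std::function<void()> undo) : undo_(std::move(undo)) {}
    ~ScopedRestore() { if (undo_) undo_(); }
    ScopedRestore(const ScopedRestore&) = delete;
    ScopedRestore& operator=(const ScopedRestore&) = delete;

private:
    std::function<void()> undo_;
};

// Hypothetical stand-in for the browser wrapper plus the body settings.
struct FakeBrowser {
    long width = 640;
    long height = 480;
    std::string scroll = "yes";
    std::string borderStyle = "inset";
};

// Applies capture-friendly settings, pretends to capture, then restores.
// Returns the size the captured image would have.
std::pair<long, long> captureWithTemporarySettings(FakeBrowser& browser,
                                                   long contentWidth,
                                                   long contentHeight) {
    const FakeBrowser saved = browser;  // remember the original state
    ScopedRestore restore([&browser, saved] { browser = saved; });

    browser.scroll = "no";
    browser.borderStyle = "none";
    browser.width = contentWidth;
    browser.height = contentHeight;

    // In real code the IViewObject::Draw call would go here.
    return {browser.width, browser.height};
}
```

The return value is built before the guard's destructor runs, so the caller gets the capture size while the browser ends up with its original settings, even if the capture step exits early.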
As you can see, there are a lot of things you can do using the MSHTML object model. Some of them can be tricky. Others just aren’t supported as well as they should be for an application developer. I guess you could say that application developers have their own list of issues for IE.