Question

I'm working on an application that uses a CHtmlView. New requirements mean I need to get the HTML source from the view so I can parse it for a specific tag (or, if possible, get just the contents of that tag). This would be easy on a newer system, where I could call CHtmlView::GetSource, but that method doesn't exist in the version of MFC we're using.

I've searched extensively online, but I'm new to most of Windows programming and haven't managed to get anything useful working yet.

If anyone has an example of extracting the HTML from a CHtmlView without GetSource, I'd appreciate seeing it. I've tried:

    BSTR bstr;
    _bstr_t * bstrContainer;
    HRESULT hr;
    IHTMLDocument2 * pDoc;
    IDispatch * pDocDisp = NULL;
    pDocDisp = this->GetHtmlDocument();
    if (pDocDisp != NULL) {
        hr = pDocDisp->QueryInterface (IID_IHTMLDocument2, (void**)&pDoc);
        if (SUCCEEDED(hr)) {
            if (pDoc->toString(&bstr) != S_OK) {
                //error...
            } else {
                bstrContainer = new _bstr_t(bstr);
                size = (bstrContainer->length()+1)*2;
                realString = new char[size];
                strncpy(realString, (char*)(*bstrContainer), size);
            }
        } else {
            //error
        }
        pDocDisp->Release();
    }

but realString mostly just contains "[object]". As I said, I'm new to Windows.

Any help appreciated.


Solution

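Add this helper function to your CHtmlView-derived class to retrieve the HTML source. Remember to check the returned BOOL, because COM calls can fail, for example when system resources are low. The difference from your attempt is that IHTMLDocument2::toString only returns the generic "[object]" string; the markup itself is exposed through the outerHTML property of the document's root element, which you reach through IHTMLDocument3. Keep in mind that outerHTML is regenerated by MSHTML from the parsed document, so it can differ from the original file (tag case, whitespace, the DOCTYPE).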

    // at the top of the .cpp file: IHTMLDocument3 and IHTMLElement
    #include <mshtml.h>

    /* ============================================== */
    BOOL CTest1View::GetHtmlText(CString &strHtmlText)
    {
        BOOL bState = FALSE;
        // get IDispatch interface of the active document object
        IDispatch *pDisp = this->GetHtmlDocument();
        if (pDisp != NULL)
        {   // get the IHTMLDocument3 interface
            IHTMLDocument3 *pDoc = NULL;
            HRESULT hr = pDisp->QueryInterface(IID_IHTMLDocument3, (void**) &pDoc);
            if (SUCCEEDED(hr) && pDoc != NULL)
            {   // get root element (the <html> element)
                IHTMLElement *pRootElement = NULL;
                hr = pDoc->get_documentElement(&pRootElement);
                if (SUCCEEDED(hr) && pRootElement != NULL)
                {   // get html text into bstr
                    BSTR bstrHtmlText = NULL;
                    hr = pRootElement->get_outerHTML(&bstrHtmlText);
                    if (SUCCEEDED(hr))
                    {   // convert bstr to CString, then free the BSTR we own
                        strHtmlText = bstrHtmlText;
                        bState = TRUE;
                        SysFreeString(bstrHtmlText);
                    }
                    pRootElement->Release();
                }
                pDoc->Release();
            }
            pDisp->Release();
        }
        return bState;
    }

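The question also asks about pulling the information out of one specific tag. Once the markup is in a string, a naive case-insensitive search is enough for simple cases such as <title>; case-insensitivity matters because MSHTML may hand the tags back in uppercase. The sketch below is portable C++ that works on std::string (ANSI/MBCS builds) or std::wstring (Unicode builds). ExtractTagContent and its helpers are hypothetical names, and the approach is deliberately naive: it ignores nesting, comments, and script contents, so it is not a real HTML parser.

```cpp
#include <cstddef>
#include <string>

// ASCII-only lowercase; tag names are ASCII, so this is enough here.
template <typename CharT>
CharT AsciiLower(CharT c)
{
    return (c >= CharT('A') && c <= CharT('Z')) ? CharT(c - CharT('A') + CharT('a')) : c;
}

// Case-insensitive search for needle in hay, starting at pos.
template <typename CharT>
std::size_t FindNoCase(const std::basic_string<CharT>& hay,
                       const std::basic_string<CharT>& needle,
                       std::size_t pos)
{
    for (std::size_t i = pos; i + needle.size() <= hay.size(); ++i) {
        std::size_t j = 0;
        while (j < needle.size() && AsciiLower(hay[i + j]) == AsciiLower(needle[j]))
            ++j;
        if (j == needle.size())
            return i;
    }
    return std::basic_string<CharT>::npos;
}

// Hypothetical helper: copies the text between the first <tag ...> and the
// next </tag> into content. Returns false if the tag is not found.
template <typename CharT>
bool ExtractTagContent(const std::basic_string<CharT>& html,
                       const std::basic_string<CharT>& tag,
                       std::basic_string<CharT>& content)
{
    typedef std::basic_string<CharT> Str;
    Str open(1, CharT('<'));
    open += tag;
    Str close(1, CharT('<'));
    close += CharT('/');
    close += tag;

    std::size_t pos = 0;
    while ((pos = FindNoCase(html, open, pos)) != Str::npos) {
        std::size_t after = pos + open.size();
        if (after < html.size()) {
            CharT c = html[after];
            // "<p" must be followed by '>' or whitespace, so "<pre" is skipped
            if (c == CharT('>') || c == CharT(' ') || c == CharT('\t') ||
                c == CharT('\r') || c == CharT('\n')) {
                std::size_t gt = html.find(CharT('>'), after);   // end of the opening tag
                if (gt == Str::npos)
                    return false;
                std::size_t end = FindNoCase(html, close, gt + 1);
                if (end == Str::npos)
                    return false;
                content = html.substr(gt + 1, end - (gt + 1));
                return true;
            }
        }
        pos = after;   // keep looking past this partial match
    }
    return false;
}
```

In the view you would construct a std::basic_string<TCHAR> from (LPCTSTR)strHtmlText and pass the tag name the same way. If you only need one element, MSHTML can also do the lookup for you: IHTMLDocument3::getElementsByTagName returns an IHTMLElementCollection whose elements expose get_innerHTML and get_innerText, which avoids string parsing altogether.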
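The nested if/Release structure in the helper is correct, but it is easy to break when you later add an early return. In MFC/ATL code the usual remedy is a smart pointer such as ATL's CComPtr (from <atlbase.h>), which calls Release in its destructor. Because COM itself is Windows-only, the portable sketch below only imitates that idea with a hypothetical FakeInterface and a hypothetical ScopedComPtr class; it illustrates the RAII pattern and is not a replacement for CComPtr.

```cpp
// Hypothetical stand-in for a COM interface: it only tracks its reference count.
struct FakeInterface {
    long refCount = 1;                     // the creator holds one reference
    long AddRef() { return ++refCount; }
    long Release() { return --refCount; }
};

// Minimal sketch of what CComPtr does: release the held pointer in the
// destructor, so every exit path (including early returns) cleans up.
template <typename T>
class ScopedComPtr {
public:
    ScopedComPtr() : p_(nullptr) {}
    ~ScopedComPtr() { reset(); }
    ScopedComPtr(const ScopedComPtr&) = delete;
    ScopedComPtr& operator=(const ScopedComPtr&) = delete;

    // Out-parameter slot for QueryInterface-style calls; drops any old reference first.
    T** put() { reset(); return &p_; }
    T* operator->() const { return p_; }
    T* get() const { return p_; }
    void reset() { if (p_) { p_->Release(); p_ = nullptr; } }

private:
    T* p_;
};

// Hypothetical "QueryInterface": hands out an extra reference through an out parameter.
inline bool FakeQuery(FakeInterface* source, FakeInterface** out)
{
    if (source == nullptr) { *out = nullptr; return false; }
    source->AddRef();
    *out = source;
    return true;
}
```

With the real CComPtr, the helper could declare CComPtr<IHTMLDocument3> spDoc; and pass (void**)&spDoc to QueryInterface, dropping the explicit Release calls.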
Licensed under: CC-BY-SA with attribution