I'm working on an application that uses a CHtmlView. New requirements mean I need to get the HTML source from the class so I can parse it for a specific tag (or, if possible, just get the information inside the tag). This would be fine if we were using a newer system where I could call CHtmlView::GetSource, but it doesn't exist in our version.

I've searched online fairly extensively, but I'm pretty new to most of Windows programming and haven't been able to achieve anything useful yet.

So if anyone has an example of how to extract the HTML from a CHtmlView without using GetSource, I would appreciate seeing it. I've tried:

    BSTR bstr;
    _bstr_t * bstrContainer;
    HRESULT hr;
    IHTMLDocument2 * pDoc;
    IDispatch * pDocDisp = NULL;
    pDocDisp = this->GetHtmlDocument();
    if (pDocDisp != NULL) {
        hr = pDocDisp->QueryInterface (IID_IHTMLDocument2, (void**)&pDoc);
        if (SUCCEEDED(hr)) {
            if (pDoc->toString(&bstr) != S_OK) {
                //error...
            } else {
                bstrContainer = new _bstr_t(bstr);
                size = (bstrContainer->length()+1)*2;
                realString = new char[size];
                strncpy(realString, (char*)(*bstrContainer), size);
            }
        } else {
            //error
        }
        pDocDisp->Release();
    }

but it mostly just gives me "[object]" in realString. Like I said, new to Windows.

Any help appreciated.


Solution

Add this helper function to your CHtmlView-derived class to retrieve the HTML source. Remember to check the BOOL it returns, because the COM interface calls can fail, for example when system resources are low.

    /* ============================================== */
    // Copies the outer HTML of the document's root element into strHtmlText.
    // Returns TRUE on success, FALSE if any COM call fails.
    BOOL CTest1View::GetHtmlText(CString &strHtmlText)
    {
        BOOL bState = FALSE;
        // get IDispatch interface of the active document object
        IDispatch *pDisp = this->GetHtmlDocument();
        if (pDisp != NULL)
        {   // get the IHTMLDocument3 interface
            IHTMLDocument3 *pDoc = NULL;
            HRESULT hr = pDisp->QueryInterface(IID_IHTMLDocument3, (void**) &pDoc);
            if (SUCCEEDED(hr) && pDoc != NULL)
            {   // get root element (the <html> element)
                IHTMLElement *pRootElement = NULL;
                hr = pDoc->get_documentElement(&pRootElement);
                if (SUCCEEDED(hr) && pRootElement != NULL)
                {   // get html text into bstr
                    BSTR bstrHtmlText = NULL;
                    hr = pRootElement->get_outerHTML(&bstrHtmlText);
                    if (SUCCEEDED(hr))
                    {   // convert bstr to CString (CString makes its own copy)
                        strHtmlText = bstrHtmlText;
                        bState = TRUE;
                        SysFreeString(bstrHtmlText);
                    }
                    pRootElement->Release();
                }
                pDoc->Release();
            }
            pDisp->Release();
        }
        return bState;
    }
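
Since the question also asks about pulling out the contents of one specific tag, here is a minimal sketch of how that could be done on the retrieved text with plain string searching. The function name `ExtractFirstTagContent` is hypothetical (it is not part of MFC or MSHTML), and it assumes the HTML has already been copied from the CString into a `std::string`. It matches tag names case-insensitively, since the markup returned by the browser control may not use the same case as the original page. It is not a real HTML parser: nested tags of the same name and `>` characters inside attribute values are not handled.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper, not an MFC/MSHTML API.
// Returns the text between the first <tag ...> and the next </tag>,
// or an empty string if no such pair is found.
std::string ExtractFirstTagContent(const std::string& html, const std::string& tag)
{
    // Lower-case copies so the search ignores case; offsets still line up
    // with the original string because the length does not change.
    auto toLower = [](std::string s) {
        std::transform(s.begin(), s.end(), s.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        return s;
    };
    const std::string lowerHtml = toLower(html);
    const std::string open = "<" + toLower(tag);
    const std::string close = "</" + toLower(tag) + ">";

    std::string::size_type pos = 0;
    while ((pos = lowerHtml.find(open, pos)) != std::string::npos) {
        const std::string::size_type after = pos + open.size();
        // The tag name must end here; otherwise "<title" would match "<titlebar>".
        if (after < lowerHtml.size() &&
            (lowerHtml[after] == '>' ||
             std::isspace(static_cast<unsigned char>(lowerHtml[after])))) {
            std::string::size_type start = lowerHtml.find('>', after);
            if (start == std::string::npos) return std::string();
            ++start;  // first character after the opening tag
            const std::string::size_type end = lowerHtml.find(close, start);
            if (end == std::string::npos) return std::string();
            return html.substr(start, end - start);  // slice the original text
        }
        pos = after;  // not a real match, keep searching
    }
    return std::string();
}
```

For example, calling `ExtractFirstTagContent(html, "title")` on a page whose markup contains `<TITLE>Hello</TITLE>` would give back `Hello`. If you need anything more robust than this, walking the document through the COM element interfaces is probably a better route than string searching.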
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow