Question

I'm writing an application that scrapes web pages using QWebPage. I'm having trouble when the response is an HTTP redirect (e.g. 302 or 303): the QWebPage simply does not follow it.

To work around this, I connected to the finished signal of the page's network access manager to capture the response status and load the redirect target myself. However, when I call load a second time on the QWebPage, it just sets the URL to blank and doesn't issue any request at all.

Here are some relevant bits of code:

connect(page->networkAccessManager(), SIGNAL(finished(QNetworkReply*)), SLOT(gotReply(QNetworkReply*)));
connect(page, SIGNAL(loadFinished(bool)), SLOT(doneLoading(bool)));
page->mainFrame()->load(url);

My slot:

void Snapshot::gotReply(QNetworkReply *reply)
{
    if(reply->header(QNetworkRequest::ContentTypeHeader).toString().contains(QString("text/html")))
    {
        qDebug() << "Got reply " + reply->url().toString() + " - " + reply->attribute(QNetworkRequest::HttpStatusCodeAttribute).toString() + " - " + reply->header(QNetworkRequest::ContentTypeHeader).toString();
    }

    if(!statusCode && reply->header(QNetworkRequest::ContentTypeHeader).toString().contains(QString("text/html"))) {
        statusCode = reply->attribute(QNetworkRequest::HttpStatusCodeAttribute).toInt();
        redirectUrl = QUrl(reply->header(QNetworkRequest::LocationHeader).toUrl());
    }
}

void Snapshot::doneLoading(bool)
{
    // A reasonable waiting time for any script to execute
    timer->start(3000);
}

void Snapshot::doneWaiting()
{
    if( statusCode != 0 &&
        statusCode != 301 &&
        statusCode != 302 &&
        statusCode != 303
       ) {
        qDebug() << page->mainFrame()->url().toString();
        qDebug() << page->mainFrame()->toHtml();

        QImage image(page->viewportSize(), QImage::Format_ARGB32);
        QPainter painter(&image);

        page->mainFrame()->render(&painter);

        painter.end();

        image.save(*outputFilename);

        delete outputFilename;
        QApplication::quit();
    }
    else if(statusCode != 0) {
        statusCode = 0;
        qDebug() << "Redirecting to: " + redirectUrl.toString();
        if(page->mainFrame()->url().toString().isEmpty()) {
            qDebug() << "about:blank";
            page->mainFrame()->load(this->redirectUrl); // No network activity after this
            qDebug() << "Loading";
        }
    }

    // This should ensure that the program never hangs
    if(statusCode == 0) {
        if(tries > 5) {
            qDebug() << "Giving up.";
            QApplication::quit();
        }
        tries++;
    }
}

Solution

The problem was that the page I was testing redirected to HTTPS and had a self-signed certificate, so the second request failed with SSL errors.

The solution was to make the QNetworkReply ignore the SSL errors:

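// Assumes this slot is connected to the sslErrors(QNetworkReply*, QList<QSslError>)
// signal of page->networkAccessManager(). It ignores every SSL error, so use it
// only for hosts you trust, such as a test server with a self-signed certificate.
void Snapshot::sslErrors(QNetworkReply *reply, const QList<QSslError> &errors)
{
    Q_UNUSED(errors);
    reply->ignoreSslErrors();
}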
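
As a side note, the status test in doneWaiting() in the question repeats the codes 301, 302 and 303 inline. A minimal sketch of pulling that check into one helper follows. It is plain C++ with no Qt dependency. The name isHttpRedirect is hypothetical, and treating 307 and 308 as redirects is an assumption that goes beyond the codes listed in the question.

```cpp
// Hypothetical helper, not part of the original code: returns true for the
// HTTP status codes that redirect the client to the URL in the Location header.
// 301, 302 and 303 come from the question; 307 and 308 are added as an assumption.
constexpr bool isHttpRedirect(int statusCode)
{
    return statusCode == 301 || statusCode == 302 || statusCode == 303
        || statusCode == 307 || statusCode == 308;
}

// Compile-time checks documenting the intended behaviour.
static_assert(isHttpRedirect(302), "302 should count as a redirect");
static_assert(!isHttpRedirect(200), "200 should not count as a redirect");
static_assert(!isHttpRedirect(0), "0 means no status has been recorded yet");
```

With such a helper, the first branch in doneWaiting() could read statusCode != 0 && !isHttpRedirect(statusCode), and the else branch would handle the redirect as before.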
Licensed under: CC-BY-SA with attribution