Question

I'm trying to get a web page's contents into a string so that I can parse it. I didn't find any suitable methods in QWebView, QUrl or elsewhere. Could you help me? Linux, C++, Qt.

EDIT:

Thanks for the help. The code works, but some pages come back with a broken charset after downloading. I tried something like this to fix it:

QNetworkRequest request(QUrl("http://ru.wiktionary.org/wiki/bovo"));

request.setRawHeader( "User-Agent", "Mozilla/5.0 (X11; U; Linux i686 (x86_64); "
                      "en-US; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1" );
request.setRawHeader( "Accept-Charset", "win1251,utf-8;q=0.7,*;q=0.7" );
request.setRawHeader( "charset", "utf-8" );
request.setRawHeader( "Connection", "keep-alive" );

manager->get(request);

No results =(.

Solution

Have you looked at QNetworkAccessManager? Here's a rough-and-ready sample illustrating its usage:

#include <QObject>
#include <QNetworkAccessManager>
#include <QNetworkRequest>
#include <QNetworkReply>
#include <QUrl>

class MyClass : public QObject
{
    Q_OBJECT

public:
    MyClass();
    void fetch();

public slots:
    void replyFinished(QNetworkReply* pReply);

private:
    QNetworkAccessManager* m_manager;
};

MyClass::MyClass()
{
    m_manager = new QNetworkAccessManager(this);

    // finished() is emitted once per request, when the whole reply is available.
    connect(m_manager, SIGNAL(finished(QNetworkReply*)),
            this, SLOT(replyFinished(QNetworkReply*)));
}

void MyClass::fetch()
{
    m_manager->get(QNetworkRequest(QUrl("http://stackoverflow.com")));
}

void MyClass::replyFinished(QNetworkReply* pReply)
{
    QByteArray data = pReply->readAll();
    QString str(data);

    // process str any way you like!

    pReply->deleteLater(); // the caller is responsible for deleting the reply
}

In your handler for the finished signal you will be passed a QNetworkReply object, which you can read the response from, since it inherits from QIODevice. A simple way to do this is to call readAll() to get a QByteArray. You can construct a QString from that QByteArray and do whatever you want with it.

OTHER TIPS

Paul Dixon's answer is probably the best approach, but Jesse's answer touches on something worth mentioning.

cURL -- or more precisely libcurl -- is a wonderfully powerful library. There's no need to execute shell scripts and parse their output: libcurl has bindings for C, C++ and more languages than you can shake a URL at. It can be useful if you are doing some unusual operation (like an HTTP POST over SSL?) that Qt doesn't support.

Have you looked into lynx, curl, or wget? In the past I have needed to grab and parse info from a website without database access, and if you are trying to get dynamically formatted data, I believe this would be the quickest way. I'm not a C guy, but I assume there is a way to run shell scripts and grab their output, or at least start the script and read the output from a file after it has been written. Worst case, you could run a cron job and check for a "finished" line at the end of the written file from C, but I doubt that will be necessary. It depends on what you need it for, but if you just want the output HTML of a page, something as easy as wget piped to awk or grep can work wonders.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow