How can I extract text from a specific area of a webpage (in Arabic, not English) given the URL, using C/C++?

For example: given the URL of this Wikipedia article, I want to extract the body of the article (highlighted in the image below) and discard the other parts of the page, such as the heading and the menus on the right and left. I only need the body, parsed into a string.

example image


Solution

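To get only the article text from a Wikipedia page, append ?action=render to the URL. The response is then just the rendered article body (as HTML), without the site header, menus, or sidebars.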
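As a small illustration of building such a URL, here is a minimal sketch. The helper name add_render_action is hypothetical (it is not part of MediaWiki or libcurl), and it assumes the URL has no #fragment part:

```cpp
#include <string>

// Hypothetical helper: appends the action=render parameter to a page URL.
// Uses '?' when the URL has no query string yet, and '&' otherwise.
// Assumption: the URL carries no "#fragment" suffix.
std::string add_render_action(const std::string& url) {
    const char separator = (url.find('?') == std::string::npos) ? '?' : '&';
    return url + separator + "action=render";
}
```

The '&' branch covers URLs of the index.php?title=... form, which already contain a query string.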

Then fetch it with a library such as libcurl. Plenty of curl/C++ tutorials are available online if you haven't used it before. You are looking for something like this (just to give you an idea):

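#include <stdio.h>
#include <curl/curl.h>

int main(void) {
    /* The ?action=render suffix asks for the article body only. */
    const char* url = "https://ar.wikipedia.org/wiki/%D8%B3%D9%8A_%D8%A5%D9%86_%D8%A5%D9%86_%D8%A7%D9%84%D8%B9%D8%B1%D8%A8%D9%8A%D8%A9?action=render";

    CURL* curl = curl_easy_init();
    if (curl == NULL) {
        fprintf(stderr, "curl_easy_init() failed\n");
        return 1;
    }

    curl_easy_setopt(curl, CURLOPT_URL, url);

    /* With no write callback set, libcurl writes the response body to stdout. */
    CURLcode result = curl_easy_perform(curl);
    if (result != CURLE_OK) {
        fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(result));
    }

    curl_easy_cleanup(curl);

    return result == CURLE_OK ? 0 : 1;
}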
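
Since the goal is to end up with the body in a string rather than on stdout, one common approach is to give libcurl a write callback. The following is only a sketch of such a callback in C++; the name append_to_string is made up for illustration, and it assumes the user-data pointer refers to a std::string:

```cpp
#include <cstddef>
#include <string>

// Callback matching the signature libcurl expects for CURLOPT_WRITEFUNCTION.
// Assumption: userdata points at a std::string supplied via CURLOPT_WRITEDATA.
std::size_t append_to_string(char* data, std::size_t size, std::size_t nmemb, void* userdata) {
    const std::size_t total = size * nmemb;  // number of bytes in this chunk
    auto* body = static_cast<std::string*>(userdata);
    body->append(data, total);               // bytes are stored unchanged
    return total;  // returning a different count makes libcurl abort the transfer
}
```

To use it, declare a std::string body before the transfer and register the callback with curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, append_to_string) and curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body). libcurl may call it several times per response, which is why it appends. The Arabic text arrives as UTF-8 bytes, which std::string holds as-is.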
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow