Question

I am writing a C++ application and would like to request several data files via HTTP GET simultaneously. Where should I look to get started? (It needs to be cross-platform.)

  1. Run Application
  2. Create a list of URLs { "http://host/file1.txt", "http://host/file2.txt", "http://host/file3.txt"}
  3. Request all the URLs simultaneously and load their contents into variables (I don't want disk writes). Each file has about 10 kB of data.

What libraries would you recommend? libcurl? curlpp? Boost.Asio? Would I need to roll my own multithreading to request all the files simultaneously? Is there an easier way?

Edit: I will need to issue about 1000 GET requests. Most likely I will do this in batches (100 at a time, creating new connections as earlier ones complete).


Solution

I would recommend libcurl. I'm not super-familiar with it, but it does have a multi interface for performing multiple simultaneous HTTP operations.
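I've not tried this in anger, but roughly: you create one easy handle per URL, point each one's write callback at a std::string (so nothing hits the disk), add them all to a multi handle, and pump curl_multi_perform() from a single thread. Here is an untested sketch along those lines, using the placeholder URLs from the question and capping the window at 100 in-flight transfers to cover the 1000-requests edit:

```cpp
#include <curl/curl.h>

#include <deque>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Append received bytes to a std::string, so nothing touches the disk.
static size_t write_to_string(char* ptr, size_t size, size_t nmemb, void* userdata) {
    static_cast<std::string*>(userdata)->append(ptr, size * nmemb);
    return size * nmemb;
}

int main() {
    // Placeholder URLs from the question; the real list would be ~1000 long.
    std::deque<std::string> pending = {
        "http://host/file1.txt", "http://host/file2.txt", "http://host/file3.txt",
    };
    const size_t kMaxInFlight = 100;  // the "batches of 100" from the edit

    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURLM* multi = curl_multi_init();
    std::vector<std::unique_ptr<std::string>> bodies;  // stable addresses

    auto start_next = [&] {
        CURL* easy = curl_easy_init();
        bodies.push_back(std::make_unique<std::string>());
        curl_easy_setopt(easy, CURLOPT_URL, pending.front().c_str());
        curl_easy_setopt(easy, CURLOPT_WRITEFUNCTION, write_to_string);
        curl_easy_setopt(easy, CURLOPT_WRITEDATA, bodies.back().get());
        curl_multi_add_handle(multi, easy);
        pending.pop_front();
    };

    size_t in_flight = 0;
    while (in_flight < kMaxInFlight && !pending.empty()) { start_next(); ++in_flight; }

    do {
        int still_running = 0;
        curl_multi_perform(multi, &still_running);
        curl_multi_wait(multi, nullptr, 0, 1000, nullptr);  // block until sockets are ready

        // Reap finished transfers and top the window back up.
        int msgs_left = 0;
        while (CURLMsg* msg = curl_multi_info_read(multi, &msgs_left)) {
            if (msg->msg != CURLMSG_DONE) continue;
            curl_multi_remove_handle(multi, msg->easy_handle);
            curl_easy_cleanup(msg->easy_handle);
            --in_flight;
            if (!pending.empty()) { start_next(); ++in_flight; }
        }
    } while (in_flight > 0);

    for (const auto& body : bodies)
        std::cout << "fetched " << body->size() << " bytes\n";

    curl_multi_cleanup(multi);
    curl_global_cleanup();
}
```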

Depending on what solution you go with, it's possible to do asynchronous I/O without using multithreading. The key is to use the select(2) system call. select() takes a set of file descriptors and tells you if any of them have data available. If they do, you can then proceed to use read(2) or recv(2) on them without worrying about blocking.
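If you go that route with libcurl, the multi interface exposes exactly the hooks select() needs via curl_multi_fdset() and curl_multi_timeout(). A rough sketch of that event loop, assuming the multi handle is already populated as in the previous snippet (POSIX headers as written; on Windows select() comes from Winsock instead):

```cpp
#include <curl/curl.h>
#include <sys/select.h>

// Drive all transfers already added to `multi` to completion using
// select(2) on a single thread.
void drive_with_select(CURLM* multi) {
    int still_running = 0;
    curl_multi_perform(multi, &still_running);
    while (still_running) {
        fd_set rd, wr, ex;
        FD_ZERO(&rd); FD_ZERO(&wr); FD_ZERO(&ex);
        int max_fd = -1;
        curl_multi_fdset(multi, &rd, &wr, &ex, &max_fd);

        // Ask libcurl how long it is happy to sleep.
        long timeout_ms = 0;
        curl_multi_timeout(multi, &timeout_ms);
        if (timeout_ms < 0 || timeout_ms > 1000) timeout_ms = 1000;
        timeval tv;
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        // Wait until one of libcurl's sockets is ready or the timeout fires;
        // max_fd can be -1 while connections are still being set up, in which
        // case we just sleep for the timeout and retry.
        if (max_fd >= 0)
            select(max_fd + 1, &rd, &wr, &ex, &tv);
        else
            select(0, nullptr, nullptr, nullptr, &tv);

        curl_multi_perform(multi, &still_running);
    }
}
```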

OTHER TIPS

Web browsers often maintain a pool of worker threads to do downloads, and assign downloads to them as they become free. The HTTP/1.1 RFC (RFC 2616) had something to say about this too: it recommended that a client keep no more than two simultaneous connections to any one server (a guideline later dropped in RFC 7230). Too many is rude.

If several of the requests are to the same server, and it supports keep-alive (which almost everyone does), then that may be better behaviour than spamming it with multiple simultaneous requests. The general idea is that you use one TCP/IP connection for multiple requests in series, thus saving the handshaking overhead. The practical result, in my experience of implementing Java HTTPConnection classes, is that you introduce a subtle bug to do with not always clearing the state correctly when you re-use the connection for a new request, and spend considerable time staring at logging/sniffer data ;-)

libcurl certainly supports keep-alive (enabled by default, I think).
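For example, just reusing one easy handle for requests in series gets you keep-alive with none of the state-clearing hazards described above. A minimal sketch, again with the question's placeholder URLs:

```cpp
#include <curl/curl.h>

#include <iostream>
#include <string>

// Append received bytes to a std::string, keeping everything in memory.
static size_t write_to_string(char* ptr, size_t size, size_t nmemb, void* userdata) {
    static_cast<std::string*>(userdata)->append(ptr, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* easy = curl_easy_init();

    // Serial requests on one easy handle: libcurl keeps the TCP connection
    // to the host alive between transfers, so only the first request pays
    // the connection-setup cost.
    for (const char* url : {"http://host/file1.txt", "http://host/file2.txt"}) {
        std::string body;
        curl_easy_setopt(easy, CURLOPT_URL, url);
        curl_easy_setopt(easy, CURLOPT_WRITEFUNCTION, write_to_string);
        curl_easy_setopt(easy, CURLOPT_WRITEDATA, &body);
        curl_easy_perform(easy);  // the second call reuses the first connection
        std::cout << url << ": " << body.size() << " bytes\n";
    }

    curl_easy_cleanup(easy);
    curl_global_cleanup();
}
```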
