Question

Are there any benchmarks for small numbers of descriptors, say 1 to 50? Most benchmarks I've seen cover large numbers of descriptors (hundreds or thousands).

I am currently using poll with 16 descriptors and am thinking of switching to epoll if that will improve the speed of my app.

Please advise on three scenarios, each with 16 socket descriptors in the poll/epoll set:
1. Most of the sockets are active: should both perform the same?
2. Half active, half idle: which is better here?
3. Mostly idle: is epoll clearly better?

Solution

I very much suspect that switching from poll() to epoll() will make no noticeable difference to the performance of your application. The main advantage of epoll() shows up when you have many file descriptors (hundreds or thousands): a standard poll() has to pass the whole descriptor set to the kernel and have it checked on every call, whereas epoll() does the setup in advance - as long as you don't change the set of file descriptors you're watching, each call is very slightly quicker. But generally this difference is only noticeable for many, many file descriptors.
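To make the "setup in advance" point concrete, here is a minimal Linux-only sketch in C. The function names and the MAX_FDS limit are mine, purely for illustration. It only shows the shape of the two APIs: poll() is given the full array on every call, while epoll registers each descriptor once with epoll_ctl() and epoll_wait() then takes no descriptor list at all.

```c
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_FDS 64  /* illustrative limit, not anything the APIs require */

/* poll(): the whole descriptor array goes to the kernel on every call.
 * Returns the number of readable descriptors, or -1 on error. */
int count_ready_poll(const int *fds, int n, int timeout_ms) {
    struct pollfd pfds[MAX_FDS];
    assert(n > 0 && n <= MAX_FDS);
    for (int i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    return poll(pfds, (nfds_t)n, timeout_ms);
}

/* epoll: one-time registration. Returns the epoll descriptor, or -1. */
int epoll_register_all(const int *fds, int n) {
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        struct epoll_event ev;
        memset(&ev, 0, sizeof ev);
        ev.events = EPOLLIN;      /* level-triggered by default */
        ev.data.fd = fds[i];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
            close(epfd);
            return -1;
        }
    }
    return epfd;
}

/* epoll_wait(): no descriptor list is passed; only ready entries come back.
 * Returns the number of ready descriptors (at most MAX_FDS), or -1. */
int count_ready_epoll(int epfd, int timeout_ms) {
    struct epoll_event events[MAX_FDS];
    return epoll_wait(epfd, events, MAX_FDS, timeout_ms);
}
```

You would call epoll_register_all() once at startup and count_ready_epoll() in your loop, whereas count_ready_poll() rebuilds and passes the array each time. With 16 descriptors that extra work is tiny, which is the point.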

Bear in mind that if the set of file descriptors you're watching changes very frequently, epoll()'s main advantage is lost because you still need to do the work of passing new file descriptors into the kernel. So, if you're handling lots of short-lived connections then it's even less compelling to switch to it.

Another difference is that epoll() can be edge-triggered, where the call only returns when new activity occurs on a descriptor, or level-triggered, where the call returns for as long as the descriptor is read/write-ready. The standard poll() call is always level-triggered. For most people, however, level-triggered is what they want - edge-triggered interfaces are occasionally useful, but they make it easy to introduce stalls: if you stop reading a non-blocking socket before it reports EAGAIN, the data left behind does not trigger another notification, and the connection sits idle until something new arrives. My advice is stay well away from edge-triggered code unless you really, really know what you're doing.
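If you do end up using edge-triggered mode, the usual pattern is to make the descriptor non-blocking and read until EAGAIN on every notification. A minimal sketch follows; the helper names are hypothetical, and a real handler would process the data rather than discard it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

/* Edge-triggered reads must not block, so switch the descriptor to
 * non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Register fd for edge-triggered read notifications (EPOLLET). */
int add_edge_triggered(int epfd, int fd) {
    struct epoll_event ev;
    assert(epfd >= 0 && fd >= 0);
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Read until the kernel reports EAGAIN (or EOF). Stopping earlier would
 * leave data behind with no further notification for it.
 * Returns the number of bytes consumed, or -1 on a real error. */
ssize_t drain_fd(int fd) {
    char buf[4096];
    ssize_t total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;          /* a real handler would process buf here */
            continue;
        }
        if (n == 0)
            break;               /* EOF: peer closed */
        if (errno == EINTR)
            continue;            /* interrupted, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;               /* fully drained */
        return -1;
    }
    return total;
}
```

With level-triggered epoll (or poll()) none of this is required: reading part of the data is fine, because the next call simply reports the descriptor as ready again.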

The price you pay for epoll() is the lack of portability - both poll() and select() are standard POSIX interfaces, so your code will be much more portable by using them. The epoll() call, on the other hand, is only available on Linux. Some other Unix variants also have their own equivalent mechanisms, such as kqueue on FreeBSD, but you have to write different code for each platform in that case.

My advice is until you reach a point where you're using many file descriptors, don't even worry about epoll() - seriously, there are almost certainly many other places in your code to make far bigger performance improvements and it's entirely possible that epoll() may not be faster for your use-case anyway.

If you do reach a stage where you're handling many connections and the rest of your code is already pretty optimal, then you should first consider something like libev, a cross-platform library that uses the best-performing mechanism available on each platform. It performs very well and it's probably rather less hassle overall than using epoll() directly, even if you only want to support Linux.

I haven't referred to the three scenarios you mention so far because I don't believe any of them will perform any differently for a low number of file descriptors such as 16. For a large number of file descriptors, epoll() should outperform poll() particularly where there are mostly idle file descriptors. If all file descriptors are always active, both calls require iterating through every connection to handle it. However, as the proportion of idle connections increases, epoll() gives better performance as it only returns the active connections - with poll() you still have to iterate through everything and most of them will be skipped, but epoll() returns you only the ones you need to handle (up to a maximum limit you can specify).

To spell that out explicitly (and this is only relevant for large numbers of connections, as I mentioned above):

  1. Most of the sockets are active: Both calls broadly comparable, perhaps epoll() still slightly ahead.
  2. Half active half idle: Would expect epoll() to be somewhat better here.
  3. Mostly idle: Would expect epoll() to definitely be better here.
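Since the question asks for numbers at 16 descriptors and I don't have any to quote, the most reliable answer is to measure on your own machine. Below is a rough micro-benchmark sketch (Linux only). It uses pipes rather than sockets, the function names are made up for illustration, and it only times the wait call itself, not any handling of the data, so treat the results as indicative at best.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() under strict -std= modes */
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

#define NFDS 16  /* matches the 16 descriptors in the question */

static double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

static void close_all(const int *fds) {
    for (int i = 0; i < NFDS; i++)
        if (fds[i] >= 0)
            close(fds[i]);
}

/* Create NFDS pipes; the first `active` read ends get one unread byte,
 * so they stay readable for the whole run. Returns 0, or -1 on error. */
static int make_pipes(int *rfds, int *wfds, int active) {
    for (int i = 0; i < NFDS; i++)
        rfds[i] = wfds[i] = -1;
    for (int i = 0; i < NFDS; i++) {
        int p[2];
        if (pipe(p) < 0)
            goto fail;
        rfds[i] = p[0];
        wfds[i] = p[1];
        if (i < active && write(p[1], "x", 1) != 1)
            goto fail;
    }
    return 0;
fail:
    close_all(rfds);
    close_all(wfds);
    return -1;
}

/* Average nanoseconds per poll() call, or -1.0 on failure. */
double bench_poll(int active, int iters) {
    int rfds[NFDS], wfds[NFDS], ok = 1;
    struct pollfd pfds[NFDS];
    assert(active >= 0 && active <= NFDS && iters > 0);
    if (make_pipes(rfds, wfds, active) < 0)
        return -1.0;
    for (int i = 0; i < NFDS; i++) {
        pfds[i].fd = rfds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    double start = now_ns();
    for (int k = 0; k < iters && ok; k++)
        ok = (poll(pfds, NFDS, 0) == active);
    double elapsed = now_ns() - start;
    close_all(rfds);
    close_all(wfds);
    return ok ? elapsed / iters : -1.0;
}

/* Average nanoseconds per epoll_wait() call, or -1.0 on failure.
 * Registration happens once, outside the timed loop. */
double bench_epoll(int active, int iters) {
    int rfds[NFDS], wfds[NFDS], ok = 1;
    struct epoll_event ev, out[NFDS];
    assert(active >= 0 && active <= NFDS && iters > 0);
    if (make_pipes(rfds, wfds, active) < 0)
        return -1.0;
    int epfd = epoll_create1(0);
    if (epfd < 0)
        ok = 0;
    for (int i = 0; i < NFDS && ok; i++) {
        memset(&ev, 0, sizeof ev);
        ev.events = EPOLLIN;
        ev.data.fd = rfds[i];
        ok = (epoll_ctl(epfd, EPOLL_CTL_ADD, rfds[i], &ev) == 0);
    }
    double start = now_ns();
    for (int k = 0; k < iters && ok; k++)
        ok = (epoll_wait(epfd, out, NFDS, 0) == active);
    double elapsed = now_ns() - start;
    if (epfd >= 0)
        close(epfd);
    close_all(rfds);
    close_all(wfds);
    return ok ? elapsed / iters : -1.0;
}
```

Comparing bench_poll(active, 100000) with bench_epoll(active, 100000) for active set to 16, 8 and 1 roughly corresponds to the three scenarios. I'm deliberately not quoting results, since they depend on your kernel and hardware.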

EDIT:

You might like to see this graph which is from the libevent author and shows the relative overhead of handling an event as the number of file descriptors changes. Note how all the lines are converging around the origin, demonstrating that all the mechanisms achieve comparable performance for a small number of descriptors.

Licensed under: CC-BY-SA with attribution