Question

I have an application (written in C for Linux) which has to listen on 32 TCP sockets bound to specific ports. These sockets can be 'masked' by the user so that events will no longer be reported for them until they are unmasked.

I am using the non-blocking variety of poll() (a timeout of 0), as my application is not threaded (for backward-compatibility reasons, unfortunately). My question concerns the performance implications of keeping file descriptors in the poll array which have no events to listen for (events = 0), versus growing and shrinking the array dynamically, which will presumably be time-consuming.

Is poll() smart enough to 'skip over' descriptors which do not have any events enabled, or will I see a large performance hit from their presence?

Either way, given that the sockets will be polled frequently (probably at 100 Hz) and changed relatively infrequently (once every second), which of these two methods is likely to see the best performance? Will the overhead of polling descriptors with no events outweigh the overhead of having to rebuild the array every time I make a change?

Solution

Looking at the kernel code, I'd say the overhead of having no-event descriptors is small. I do not think you'll find any significant run-time difference between having an array with just one descriptor, and having an array with 32 descriptors with only one having a nonzero event mask.
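As a minimal sketch of the fixed-array approach (the helper name `poll_sockets`, the `NSOCKS` constant, and the `masked` array are hypothetical, not from the original post): masked sockets stay in the array, but their event mask is cleared, so poll() simply reports nothing for them.

```c
#include <poll.h>
#include <stddef.h>

#define NSOCKS 32

/* Poll a fixed array of 32 descriptors. Masked sockets remain in
 * place with an empty event mask (events = 0); poll() reports no
 * events for such entries, and skips entries whose fd is negative. */
int poll_sockets(struct pollfd fds[NSOCKS], const int masked[NSOCKS])
{
    for (size_t i = 0; i < NSOCKS; i++) {
        fds[i].events  = masked[i] ? 0 : POLLIN;
        fds[i].revents = 0;
    }
    /* Timeout 0: non-blocking, returns immediately. */
    return poll(fds, NSOCKS, 0);
}
```

Note that setting `fd` to a negative value is another documented way to make poll() ignore an entry entirely, without removing it from the array.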

Having a much larger number of descriptors does mean that the kernel may need to allocate extra pages internally to hold the structures. So, if you had lots and lots of descriptors, then pruning the no-event descriptors from the array would make sense.

Manipulating the array (of struct pollfd) is definitely not time-consuming. I wrote some test functions, which use two temporary arrays -- one of struct pollfd, filled with the descriptors that have nonzero event masks and fed to poll(), and another of pointers, so that the revents results can easily be copied back to the original array -- and microbenchmarked them. On x86-64 (AMD Athlon II X4 640), the functions took about 5 clock cycles per descriptor, on arrays of up to a thousand descriptors. That is definitely negligible overhead compared to how much CPU time even basic networking functions consume!
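The compacting approach described above might be sketched as follows (the function name `poll_compacted` is my own; this is an illustration of the idea, not the benchmarked code): only entries with a nonzero event mask are copied into the temporary array, and back-pointers let us propagate the results afterwards.

```c
#include <poll.h>
#include <stddef.h>

/* Compact the array before polling: copy only entries with a
 * nonzero event mask into a temporary array for poll(), keeping
 * pointers back to the originals so that the revents results can
 * be copied back afterwards. */
int poll_compacted(struct pollfd *all, size_t n, int timeout)
{
    struct pollfd  active[n > 0 ? n : 1];   /* temporary array fed to poll() */
    struct pollfd *source[n > 0 ? n : 1];   /* back-pointers into 'all'      */
    size_t m = 0;

    for (size_t i = 0; i < n; i++) {
        all[i].revents = 0;
        if (all[i].events != 0) {
            active[m] = all[i];
            source[m] = &all[i];
            m++;
        }
    }

    int ret = poll(active, m, timeout);

    /* Propagate results back to the original array. */
    for (size_t i = 0; i < m; i++)
        source[i]->revents = active[i].revents;

    return ret;
}
```

The per-descriptor cost here is just two copies and a comparison, which is consistent with the few-cycles-per-descriptor figure measured above.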

I fear you are suffering from premature optimization. The overall structure of your application or library will have much greater effect on the efficiency and speed of your implementation. So, instead of worrying about this, spend your time making sure your design is robust and sensible.

I've found that trying to make sure that data keeps flowing at all times (not blocking on a specific receive, reading incoming TCP packets as soon as possible, trying to do preparatory work before the corresponding data arrives, so that the data can be immediately pounced upon, et cetera), yields pretty good overall structures.

If you have time, you can always finesse the details after profiling the working application, thus concentrating your efforts on the true bottlenecks.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow