Question

I'm trying to implement a multi threaded, recursive file search logic in Visual C++. The logic is as follows: Threads 1,2 will start at a directory location and match the files present in the directory with the search criteria. If they find a child directory, they will add it to a work Queue. Once a thread finishes with the files in a directory, it grabs another directory path from the work queue. The work queue is a STL Stack class guarded with CriticalSections for push(),pop(),top() calls.

If the stack is empty at any point, the threads will wait for a minute amount of time before retrying. Also when all the threads are in waiting state, the search is marked as complete.

This logic works without any problems but I feel that I'm not gaining the full potential of using threads because there isn't drastic performance gain compared to using single thread. I feel the work Stack is the bottle neck but can't figure out how to do away with the locking part. I tried another variation where each thread will be having its own Stack and will add a work item to the global Stack only when the local stack size crosses a fixed number of work items. If the local Stack is empty, threads will try fetching from global queue. I didn't find noticeable difference even with this variation. Does any one have any suggestions for improving the synchronization logic.

Regards,

Was it helpful?

Solution

I really doubt that your work stack is the bottleneck. The disk only has one head, and can only read one stream of data at a time. As long as your threads are processing the data as fast as the disk can supply it, there's not much else you can do that's going to have any significant effect on overall speed.

For other types of tasks your queue might become a significant bottleneck, but for this task, I doubt it. Keep in mind the time scales of the operations here. A simple operation that happens inside of a CPU takes considerably less than a nanosecond. A read from main memory takes on the order of tens of nanoseconds. Something like a thread switch or synchronization takes on the order of a couple hundred nanoseconds or so. A single head movement on the disk drive takes on the order of a millisecond or so (1,000,000 nanoseconds).

OTHER TIPS

In addition to @Jerry's answer, your bottleneck is the disk system. If you have a RAID array you might see some moderate improvement from using 2 or 3 threads.

If you have to search multiple drives (note: physical drives, not volumes on a single physical drive) you can use extra threads for each of them.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top