File Copy Tool w/ Producer/Consumer Model

https://stackoverflow.com/questions/13062275

14-07-2021

Question

So I was looking over my next school assignment, and I'm baffled. I figured I would come to the experts for some direction. My knowledge of synchronization is severely lacking, and I didn't do so hot on the "mcopyfile" assignment it refers to. Terrible would probably be a good word for it. If I could get some direction on how to approach this problem, it would be much appreciated. I'm not looking for someone to do my assignment; I just need someone to point me in the right direction. Baby steps.

Based on the multi-threaded file copy tool (mcopyfile) you have created in Lab 2, now please use a worker pool (Producer-Consumer model) implementation that uses a fixed number of threads to handle the load (regardless of how many files there are in the directory to copy). Your program should create 1 file copy producer thread and multiple file copy consumer threads (this number is taken from the command-line argument). The file copy producer thread will generate a list of (source and destination) file descriptors in a buffer structure with bounded size. Each time the producer accesses the buffer, it will write one (source, destination) file entry (per visit). All file copy consumer threads will read from this buffer, execute the actual file copy task, and remove the corresponding file entry (each consumer will consume one entry each time). Both producer and consumer threads will write a message to standard output giving the file name and the completion status (e.g., for producer: “Completing putting file1 in the buffer”, for consumer: “Completing copying file1 to …”).

Solution

Assuming you know how to spawn threads, let me break the problem down for you. There are the following components:

Producer. It generates Tasks for the Consumers based on the source directory input parameter.
Task. The information a Consumer needs to perform one copy: a tuple of source file descriptor and destination file descriptor.
Queue. The central piece of communication between the Producer and the Consumers. The Producer writes Tasks to the Queue and the Consumers consume them. Per the assignment, it has a bounded size.
Consumer. A pool of workers, each of which takes a Task as input and executes the copy operation.
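
To make the Task and Queue components concrete, here is a minimal Python sketch (the question does not say which language or threading library is in use, so Python is an assumption). The names `Task` and `BoundedBuffer` are hypothetical, and the Task holds file paths rather than open descriptors. It uses condition variables so that a thread waits instead of polling, which is one possible design, not the only one.

```python
import threading
from collections import deque
from typing import NamedTuple


class Task(NamedTuple):
    """One unit of work: copy `src` to `dst` (paths are an assumption here)."""
    src: str
    dst: str


class BoundedBuffer:
    """Fixed-capacity FIFO shared by one producer and many consumers."""

    def __init__(self, capacity: int) -> None:
        self._items = deque()
        self._capacity = capacity
        lock = threading.Lock()
        # Two conditions sharing one lock, so waiters sleep instead of polling.
        self._not_full = threading.Condition(lock)
        self._not_empty = threading.Condition(lock)

    def put(self, task: Task) -> None:
        with self._not_full:
            while len(self._items) >= self._capacity:
                self._not_full.wait()    # producer blocks while the buffer is full
            self._items.append(task)
            self._not_empty.notify()     # wake one waiting consumer

    def get(self) -> Task:
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()   # consumer blocks while the buffer is empty
            task = self._items.popleft()
            self._not_full.notify()      # wake the producer if it was waiting
            return task
```

The `while` loops around `wait()` matter: a woken thread re-checks the condition, because another thread may have changed the buffer in the meantime.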

Now, as per the question, spawn one thread for the producer and n threads for the consumers. This is what the threads do:

Producer thread
1. For each file in the source directory:
  1. Task <- (source file path, destination file path)
  2. Acquire lock on Queue
  3. While Queue is full: release lock, sleep for some time, re-acquire lock
  4. Write Task to Queue
  5. Release lock on Queue
  6. Acquire lock on stdout
  7. Write to stdout
  8. Release lock on stdout
Consumer thread
1. While True:
  1. Acquire lock on Queue
  2. If size of Queue == 0:
    1. Release lock on Queue
    2. Sleep for some time
  3. Else:
    1. Dequeue a Task
    2. Release lock on Queue
    3. Execute copy operation
    4. Acquire lock on stdout
    5. Write to stdout
    6. Release lock on stdout
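
The thread steps above can be sketched as a small runnable program. This is a sketch under assumptions, not the assignment's reference solution: it is written in Python, it swaps the manual lock-and-sleep steps for the standard library's `queue.Queue` (which does the locking and blocking internally), and it copies by path with `shutil.copyfile` instead of passing file descriptors. The names `copy_directory`, `log`, `STOP`, and `buffer_size` are hypothetical. The `STOP` sentinel is an addition: the steps above loop forever, so something has to tell the consumers when to exit.

```python
import os
import queue
import shutil
import threading

STOP = None                      # sentinel telling a consumer to exit (an addition)
print_lock = threading.Lock()    # the "lock on stdout" from the steps above


def log(message: str) -> None:
    with print_lock:
        print(message)


def producer(src_dir: str, dst_dir: str, tasks: queue.Queue, n_consumers: int) -> None:
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):
            tasks.put((src, os.path.join(dst_dir, name)))  # blocks while the buffer is full
            log(f"Completing putting {name} in the buffer")
    for _ in range(n_consumers):
        tasks.put(STOP)          # one sentinel per consumer


def consumer(tasks: queue.Queue) -> None:
    while True:
        task = tasks.get()       # blocks while the buffer is empty
        if task is STOP:
            break
        src, dst = task
        shutil.copyfile(src, dst)
        log(f"Completing copying {os.path.basename(src)} to {dst}")


def copy_directory(src_dir: str, dst_dir: str, n_consumers: int, buffer_size: int = 4) -> None:
    os.makedirs(dst_dir, exist_ok=True)
    tasks = queue.Queue(maxsize=buffer_size)   # the bounded buffer
    threads = [threading.Thread(target=producer, args=(src_dir, dst_dir, tasks, n_consumers))]
    threads += [threading.Thread(target=consumer, args=(tasks,)) for _ in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Calling `copy_directory("in", "out", n_consumers=3)` would copy every regular file in `in` to `out` using one producer and three consumers. The consumer count would come from the command line in the actual assignment.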

I hope this helps.

OTHER TIPS

Your assignment looks pretty straightforward to me once you know what API/library you'll use for the threading functionality.

First, parse the command-line argument and create the specified number of threads. Then, from the main thread, obtain the list of files in the folder and start putting them in an array (like a std::vector) that is shared among the threads and synchronized with a mutex (or a critical section on Windows). Whenever one of the consumer threads acquires the mutex, it makes a copy of a file entry in the array, removes that entry from the array, and releases the mutex so that another thread can do the same. It then copies the file represented by the entry it removed.
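
Since no threading API was named, here is a rough Python sketch of the mutex-guarded array described above. The names `entries`, `entries_lock`, `all_queued`, `take_entry`, and `worker` are hypothetical, and the `all_queued` flag is an assumption added so the workers know when to stop. The `handle` callback stands in for the actual file copy. Unlike the assignment's buffer, this list is not bounded.

```python
import threading
import time

entries = []                         # shared list of (src, dst) entries
entries_lock = threading.Lock()      # the mutex guarding `entries`
all_queued = threading.Event()       # set once the main thread has added everything


def take_entry():
    """Remove and return one entry under the mutex, or None if the list is empty."""
    with entries_lock:
        return entries.pop(0) if entries else None


def worker(handle) -> None:
    while True:
        # Read the flag before taking: if it was already set and the list is
        # still empty, no further entries can arrive.
        finished = all_queued.is_set()
        entry = take_entry()
        if entry is not None:
            handle(entry)            # the copy runs outside the mutex
        elif finished:
            return
        else:
            time.sleep(0.01)         # nothing to do yet; poll again shortly
```

The main thread would append to `entries` while holding `entries_lock` and call `all_queued.set()` after the last file. Keeping the copy outside the mutex is the point of the design: the lock is held only long enough to remove one entry.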

I would give you some code snippets, but you didn't say what API/library you're using for the threading functionality.

Licensed under: CC-BY-SA with attribution
