Finding Minimum Completion Time of Scheduled Tasks with Topological Sort

Question 1

I am assuming that the tasks do not have cyclic dependencies. As such, the tasks and their dependencies can be represented as a directed acyclic graph (abbreviated as dag from this point), where tasks are vertices, and an edge u -> v means that task u must be completed before task v can begin.

Your method is right in using topological sort to compute the order in which the tasks can be done, since the graph is a dag. I see that there are 2 main questions that you have.

Computing the tasks which can possibly be done simultaneously
Computing the minimum amount of time to complete all the tasks

Number 2 is simpler, so I shall address it first. Finding the minimum time to complete all the tasks is not difficult once you have computed the topological order. It is essentially finding the longest path in dag, you may have heard its application in the Critical path method. Essentially, the minimum amount of time needed to complete all the tasks is the time needed to complete the chain of tasks with the longest duration. This may seem counterintuitive at first glance, but the idea is that eventual completion of all tasks is dependent on how long we need to complete any chain of tasks, and hence, this is dependent on the time needed to complete the chain of tasks that take up the longest time.

Here is a pseudocode of the algorithm (I think it should be correct, but because I haven't been doing this for a bit of time, I may be wrong, so do verify yourself):

min_time_for_tasks(Graph, topological order):
    distance <- integer array whose size is number of vertices
    set all entries in visited array to negative infinity
    maxDist <- 0
    for v in topological order:
        if distance[v] == negative infinity:
            maxDist <- max(maxDist, longest_path(Graph, distance, v))
    return maxDist

// computes the longest path from vertex v
longest_path(Graph, distance, v):
    maxDist <- 0
    for all edges (v,w) outgoing from v:
        // vertex w has not been visited
        if distance[w] == negative infinity:
            longest_path(Graph, distance, w)
        // get the longest path among all vertices reachable from v
        maxDist <- max(maxDist, distance[w])
    distance[v] <- maxDist + time to complete task v
    return distance[v]

As for Number 1, computing the tasks which can be done simultaneously, this is slightly more tricky. I haven't actually done this before, but I do think that it can be done using a topological sort algorithm called Kahn's algorithm (it's the first algorithm for Topological Sort on wikipedia, link is here). I don't know about you, but normally I perform topological sort using a depth first search and a stack, and then popping vertices from the stack to obtain the order. Essentially, Kahn's algorithm is a topological sort algorithm, and with some slight modifications, it should be able to solve our first problem. I haven't done this before, so please verify this again. Pseudocode with explanation in the comments:

// This is a modified kahn's algorithm.
// Again, the graph is assumed to be a dag, and we first compute
// the number of incoming edges to every vertex, since Kahn's
// algorithm is dependent on this information.
// This function will return an array of sets of tasks which
// can be done at the same time.
// So all tasks in the set in returnArray[0] can be done at the same
// time, all tasks in the set in returnArray[1] can be done at the
// same time, etc. Explanation will be available after the
// pseudocode
compute_simultaneous_tasks(Graph):
    into <- integer array, all entries initialized to 0
    // this function is defined below
    compute_incoming_edges(Graph, into)
    returnArray <- empty array
    VSet <- Set of all vertices in the graph
    while VSet is not empty:
        // retrieve all vertices with no incoming edges
        // this means their prerequisite tasks have been completed,
        // and we can execute that particular task
        headVertices <- all vertices `v` in VSet such that into[v] is 0

        // remove all the vertices in headVertices from VSet
        VSet <- VSet.remove(headVertices)

        // we are going to remove the vertices in headVertices,
        // so the edges outgoing from them will be removed as well.
        for each vertex v in headVertices:
            for each vertex w outgoing from v:
                into[w] <- into[w] - 1

        // all the vertices in headVertices can be started at the
        // same time, since their prerequisite tasks have been
        // completed
        returnArray.append(headVertices)

    return returnArray

// computes the number of edges leading into each vertex
// so into[x] gives the number of edges leading into vertex x
// from some other vertex
compute_incoming_edges(Graph, into):
    for each vertex v in Graph:
        for each vertex w outgoing from v:
            into[w] <- into[w] + 1

So the compute_simultaneous_tasks function will give return an array of sets. Each set contains vertices/tasks which can be done at the same time. To get the main idea of why this is the case, inside the main loop of compute_simultaneous_tasks, we extract all vertices which have no incoming edges. This is equivalent to extracting all tasks with no prerequisites. Hence, we can see that it is safe to execute this set of tasks simultaneously, since they have no prereqs, and are certainly not dependent on each other! Then, we remove their outgoing edges, to simulate their completion. Then we move on to the next iteration of the loop, etc.

So we see that at each iteration of the while loop, we are doing what we just described, and hence it is safe to execute all tasks in returnArray[0] at the same time, all tasks at returnArray[1] at the same time but only after all tasks in returnArray[0] are done, all tasks in returnArray[2] after those in returnArray[1] are done, and so on.

So in this modification of Kahn's algorithm, instead of removing one vertex with 0 incoming edges at one step, we remove all vertices with 0 incoming edges at one step, in order to obtain all the tasks which can be done simulateneously in a "wave", with returnArray[0] being wave 0, returnArray[1] being wave 1, etc. Take note that we only remove outgoing edges from all vertices in a wave after we have extracted all the vertices with 0 incoming edges.

While I am not able to give a formal proof, I do believe that if one was to complete all the a wave simultaneously, one will be able to complete all the tasks in the minimum time we computed above, due to how dependencies are taken into account of.

Hope that helps, and Happy New Year =)

Question 2

Topological sort is the right approach. It helps in identifying the task dependencies and finding the sum of durations of all the tasks that are dependent on each other. You have to repeat the process for each unvisited task until you find the sum of each set of dependent tasks. The max of all the sums is the minimum time needed to finish all tasks simultaneously.