Functional implementation of Tarjan's Strongly Connected Components algorithm

Question 1

The following functional Scala code generates a map that assigns a representative to each node of a graph. Each representative identifies one strongly connected component. The code is based on Tarjan's algorithm for strongly connected components.

To understand the algorithm, it may suffice to understand the fold and the contract of the dfs function.

def scc[T](graph:Map[T,Set[T]]): Map[T,T] = {
  //`dfs` finds all strongly connected components below `node`
  //`path` holds the depth of every node above the current one
  //`sccs` holds the representatives found so far; the accumulator
  def dfs(node: T, path: Map[T,Int], sccs: Map[T,T]): Map[T,T] = {
    //returns whichever argument is shallower on `path` (i.e. was entered earlier);
    //a node that is not on `path` counts as deeper, and if neither is on it, `old` is returned
    def shallowerNode(old: T,candidate: T): T = 
      (path.get(old),path.get(candidate)) match {
        case (_,None) => old
        case (None,_) => candidate
        case (Some(dOld),Some(dCand)) =>  if(dCand < dOld) candidate else old
      }

    //handle the child nodes
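    //nodes that only appear as edge targets have no entry in `graph`, hence no children
    val children: Set[T] = graph.getOrElse(node, Set.empty[T])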
    //the initially known shallowest back-link is `node` itself
    val (newState,shallowestBackNode) = children.foldLeft((sccs,node)){
      case ((foldedSCCs,shallowest),child) =>
        if(path.contains(child))
          (foldedSCCs, shallowerNode(shallowest,child))
        else {
          val sccWithChildData = dfs(child,path + (node -> path.size),foldedSCCs)
          val shallowestForChild = sccWithChildData(child)
          (sccWithChildData, shallowerNode(shallowest, shallowestForChild))
        }
    }

    newState + (node -> shallowestBackNode)
  }

  //run the above function, so every node gets visited
  graph.keys.foldLeft(Map[T,T]()){ case (sccs,nextNode) =>
    if(sccs.contains(nextNode))
      sccs
    else
      dfs(nextNode,Map(),sccs)
  }
}

I've tested the code only on the example graph from the Wikipedia article on Tarjan's algorithm.
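The fold above stores, for each node, the shallowest node it can link back to, and that node is not always the root of its component: for the edges A -> B, B -> A, B -> C, C -> B the result can come out as A -> A, B -> A, C -> B. For comparison, here is a sketch (not the code above) that threads Tarjan's own bookkeeping, a discovery index, lowlink values and the explicit stack, through the recursion in persistent collections. Every node is then expanded once and all members of an SCC map to the same root, still at O(log|N|) per persistent-map operation. The names `TarjanSketch` and `State` are made up for illustration.

```scala
object TarjanSketch {
  // Tarjan's bookkeeping, kept in persistent collections (hypothetical helper type).
  final case class State[T](
      index: Map[T, Int], // discovery order of every visited node
      low: Map[T, Int],   // lowlink of every visited node
      stack: List[T],     // Tarjan's stack; head is the most recently pushed node
      onStack: Set[T],    // membership test for `stack`
      rep: Map[T, T]      // result: node -> root of its SCC
  )

  def scc[T](graph: Map[T, Set[T]]): Map[T, T] = {
    def dfs(v: T, s0: State[T]): State[T] = {
      val i = s0.index.size // index only grows, so its size is the next number
      val s1 = s0.copy(
        index = s0.index + (v -> i),
        low = s0.low + (v -> i),
        stack = v :: s0.stack,
        onStack = s0.onStack + v)

      val s2 = graph.getOrElse(v, Set.empty[T]).foldLeft(s1) { (s, w) =>
        if (!s.index.contains(w)) {
          // tree edge: recurse, then propagate the child's lowlink
          val sw = dfs(w, s)
          sw.copy(low = sw.low + (v -> math.min(sw.low(v), sw.low(w))))
        } else if (s.onStack(w)) {
          // edge to a node whose SCC is still open
          s.copy(low = s.low + (v -> math.min(s.low(v), s.index(w))))
        } else s // edge into an already finished SCC: ignore it
      }

      if (s2.low(v) == s2.index(v)) {
        // v is the root of an SCC: pop every node above v, and v itself
        val (above, fromV) = s2.stack.span(_ != v)
        val members = v :: above
        s2.copy(
          stack = fromV.tail,
          onStack = s2.onStack -- members,
          rep = s2.rep ++ members.map(_ -> v))
      } else s2
    }

    val empty = State[T](Map.empty, Map.empty, Nil, Set.empty, Map.empty)
    graph.keys.foldLeft(empty) { (s, v) =>
      if (s.index.contains(v)) s else dfs(v, s)
    }.rep
  }
}
```

Unlike the version above, this one keeps an explicit stack in addition to the depth information; the stack is what lets an edge into a finished-but-still-open part of the DFS tree be told apart from an edge into a completed SCC.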

Difference to imperative version

In contrast to the original implementation, my version avoids explicitly unwinding the stack and simply uses a proper (non-tail-) recursive function. The stack is represented by a persistent map called path instead. In my first version I used a List as the stack, but this was less efficient because it had to be searched linearly to test whether it contained a node.

Efficiency

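The code is reasonably efficient as long as each node is expanded only once. For each edge, you have to update and/or access the immutable map path, which costs O(log|N|), for a total of O(|E| log|N|). This is in contrast to O(|E|) achieved by the imperative version. As written, however, the else branch of the fold recurses into every child that is not on path, including children that already appear in foldedSCCs, so on graphs where many paths lead to the same node some nodes are expanded repeatedly and this bound does not hold.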

Linear Time implementation

The paper cited in Chris Okasaki's answer gives a linear-time solution in Haskell for finding strongly connected components. Their implementation is based on Kosaraju's algorithm, which essentially requires two depth-first traversals. The paper's main contribution appears to be a lazy, linear-time DFS implementation in Haskell.

To achieve linear time, they need a set with O(1) insertion of a single element and O(1) membership test. This is essentially the same problem that gives the solution in this answer a higher complexity than the imperative one. They solve it with state threads (the ST monad) in Haskell, which can also be done in Scala (see Scalaz). So, if one is willing to make the code considerably more complicated, it is possible to turn Tarjan's SCC algorithm into a functional O(|E|) version.
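The encapsulation idea can be sketched in plain Scala without Scalaz: the O(1) visited set is a local mutable array that never escapes the function, so callers still see a pure function (ST additionally enforces this in the type system; this sketch does not). It follows Kosaraju's two-pass structure rather than Tarjan's, assumes nodes are numbered 0 until n with successors given as `adj(v)`, and runs in O(|N|+|E|). It is an illustration only, not the paper's code; `KosarajuSketch` is a made-up name.

```scala
import scala.collection.mutable

object KosarajuSketch {
  // Assumption: nodes are 0 until adj.length and adj(v) lists the successors of v.
  // Returns comp, where comp(v) is the number of the SCC containing v.
  def scc(adj: Vector[List[Int]]): Vector[Int] = {
    val n = adj.length

    // Transposed graph: w -> v for every edge v -> w.
    val radj = Array.fill(n)(List.empty[Int])
    for (v <- 0 until n; w <- adj(v)) radj(w) = v :: radj(w)

    // Local mutable state with O(1) insert and membership test; it never
    // escapes this method, so the method as a whole is still pure.
    val visited = new Array[Boolean](n)
    val finished = mutable.ArrayBuffer.empty[Int] // nodes in order of DFS completion

    // Pass 1: DFS on the original graph, recording the finishing order.
    def dfs1(v: Int): Unit = {
      visited(v) = true
      adj(v).foreach(w => if (!visited(w)) dfs1(w))
      finished += v
    }
    (0 until n).foreach(v => if (!visited(v)) dfs1(v))

    // Pass 2: DFS on the transposed graph in decreasing finishing time;
    // every tree found in this pass is one SCC.
    val comp = Array.fill(n)(-1)
    def dfs2(v: Int, c: Int): Unit = {
      comp(v) = c
      radj(v).foreach(w => if (comp(w) == -1) dfs2(w, c))
    }
    var next = 0
    finished.reverseIterator.foreach { v =>
      if (comp(v) == -1) { dfs2(v, next); next += 1 }
    }
    comp.toVector
  }
}
```

For example, `KosarajuSketch.scc(Vector(List(1), List(2), List(0), List(2)))` puts nodes 0, 1 and 2 into one component and node 3 into its own.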

Question 2

See Lazy Depth-First Search and Linear Graph Algorithms in Haskell by David King and John Launchbury. It describes many graph algorithms in a functional style, including SCC.

Question 3

Have a look at https://github.com/jordanlewis/data.union-find, a Clojure implementation of the algorithm. It's sorta disguised as a data structure, but the algorithm is all there. And it's purely functional, of course.