how to reduce the time complexity?
-
29-09-2020 - |
Question
I have a graph G=(V+E)
and list of list Node
where each sublist is subset of V
. I want to find out nodes of each sublist that are not reachable from other nodes of that sublist and add edge between those nodes. Suppose,Node
=[[1,2,3,4,5,7,8],[9,10],....] and in sublist=[1,2,3,4,5,7,8], {1,7,3,8} reachable from each other and {2,4,5} reachable from each other so an edge needs to be added between 1/7/3/8 and 2/4/5. I have used the following :
for each sublist in Node
SG=node_induced_subgraph(sublist) of G
C=connected_components(SG)
for each component c_i in C
add edge between c_i and c_{i+1}
end
end
complexity of node_induced_subgraph() is O(V+E) complexity of connected_components() is O(m+n) # m is no. of nodes and n is no. of edges in Sub-graph
So how to reduce the overall complexity?
Solution
I'm gonna address your question in two parts. First how to ensure reachability between nodes in one "sublist" with adding a minimal amount of edges (if you can add as many edges as you wanted you could go through the list and connect one element to its two neighboring elements - costing $O(n)$ edge addition operations, which is theoretically the fastest method...). If we want to add as few edges as possible we only want to connect the "connected components". I suppose your graph $G$ is stored as an adjacency list.
//reused arrays
visited := bool array of size |V| initialized to false
active := bool array of size |V| initialized to false
//simple dfs
def dfs(node):
if visited[node] or not active[node]: return
visited[node] = true
for child in neighbors(node):
dfs(child)
//algo
def connect_node_list(list):
for node in list: active[node] = true
last := null
for node in list:
if not visited[node]:
dfs(node)
if last != null: add_edge(last,node)
last = node
for node in list:
active[node] = false
visited[node] = false
Now this algo has a runtime of $O(n+|E|)$ where $n$ is the length of list
.
Well to be honest it's the same as your algo I just want to highlight that the usage of bool arrays is very desirable in general; also I don't know how your "indcued subgraph" method works - but you should avoid building a new graph.
Now an idea that might be new: you can reduce the number of sublists by connecting those that have an overlapping element. (Of course this is only useful if they aren't necessarily distinct.) You can use a union-find-structure for that:
uf := union find structure with |V| elements
for list in sublists:
first = list[0]
for node in list:
uf.unify(first, node)
index := array of length |V|
counter := 0
for element in uf:
index[element] = -1
if uf.parent(element) = element and uf.size(element) > 1:
index[element] = counter
counter++
new_sublists = list of <counter> many lists
for element in uf:
set = uf.find(element)
if index[set] != -1:
new_sublists[index[set]].add(element)
Now ´new_sublists´ only contains the necessary amount of sublists (also all sublists of size 1 are removed). The procedure takes $O(max(|V|,N\cdot\alpha (|V|)))$ where $N=\sum_i \text{|sublists[i]|}$. $\alpha(|V|)$ can basically be treated as a constant but you can read up on the union-find-structure article. If you are working with many overlapping node lists this algo should be executed first.