Algorithm to split bipartite graph into subgraphs
-
29-09-2020 - |
Question
I'm looking for an algorithm to split a bipartite graph into subgraphs with a specific constraint. I'm not sure if any existing algorithms solve my problem or not.
I have an undirected bipartite graph where the nodes are customers ($C$) and services ($S$). I want to split this into several smaller subgraphs, limiting the amount of services in each subgraph to some maximum number $N$. Unfortunately, looking for disconnected subgraphs is not sufficient because the graph connectivity is too high, so I think I will need to duplicate services.
Formally, I want a set of subgraphs such that:
- Each customer $c \in C$ appears in exactly one subgraph
- All edges appear in exactly one subgraph (the one in which their customer appears)
- Each service $s \in S$ can appear in any number of subgraphs (it's okay to duplicate services to help the split)
- Each subgraph should have at most $N$ services (where $N$ is a given constant that is guaranteed to be larger than the highest number of services connected to any single customer)
- The subgraphs should have as many customers as possible (without this restriction, it's trivially solvable by putting each customer in their own subgraph with a copy of their services). That can be a heuristic rather than formally proven.
Can anyone suggest an algorithm for doing this? The number of nodes is not huge (roughly 1000 customers, 100 services, with each customer connecting to 5 or less services) so brute force approaches or those with bad big-O scaling may be suitable.
Solution
It smells like it might be NP-hard.
One plausible approach is to use a SAT solver or ILP solver. Suppose you decide you want to have at most $m$ subgraphs. Then you can have boolean variables $x_{i,k},y_{j,k}$ indicating that customer $i$ goes into subgraph $k$ and service $j$ goes into subgraph $k$. You can obtain a bunch of constraints (clauses) based on your requirements and then ask the SAT solver or ILP solver to find a feasible solution. The worst-case running time is exponential. This might or might not work well enough in your situation.