Question

I am looking for real world applications where topological sorting is performed on large graph sizes.

Some fields where I image you could find such instances would be bioinformatics, dependency resolution, databases, hardware design, data warehousing... but I hope some of you may have encountered or heard of any specific algorithms/projects/applications/datasets that require topsort.

Even if the data/project may not be publicly accessible any hints (and estimates on the order of magnitude of potential graph sizes) might be helpful.

Was it helpful?

Solution

Here are some examples I've seen so far for Topological Sorting:

  • While scheduling task graphs in a distributed system, it is usually needed to sort the tasks topologically and then assign them to resources. I am aware of task graphs containing more than 100,000 tasks to be sorted in a topological order. See this in this context.

  • Once upon a time I was working on a Document Management System. Each document on this system has some kind of precedence constraint to a set of other documents, e.g. its content type or field referencing. Then, the system should be able to generate an order of the documents with the preserved topological order. As I can remember, there were around 5,000,000 documents available two years ago !!!

  • In the field of social networking, there is famous query to know the largest friendship distance in the network. This problem needs to traverse the graph by a BFS approach, equal to the cost of a topological sorting. Consider the members of Facebook and find your answer.

If you need more real examples, do not hesitate to ask me. I have worked in lots of projects working on on large graphs.

P.S. for large DAG datasets, you may take a look at Stanford Large Network Dataset Collection and Graphics@ Illinois page.

OTHER TIPS

I'm not sure if this fits what you're looking for but did you know Bio4j project?

Not all the contents stored in the graph based DB would be adequate for topological sorting (there exists directed cycles in an important part of the graph), however there are sub-graphs like Gene Ontology and Taxonomy where this ordering may have sense.

TopoR is a commercial topological PCB router that works first by routing the PCB as topological problem and then translating the topology into physical space. They support up to 32 electrical layers, so it should be capable of many thousands of connections (say 10^4).

I suspect integrated circuits may use similar methods.

The company where I work manages a (proprietary) database of software vulnerabilities and patches. Patches are typically issued by a software vendor (like Microsoft, Adobe, etc.) at regular intervals, and "new and improved" patches "supercede" older ones, in the sense that if you apply the newer patch to a host then the old patch is no longer needed.

This gives rise to a DAG where each software patch is a node with arcs pointing to a node for each "superceding" patch. There are currently close to 10K nodes in the graph, and new patches are added every week.

Topological sorting is useful in this context to verify that the graph contains no cycles - if they do arise then it means that there was either an error in the addition of a new DB record, or corruption was introduced by botched data replication between DB instances.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top