Question the way graphs are taught

https://stackoverflow.com/questions/4509270

12-10-2019
|

سؤال

I'm from Argentina, but i think everybody who has ever take a Data Structures class know what a graph is. If you do, you might know what kind of implementations are "common" or "standar". It can be implemented through a List, or an array. Even Wikipedia says this. As well as Mark Allen Weiss, Bruno Preiss and Luis Joyanes Aguilar.

The thing is. No one ever has think that this is not a good way to do it? The most recommended way is through a List. But considered that vertices can have just one edge between them, i don't think that a List is the good interface to do this. I mean, if the Vertex V1 is connected with Vertex V2, then there is just one and only one edge.

Don't you think it would be a Set instead of a list?

Class Vertex{
    private Set edges;
    private Object data;

    /** Methods**/
}

Just want to know some opinions, what do you think?

Thanks!!

Edit: Also, if we think that the Graph can't have repeated elements, a HashSet would be a good choice to minimize the lookup of the vertex in the insertion.

المحلول

You're right to point out that the adjacencies for a vertex are most accurately modelled by a set (or in the case of a multigraph, a multiset). So why do data structures books write about arrays and linked lists instead? I can think of three reasons:

The idea that programming languages should include sets as a primitive data type is fairly recent. Older writers wouldn't have considered using it, and modern writers tend to follow the traditions of the field.
One of the purposes of a data structures course is to enable you to think about the representation of data at a low (concrete) level as well as at a high (abstract) level. A set is an abstract datatype that (unlike linked lists and arrays) doesn't have an obvious low-level implementation: some sets are best represented as linked lists, some as hash tables, some as arrays, and so on. So it is natural for a data structures course to skip over the high level representation of sets to their low-level implementation, which you have to know about anyway in order to analyse the behaviour of algorithms that use them.
It's important not to be dogmatic about how to represent datatypes, because algorithms can be most efficiently expressed using particular representations. Example 1. To count the paths of length n between each pair of vertices in a graph, represent the graph by its adjacency matrix and raise the matrix to the power n. If you insist on representing the adjacencies of a vertex as a set of edges, then you'll miss this algorithm (which can be parallelized using standard techniques). Example 2. Knuth's "Dancing Links" algorithm for the exact cover problem represents sets of columns using doubly linked lists, so that the links from deleted items can be reused for efficient backtracking.

نصائح أخرى

At a relatively higher C/C++ programmer level, how a graph/network is implemented depends quite heavily on what operations are performed on it. Being an OR person myself, I am probably biased in my response/example here. Some of the most efficient algorithms that can be implemented on graphs/networks are polynomial-time algorithms. Most, if not all, polynomial-time algorithms I can think of (Dijkstra's shortest path problem, transportation problem, max-flow problem, etc.) are special cases of the minimum-cost-flow (MCF) problem. Computationally, one of the most efficient ways of solving an MCF problem is via the network-simplex algorithm (which in itself is a specialization of the simplex algorithm to solve a general linear program).

In the network-simplex-algorithm, a spanning tree (over the set of nodes) needs to be efficiently represented. To represent a spanning tree in a graph, a variety of data structures can be used. These include the following node-length

predecessor[], thread[] and depth[] arrays.

In the most efficient implementations of network-simplex-algorithms that I have come across, these are not represented as arrays but some sort of dynamically created block of memory via

calloc(number_of_nodes, sizeof(struct vertex));

I am unsure (at a relatively lower level) internal to the compiler what/how this memory allocation is implemented - whether it is a list/set/map.

So, in summary, the best way to implement a graph is highly dependent on the operations to be performed on it.

Network simplex algorithm and the data structures needed to efficiently implement the same can be found in this book.

At its most abstract, a Set has a predicate to test whether an element is within the set. It may also support operators which provide union and intersection. Difference is not necessary computable.

At its most abstract, a List supports iteration, sub-list, and append.

Most algorithms on graphs require you to iterate over the edges, so a structure which supports iteration is preferred. Most algorithms do not attempt to add the same edge twice, so removal of duplicates is not required.

Of course, most sets in libraries are finite, extensional sets which also support iteration, so you could use them, though you would still have the cost of checking for duplicates.

Some graph based systems do use sets as the underlying mechanism, but they are dealing with infinite graphs rather than finite ones, where intensional sets become useful.

مرخصة بموجب: CC-BY-SA مع الإسناد

لا تنتمي إلى StackOverflow