Question

I have been using both k-means and fuzzy c-means for a few days now on a tricky data set. It is yielding okay-ish results, but I want to visualize and manipulate the graphical output, and I found a fantastic visualization tool, Gephi. If you click on the picture on the main page, it will load a video that you can watch.

Gephi's supported graph formats page lists the possible import formats:

* GEXF
* GDF
* GML
* GraphML
* Pajek NET
* GraphViz DOT
* CSV
* UCINET DL
* Tulip TPL
* Netdraw VNA
* Spreadsheet

Looking at MATLAB, the format I could write my cluster data out in is CSV. Gephi's site explains the CSV variants: edge list, mixed, and matrix.

I'm not really sure what they mean. Using FCM in MATLAB I get three outputs: centers, U, and objFun.

[centers, U, objFun] = fcm(data, clusters, options);

So my question is: how can I build CSV files from this data in the format that Gephi requires?

https://gephi.org/users/supported-graph-formats/spreadsheet/

http://forum.gephi.org/viewtopic.php?t=1896

I will reward anyone who can help with a 100-point bounty, as this visualization tool is what I want to use from now on, and as of yet there aren't any questions on Stack Overflow that explain how this can be done. So it may be useful in the future for the community of Gephi/MATLAB users.


Solution

The issue here is that you need to be able to represent your data as a graph. Even if your data is not a graph, it can still be represented as one for visualization. You need to identify what in your data can represent nodes and what can represent edges. Once you do that, writing the data out to a file that can be imported by Gephi (or other graph/network visualization tools) is fairly straightforward. Since you have not posted an example of your data, it is difficult to suggest exactly how this should be done.

Ask yourself the following questions about your data:

  1. What can be represented as a node?
  2. What can be represented as an edge to link the nodes defined in #1?

Each node must have a unique identifier associated with it (this can be a simple numerical value or string).
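As a rough illustration only, here is one possible way to assign unique IDs and write a node table for the fcm output. It assumes (my choice, not something your data dictates) that every data point and every cluster center becomes a node, and that Gephi's spreadsheet import is given a header row with `Id` and `Label` columns. The small `U` matrix, the `p`/`c` ID prefixes, the `Cluster` column, and the file name `nodes.csv` are all made up for the example; in practice `U` would be the membership matrix returned by `fcm`.

```matlab
% Sketch: write a Gephi node table from an fcm-style membership matrix.
% Assumption: U is Nc-by-N (clusters in rows, data points in columns),
% as returned by [centers, U, objFun] = fcm(data, clusters, options).
U = [0.8 0.1 0.3;    % hypothetical memberships: 2 clusters, 3 points
     0.2 0.9 0.7];

[Nc, N] = size(U);
[~, hardLabel] = max(U, [], 1);   % cluster with the highest membership per point

fid = fopen('nodes.csv', 'w');    % hypothetical output file name
fprintf(fid, 'Id,Label,Cluster\n');

% One node per data point, with IDs p1, p2, ...
for i = 1:N
    fprintf(fid, 'p%d,point %d,%d\n', i, i, hardLabel(i));
end

% One node per cluster center, with IDs c1, c2, ...
for k = 1:Nc
    fprintf(fid, 'c%d,center %d,%d\n', k, k, k);
end

fclose(fid);
```

The extra `Cluster` column is just a node attribute, which can be handy for coloring nodes by their strongest cluster once the table is imported.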

This is the difficult part, because representing your cluster data as a graph incorrectly can lead to misleading interpretations of the visualization.

Once you have accomplished this, the easiest file format to write is an edge list: one row per edge, giving the ID of the source node and the ID of the target node.
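To make that concrete, here is a minimal sketch of writing an edge list from the membership matrix. It assumes one particular mapping that you would need to judge against your own data: each data point is linked to each cluster center, with the fuzzy membership value as the edge weight. The `Source`, `Target`, and `Weight` headers are what I would expect Gephi's spreadsheet import to pick up; the example `U`, the `p`/`c` ID prefixes, the `threshold` cutoff, and the file name `edges.csv` are hypothetical.

```matlab
% Sketch: write a weighted edge list (point -> cluster center).
% Assumption: U is Nc-by-N, as returned by
% [centers, U, objFun] = fcm(data, clusters, options).
U = [0.8 0.1 0.3;    % hypothetical memberships: 2 clusters, 3 points
     0.2 0.9 0.7];

threshold = 0.15;    % hypothetical cutoff: skip very weak memberships

[Nc, N] = size(U);
fid = fopen('edges.csv', 'w');    % hypothetical output file name
fprintf(fid, 'Source,Target,Weight\n');

for i = 1:N
    for k = 1:Nc
        if U(k, i) >= threshold
            % Data point i is node p<i>, cluster center k is node c<k>
            fprintf(fid, 'p%d,c%d,%.4f\n', i, k, U(k, i));
        end
    end
end

fclose(fid);
```

Dropping weak memberships keeps the graph from becoming a fully connected bipartite blob, but the cutoff itself changes what the picture suggests, so treat it as something to experiment with rather than a fixed rule.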

Licensed under: CC-BY-SA with attribution