plot networks using two correlation matrices in igraph

https://stackoverflow.com/questions/20807012

22-09-2022
|

Question

I need to plot a network starting from two matrices, representing bacterial species and metabolites produced. I want a network showing the co-occurence and the correlation between bacteria and metabolites (NOT bateria VS bacteria and metabolites VS metabolites).

These are examples of my data:

> mydata_bac

Taxon                                                          CD1         CD2
Actinomycetaceae;g__Actinomyces                        0.072998825 0.031399459
Coriobacteriaceae;g__Atopobium                         0.040946468 0.002703265
Corynebacteriaceae;g__Corynebacterium                  0.002517201 0.006446247
Micrococcaceae;g__Rothia                               0.001174694 0.002703265
Porphyromonadaceae;g__Porphyromonas                    0.023326061 0.114368892
Prevotellaceae;g__Prevotella                           0.252894781 0.102308172
Flavobacteriaceae;g__Capnocytophaga                    0.001174694 0.029320025
Aerococcaceae;g__Abiotrophia                           0.002013761 0.003327095
Carnobacteriaceae;g__Granulicatella                    0.042960228 0.049490539
Gemellaceae;g__Gemella                                 0.027857023 0.067165731
Streptococcaceae;g__Streptococcus                      0.220506796 0.182782283
ClostridialesFamilyXI.IncertaeSedis;g__                0.000000000 0.000623830
ClostridialesFamilyXIII.IncertaeSedis;g__Mogibacterium 0.006880349 0.002495321
Lachnospiraceae;Other                                  0.000335627 0.000831774
Clostridia                                             0.004363148 0.002079434
Lachnospiraceae;g__Oribacterium                        0.003524081 0.002079434
Peptostreptococcaceae;g__Peptostreptococcus            0.000167813 0.005198586
Veillonellaceae;Other                                  0.001342507 0.001455604
Veillonellaceae;g__Veillonella                         0.047323376 0.082553545
Fusobacteriaceae;g__Fusobacterium                      0.009229737 0.010813059
Fusobacteriaceae;g__Leptotrichia                       0.092465179 0.076523186
Neisseriaceae;g__Neisseria                             0.013592885 0.027656477
Pasteurellaceae;g__Haemophilus                         0.014431952 0.092534831
SR1;c__;f__;g__                                        0.000000000 0.002079434
TM7;c__TM7-3;f__;g__                                   0.065782849 0.018299023
Erysipelotrichaceae;g__Bulleidia                       0.007551603 0.004366812
Bacteroidia                                            0.000000000 0.000415887
Porphyromonadaceae;g__Tannerella                       0.000671254 0.002079434
Flavobacteriaceae                                      0.002013761 0.001247661
Bacilli                                                0.002181574 0.002911208
Clostridia;f__;g__                                     0.000671254 0.002703265
ClostridialesFamilyXIII.IncertaeSedis;g__Eubacterium   0.003020641 0.002079434
Lachnospiraceae;g__Moryella                            0.003188454 0.000623830
Veillonellaceae;g__Selenomonas                         0.004866588 0.021834061
Fusobacteriaceae                                       0.000335627 0.001871491
Campylobacteraceae;g__Campylobacter                    0.001510321 0.001247661
Pasteurellaceae;g__Actinobacillus                      0.002852828 0.000207943
Burkholderiaceae;g__Lautropia                          0.000000000 0.002495321
Lactobacillaceae;g__Lactobacillus                      0.000000000 0.000000000
Staphylococcaceae;g__Staphylococcus                    0.000000000 0.000000000


> mydata_metab

Molecule CD1 CD2
1-chloro-decane 0.15 0.18
1-methyl-2-1-methylethyl-benzene 0.05 0.06
1-Octadecene 1.47 1.6
1.2.3-Trimethylbenzene 0.39 0.47
1.3-Bis1.1-dimethylethyl-benzene 0.24 0.06
1.3H-Isobenzofuranone 0.04 0.05
2-Butanone 0.01 0.03
2-ethyl-1-hexanol 1.44 1.34
2-Hexanol 0 0.02
2-Pentylfuran 0.42 0.56
2-pentylthiophene 8.08 7.21
25bis11dimethylethylphenol 0.15 0.27
2.6-dimethyl-4-heptanone 0.23 0
4-1.1-dimethylpropyl-phenol 0.39 0.3
4-1.1.3.3-tertamethylbutyl-phenol 1.6 2.04
4-Methoxybenzhydrol 0.02 0.02
4-methyl-1-pentene 0.02 0
4-methyl-2-hexanone 0.03 0.01
4-methyl-3-penten-2-one 0.67 0.27
aceticacidethylester 0.5 0.7
Benzene 0.13 0.1
Benzene-octyloxy 0.03 0.03
butanoicacid-2-methyloctylester 0.04 0.03
carbondisulfide 3.05 3.72
Caryophyllene 0.02 0.02
ethyl-alcohol 0.45 0.61
hexadecane 0.02 0.02
menthol 0.08 0.06
methyl-isobutyl-ketone 0.09 0.08
n.d.3-ion97 1.99 1.92
n.d.4-ion102 2.12 2.04
Nonadecane 0 0
Nonanal 0.82 0.71
Octanal 0.70 0.71
Pentadecane 0.03 0.03
phenol 0.12 0.09
Propyl-alcohol 0.20 0.22
Tetradecane 0.12 0.09
Toluene 0.25 0.15
Trichloromethane 0.15 0.12

This is what I tried:

> library(psych)
> library(igraph)

> mydata_bac <- read.csv(file="L5_filt.txt", header=T, row.names=1, sep="\t")
> mydata_metab <- read.csv(file="metab.txt", header=T, row.names=1, sep="\t")
> mydata_t <- t(as.matrix(mydata_bac))
> metab_t <- t(as.matrix(mydata_metab))
> cor.matrix <- corr.test(mydata_t,metab_t, method = "spearman")
> graph.f<-graph.adjacency(cor.matrix$r, weighted=TRUE, mode="upper")
> t.names <- colnames(cor.matrix)[as.numeric(V(graph.f)$name)]

> graph.f = simplify(graph.f)
> plot (graph.f, vertex.size=5, vertex.shape="circle", vertex.label.color="red", vertex.label=t.names, vertex.label.cex=0.9, edge.width=1, layout=layout.fruchterman.reingold)

But I got a network where there were only the metabolites... Where the error is?

Solution

Ok, so I think what has happened here, is the number of metabolites you have, just happens to be the same as the number of taxa you have. Otherwise graph.adjacency would throw an error: It wants a square matrix of adjacencies, the rows AND the columns of cor.matrix$r should contain all the nodes in the graph.

Your graph is bipartite: Metabolites can be connected to taxa, and taxa can be connected to metabolites, but metabolites are never connected to other metabolites, and taxa are never connected to other taxa. When you run corr.test, your rows are your taxa, and your columns are your metabolites.

I am not familiar with the igraph package: there might be a more suitable method than graph.adjacency for handling bipartite graphs. In lieu of that, we can force our adjacency matrix, cor.matrix, to contain all nodes:

Run this immediately after you obtain cor.matrix:

nNodes <- ncol(cor.matrix$r) + nrow(cor.matrix$r)
nodeNames <- c(colnames(cor.matrix$r), rownames(cor.matrix$r))
adj <- matrix(0, nNodes, nNodes, dimnames=list(nodeNames, nodeNames))
adj[(ncol(cor.matrix$r)+1):nrow(adj), 1:ncol(cor.matrix$r)] <- cor.matrix$r
adj[1:ncol(cor.matrix$r), (ncol(cor.matrix$r)+1):ncol(adj)] <- t(cor.matrix$r)

Now you can obtain graph.f from adj, and run the rest of your code:

graph.f <- graph.adjacency(adj, weighted=TRUE, mode="upper")

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow