Using R or Matplotlib (Python), how can I create a Venn diagram based on value comparisons for each row of a CSV file?

StackOverflow https://stackoverflow.com/questions/20326235

Question

I have a project that requires making Venn diagrams. I am starting to learn Python (using 2.7), so I figured that trying to learn R as well would overload me with work, and I read about matplotlib online instead. Basically, I need to create a Venn diagram by comparing the values in each row across the columns. So if my CSV had the following data:

Month    x    y
Sept    -1    1
Oct      0    1
Nov      1    1
Dec     -1   -1

The overlap would show a value of 2 (because Nov and Dec have the same value in both columns), the x circle by itself would show a value of 1, from Sept (the 0 from Oct needs to be ignored), and the y circle by itself would show a value of 2, from Sept and Oct.

I think this is a fairly complicated program, and I have no idea where to start beyond:

from matplotlib_venn import venn2
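The counting rule described above can be written as a short loop. This is a minimal sketch assuming the rows have already been parsed into (month, x, y) tuples with numeric values; the function name `count_regions` is hypothetical. A row whose x and y hold the same nonzero value goes into the overlap; otherwise each nonzero value counts toward its own circle, and 0 is ignored.

```python
def count_regions(rows):
    """Return (x_only, y_only, both) counts using the row-by-row rule.

    rows: iterable of (month, x, y) tuples with numeric x and y.
    A value of 0 is treated as "no data" and ignored.
    """
    x_only = y_only = both = 0
    for _month, x, y in rows:
        if x != 0 and x == y:
            # Same nonzero value in both columns: the row goes in the overlap.
            both += 1
            continue
        # Values differ (or both are 0): each nonzero value counts on its own side.
        if x != 0:
            x_only += 1
        if y != 0:
            y_only += 1
    return x_only, y_only, both


rows = [("Sept", -1, 1), ("Oct", 0, 1), ("Nov", 1, 1), ("Dec", -1, -1)]
print(count_regions(rows))  # (1, 2, 2)
```

The tuple comes back in the (x only, y only, both) order that `venn2` takes for its subset sizes, so it could be passed as `venn2(count_regions(rows), set_labels=('x', 'y'))`.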

When I use an online tool such as Venny, it just finds the values that the lists have in common instead of doing a row-by-row comparison. This results in 0 in both outer circles and 3 in the overlap (because there are three different values in the input: -1, 0, and 1).
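To see why a value-based tool reports different numbers, here is a minimal sketch of a set-based comparison, assuming (as described above) that such a tool treats each column as a collection of distinct values; the function name `compare_as_sets` is hypothetical and this is not Venny's actual code. On the four sample rows it reports the shared values -1 and 1 rather than per-row matches, because row positions and repeated values are lost.

```python
def compare_as_sets(xs, ys):
    """Compare two columns as sets of distinct values.

    Row positions are discarded and repeated values collapse to one element,
    so this answers "which values occur in both columns", not
    "which rows agree".
    """
    a, b = set(xs), set(ys)
    return len(a - b), len(b - a), len(a & b)


xs = [-1, 0, 1, -1]  # x column of the sample data
ys = [1, 1, 1, -1]   # y column of the sample data
print(compare_as_sets(xs, ys))  # (1, 0, 2)
```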

Alternatively, if this would be much easier in R than in Python, could you help me with that?

Any help would be appreciated, thanks!


Solution

In R there are many packages for building Venn diagrams. You can list some of them with:

library(sos)
findFn('Venn diagram')

For example, with VennDiagram (the first package in the list) you can draw a four-set diagram with the code below. I used random values to generate it, since it is not clear from your question how you define the shared and intersection regions.


library(VennDiagram)
# Replace these random values with your own sets of values
set.seed(1)
A <- sample(1:100, 25, replace = FALSE)
B <- sample(1:100, 25, replace = FALSE)
C <- sample(1:100, 25, replace = FALSE)
D <- sample(1:100, 25, replace = FALSE)

venn.plot <- venn.diagram(
    x = list(
        Sept = A,
        Oct = D,
        Nov = B,
        Dec = C
    ),
    filename = NULL,
    col = "transparent",
    fill = c("cornflowerblue", "green", "yellow", "darkorchid1"),
    alpha = 0.50,
    label.col = c("orange", "white", "darkorchid4", "white",
                  "white", "white", "white", "white", "darkblue", "white",
                  "white", "white", "white", "darkgreen", "white"),
    cex = 1.5,
    fontfamily = "serif",
    fontface = "bold",
    cat.col = c("darkblue", "darkgreen", "orange", "darkorchid4"),
    cat.cex = 1.5,
    cat.pos = 0,
    cat.dist = 0.07,
    cat.fontfamily = "serif",
    rotation.degree = 270,
    margin = 0.2
)

grid.draw(venn.plot)

Other suggestions

I don't fully understand your numbers, but the matplotlib-venn package is pretty easy to use. With your example of (Xy, Yx, XY) = (1, 2, 2), i.e. (x only, y only, both), you would just run:

import matplotlib.pyplot as plt
import matplotlib_venn as venn

# Subset sizes in venn2 order: (x only, y only, both)
v = venn.venn2((1, 2, 2))
v.get_label_by_id('A').set_text('x')
v.get_label_by_id('B').set_text('y')
plt.show()

The matplotlib-venn docs are reasonably straightforward. The trickier part will be extracting the (Xy, Yx, XY) tuple from the data, but I don't really understand your calculation, sorry. If you can explain it a bit more, perhaps I can offer more advice there.
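One way to extract that tuple, under the rule given in the question (a row counts in the overlap only when x and y hold the same nonzero value), is to key each value by its month so that set intersection means "same value in the same row". This is a minimal sketch written for Python 3, assuming a comma-separated file with a `Month,x,y` header and integer values; the function name `column_sets` is hypothetical.

```python
import csv
import io


def column_sets(lines):
    """Turn the x and y columns into sets of (month, value) pairs, skipping 0s.

    lines: a file object or any iterable of CSV lines with a Month,x,y header.
    Keying each value by its month means the two sets share an element only
    when both columns hold the same nonzero value in the same row.
    """
    x_set, y_set = set(), set()
    for row in csv.DictReader(lines):
        month = row["Month"]
        x, y = int(row["x"]), int(row["y"])
        if x != 0:
            x_set.add((month, x))
        if y != 0:
            y_set.add((month, y))
    return x_set, y_set


sample = io.StringIO("Month,x,y\nSept,-1,1\nOct,0,1\nNov,1,1\nDec,-1,-1\n")
x_set, y_set = column_sets(sample)
# (x only, y only, both) region sizes
print((len(x_set - y_set), len(y_set - x_set), len(x_set & y_set)))  # (1, 2, 2)
```

For a real file you would pass an open file instead, e.g. `with open("data.csv", newline="") as f: x_set, y_set = column_sets(f)` (the filename is hypothetical). Since `venn2` also accepts a list of two sets, `venn2([x_set, y_set], set_labels=('x', 'y'))` should draw the same (1, 2, 2) layout without computing the tuple by hand.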

Licensed under: CC-BY-SA with attribution