Question

I have a dataset with three variables: one continuous independent variable, one continuous dependent variable, and a binary variable that categorizes how the measurements were taken. Using ggplot, I know that I can make a scatter plot with the points colored by category:

g <- ggplot(dataset, aes(independent, dependent))
g + geom_point(aes(color=catagory))

However, I want to know if there is a way to make a graph where a vertical line comes up from each point of category 0 and a vertical line goes down from each point of category 1. It would look something like this:

-   |        |    |
|   |        |    |
|   |        |    |
|   |        |    |
-   |        |  o |
|   |        |  | |
|   |    o   |  | |
|   | o  |   |  | |
-   | |  |   o  | o
|   | |  |      |
|   o |  |      |
|     |  |      |
+----|-----|-----|-----|-----|

The reason for wanting a plot like this is that one category represents an upper bound (the points with lines going downwards) and one represents a lower bound (the points with lines going upwards). Having these lines would make it easy to visualize the area which is between these bounds, and whether a function plotted on top could accurately represent the data:

-   |        |    |
|   |        |    |
|   |        |    |
|   |        |    |
-   |        |  o |   _____
|   |        |  |_|__/
|   |    o   |_/| |
|   | o  |__/|  | |
-   | | /|   o  | o
|  _|_|/ |      |
| / o |  |      |
|/    |  |      |
+----|-----|-----|-----|-----|

If there is any way to do this using ggplot or any other graphing library for R, I would love to know how. If it isn't possible, I'd be open to hearing other ways to represent this data. Simply distinguishing the categories by color doesn't do enough to emphasize their upper/lower bound nature for my purposes.


Solution

The following could work for you; I hope I understood the problem correctly.

First, generate some random data for the data frame, as no sample data was provided. The random numbers will make the plot ugly; I hope it will look better with real data:

dataset <- data.frame (
    independent = runif(100),
    dependent = runif(100),
    catagory = floor(runif(100)*2))

Next, for every row, work out where its line should end based on "catagory": the top of the plot (the maximum of dependent) for catagory 0 and the bottom (the minimum of dependent) for catagory 1:

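# Assign the whole column at once; filling a not-yet-existing column by index can fail.
dataset$end <- ifelse(dataset$catagory == 0,
                      max(dataset$dependent),   # catagory 0: line goes up
                      min(dataset$dependent))   # catagory 1: line goes down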
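As a possible variation on this step, the end of each line could be set to Inf or -Inf instead of the data's maximum or minimum. This is only a sketch: it assumes ggplot2 draws infinite position values at the panel edge, and both add_infinite_end() and the end_inf column are names I made up for illustration, as are the demo values.

```r
# add_infinite_end() is a hypothetical helper, not part of ggplot2.
# It adds a column end_inf: Inf for catagory 0 (line goes up),
# -Inf for catagory 1 (line goes down).
add_infinite_end <- function(df) {
    df$end_inf <- ifelse(df$catagory == 0, Inf, -Inf)
    df
}

# Small made-up data frame in the same shape as the example dataset.
demo <- add_infinite_end(data.frame(
    independent = c(0.1, 0.4, 0.7),
    dependent   = c(0.2, 0.8, 0.5),
    catagory    = c(0, 1, 0)))
```

With this column, yend=end_inf would take the place of yend=end in the geom_segment() call, so the lines should reach the panel edge even if the axis limits change.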

Now, we can plot data with geom_segment().

g <- ggplot(dataset, aes(independent, dependent))
# factor() makes the 0/1 column a discrete color scale instead of a gradient
g + geom_segment(aes(x=independent, y=dependent, xend=independent, yend=end, color=factor(catagory)))


Note that I also added + theme_bw() + theme(legend.position = "none") to the plot, as it looked very strange with random data. (In old ggplot2 releases this was written with opts(), which has since been replaced by theme().)
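To get closer to the second sketch in the question, the points and a candidate curve could be drawn on top of the segments, and the curve could be checked against the bounds numerically. This is a rough sketch rather than a definitive implementation: candidate() is a made-up function, violates_bounds() is a hypothetical helper, and the demo values are invented. It treats catagory 0 as a lower bound and catagory 1 as an upper bound, as described in the question.

```r
# candidate() is a made-up example curve; replace it with the real model.
candidate <- function(x) 0.5 + 0 * x

# violates_bounds() is a hypothetical helper: TRUE where the curve leaves
# the allowed region at a data point.
# catagory 0 = lower bound, so the curve must not be below the point;
# catagory 1 = upper bound, so the curve must not be above the point.
violates_bounds <- function(df, f) {
    fx <- f(df$independent)
    ifelse(df$catagory == 0, fx < df$dependent, fx > df$dependent)
}

# Small made-up data frame in the same shape as the example dataset.
demo <- data.frame(
    independent = c(0.1, 0.4, 0.7, 0.9),
    dependent   = c(0.2, 0.8, 0.6, 0.3),
    catagory    = c(0, 1, 0, 1))
demo$end <- ifelse(demo$catagory == 0, max(demo$dependent), min(demo$dependent))
demo$violated <- violates_bounds(demo, candidate)

# Draw segments, points and the candidate curve if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
    library(ggplot2)
    p <- ggplot(demo, aes(independent, dependent)) +
        geom_segment(aes(xend = independent, yend = end, color = factor(catagory))) +
        geom_point(aes(color = factor(catagory))) +
        stat_function(fun = candidate)
    print(p)
}
```

Rows where demo$violated is TRUE are the points whose bound the curve breaks; in the plot, these are the places where the curve misses that point's vertical line.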

Licensed under: CC-BY-SA with attribution