Question

I am a new R user and am having trouble graphing some data in a bar plot. Sorry in advance if this is really easy to do and I just can’t figure it out. I have six sets of data: 3 data sets for car #1 at 1, 5, and 10 years, and 3 data sets for car #2 at 1, 5, and 10 years. The measurements for each car at each age consist of 1) the total number of dents on the car’s exterior and 2) the number of dents that remove paint. I want to make a bar plot with 6 bars, one for each car at each age, where the bar height is the total number of dents that remove paint, with standard deviation bars. Here’s what I’ve been trying so far (only 2 data sets included):

car1yr1 = c(rep(0, 101), rep(1, 9)) #car has 9 dents that remove paint
car1yr5 = c(rep(0, 131), rep(1, 19)) #car has 19 dents that remove paint
sd1 = sd(car1yr1)
sd2 = sd(car1yr5)
stdv = c(sd1, sd2)
car1yr1 = car1yr1[1:150]
dentsCar1 = data.frame("Car1Yr1" = car1yr1, "Car1Yr5" = car1yr5)
barplot(as.matrix(dentsCar1, ylim = c(0, 50), beside = TRUE))

I’ve found an example of error bars: arrows(bar, x, bar, x+ -(stdv), length = 0.15, angle = 90), but I can’t get this to work with my numbers. Also, in this example, the y-axis stops at 15, but the Car1Yr5 bar goes up to 19. How can I draw a y-axis up to 20 or 30? Again, I’m new to R and any help would be greatly appreciated. I’ve been trying to solve this on my own, off and on, for about 2 weeks. Thanks.


Solution

I am a little confused by your data... I am assuming from your example that car 1 at year 1 has 101 dents that did not remove paint and 9 that did, and that car 1 at year 5 has 131 that did not and 19 that did.

Calculating the standard deviation of the number of dents does not make much sense to me. You are plotting count data, so there is no standard deviation to show unless you have, say, many cars of the same model and you want to see the variability between cars.
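To illustrate the point about the standard deviation: for a vector of 0s and 1s, sd() is fully determined by the proportion of 1s and the number of entries, so it carries no extra information about variability between cars. A minimal sketch using the numbers from the question (the names n and p are mine, not from the original post):

```r
# 0 = dent that did not remove paint, 1 = dent that removed paint
car1yr1 <- c(rep(0, 101), rep(1, 9))

n <- length(car1yr1)  # total number of dents
p <- mean(car1yr1)    # proportion of paint-removing dents

# The sample standard deviation of a 0/1 vector...
sd_direct <- sd(car1yr1)
# ...is the same as this expression in p and n alone
sd_from_p <- sqrt(p * (1 - p) * n / (n - 1))

print(c(direct = sd_direct, from_p = sd_from_p))
```

Both values agree, which is why plotting this quantity as an error bar on a count does not say much.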

The best thing to do would be to calculate the percentage of dents that removed paint:

car1yr1 = c(rep(0, 101), rep(1, 9)) #car has 9 dents that remove paint
car1yr5 = c(rep(0, 131), rep(1, 19)) #car has 19 dents that remove paint

# The total number of observations is the total number of dents
total.dents.1 <- length(car1yr1)
total.dents.5 <- length(car1yr5)
# The dents that remove paint are marked as 1, the others with 0, 
# so we can just sum all of the data to get the number of paint-removing dents
dents.paint.1 <- sum(car1yr1)
dents.paint.5 <- sum(car1yr5)
# Alternatively you can use
# dents.paint.1 <- length(which(car1yr1==1))
# Calculate the proportion of paint-removing dents (multiplied by 100 when plotting)
dents.paint.perc.1 <- dents.paint.1/total.dents.1
dents.paint.perc.5 <- dents.paint.5/total.dents.5

df <- data.frame(dents.paint.perc.1, dents.paint.perc.5)

# Plot the data. 
# ylim specifies the limits of the y axis
# ylab defines the axis title. 
# las=1 puts the labels on the y axis horizontally
# names.arg defines the labels on the x axis
barplot(as.matrix(df)*100, ylim=c(0,20), 
        ylab="% dents removing paint", las=1,
        names.arg=c("Car 1 year 1", "Car 1 year 5"))
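If error bars are still wanted on the percentages, one possible approach is sketched here. It rests on my own assumption, not on anything in the answer: each dent is treated as an independent yes/no trial, so the binomial standard error of a proportion applies. The names paint, total, perc, se and mids are illustrative. barplot() returns the x positions of the bar midpoints, which arrows() can then use.

```r
# Counts taken from the question
paint <- c("Car 1 year 1" = 9,   "Car 1 year 5" = 19)   # paint-removing dents
total <- c("Car 1 year 1" = 110, "Car 1 year 5" = 150)  # all dents

prop <- paint / total
perc <- 100 * prop

# Assumption: binomial standard error of a proportion, expressed in percent
se <- 100 * sqrt(prop * (1 - prop) / total)

# barplot() returns the midpoints of the bars on the x axis
mids <- barplot(perc, ylim = c(0, 20),
                ylab = "% dents removing paint", las = 1)

# code = 3 draws a flat cap at both ends of each segment
arrows(mids, perc - se, mids, perc + se,
       length = 0.15, angle = 90, code = 3)
```

Whether a binomial error bar is meaningful depends on how the dents were actually recorded, so treat this as a starting point only.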

In general, it would be much better to put all your data in a single list, so that you can use the *apply family of functions to perform repetitive operations on the whole data set. This gives you cleaner, more manageable code, and any data you add to the list is picked up by the plot automatically.
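As a rough sketch of the list-based approach (the list name dents and its element names are my own choice), sapply() can compute the percentage for every data set in one step, and barplot() takes the bar labels from the names of the result:

```r
# One named list entry per car/age combination; add more entries as needed
dents <- list(
  "Car 1 year 1" = c(rep(0, 101), rep(1, 9)),
  "Car 1 year 5" = c(rep(0, 131), rep(1, 19))
)

# The mean of a 0/1 vector is the proportion of 1s, i.e. of paint-removing dents
perc <- sapply(dents, function(x) 100 * mean(x))

# The names of perc become the labels on the x axis
barplot(perc, ylim = c(0, 20),
        ylab = "% dents removing paint", las = 1)
```

Adding, say, a "Car 2 year 1" element to the list would produce an extra bar without any other change to the code.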

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow