Question

So here's my problem:

I have a bunch of data about sound production and where the emphasis falls in a word. What I'm trying to do is determine whether the difference between production on stressed and unstressed syllables is significant. The problem is that when I try to use the cor() function, the data sets aren't the same length. I have about 500 instances of stressed syllables, but only 400 of unstressed syllables. I'm very new to R, but here's the code I've attempted:

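# Read each file into its own data frame and pull out the intdiff column
stressed <- read.csv('D:/blaaah/Stressed.csv', header=TRUE)
var1 <- stressed$intdiff
unstressed <- read.csv('D:/blaaah/Unstressed.csv', header=TRUE)
var2 <- unstressed$intdiff
# Fails here: var1 and var2 have different lengths
cor(var1, var2)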

Of course, I get an error because the data sets are different lengths. So how do I check for significance between the sets without having them be the same length?

Thanks a bunch!

P.S. Just ask if my question isn't clear. I'm afraid I sometimes assume everyone knows what I'm doing...


Solution

Using cor() would be appropriate if you expected there to be a relationship between var1 and var2, for instance if you'd expect the value of an item in var2 to be larger if the corresponding item in var1 is larger. There is a difficulty when the data sets are not the same length, because there are no corresponding items to compare once you get past the end of the shorter dataset.
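
To make the pairing requirement concrete, here is a minimal sketch. The vectors x and y are made-up illustrative values, not data from the question. It shows cor() working when every item has a partner and failing once one vector is shorter.

```r
# Hypothetical paired measurements: item i of x corresponds to item i of y.
x <- c(1.0, 2.1, 2.9, 4.2, 5.1)
y <- c(2.0, 4.1, 6.2, 7.9, 10.1)

# Works: every x[i] has a matching y[i].
paired_r <- cor(x, y)
print(paired_r)

# Dropping one item from y leaves x[5] without a partner, so cor() raises
# an error; tryCatch() captures the message instead of stopping the script.
unpaired <- tryCatch(cor(x, y[-5]), error = function(e) conditionMessage(e))
print(unpaired)
```

Trimming the longer vector to make the lengths match would silence the error, but the resulting pairs would be arbitrary, so the correlation would not mean anything.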

I think, in this case, that comparing the two data sets to establish whether their means differ is more likely to be useful to you. For that, you'd want to use a t test (t.test() in R), which does not require the two groups to be the same size. You'd also want to confirm that the assumptions of the t test hold for your data, e.g. that the values in each group are roughly normally distributed.
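
As a rough sketch of what that could look like, the example below uses simulated numbers in place of the two intdiff columns. The means, standard deviations and seed are assumptions chosen for illustration only. By default t.test() runs Welch's two-sample test, which accepts groups of different sizes and does not assume equal variances.

```r
# Simulated stand-ins for the two intdiff columns (hypothetical values,
# not the asker's data): 500 stressed and 400 unstressed measurements.
set.seed(1)
var1 <- rnorm(500, mean = 2.0, sd = 1)  # stressed syllables
var2 <- rnorm(400, mean = 1.5, sd = 1)  # unstressed syllables

# Welch two-sample t test: compares the group means; the lengths may differ.
result <- t.test(var1, var2)
print(result)

# One possible check of the normality assumption, run per group.
print(shapiro.test(var1))
print(shapiro.test(var2))
```

With the real data, var1 and var2 would simply be the intdiff columns read from the two CSV files, and result$p.value holds the p-value for the difference in means.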

Licensed under: CC-BY-SA with attribution