Question

I have a data frame with donations and names of donors.

donation    Donor
   25.00    Steve Smith
   20.00    Jack Johnson
   50.00    Mary Jackson
     ...    ...

I'm trying to do some clustering with the pvclust package. Unfortunately, it doesn't seem to accept non-numeric data:

> rs1.pv1 <- parPvclust(cl, rs1, nboot=10)
Error in cor(x, method = "pearson", use = use.cor) : 'x' must be numeric
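As the error message shows, the failure happens inside cor(), which needs an all-numeric matrix. Below is a minimal sketch of checking which columns are numeric and keeping only those. It assumes rs1 has the shape of the sample table above, and it reuses those three rows as toy data. The helper names is_num and rs1_num are hypothetical.

```r
# Toy version of the question's data frame (sample rows only; assumed shape)
rs1 <- data.frame(
  donation = c(25.00, 20.00, 50.00),
  Donor    = c("Steve Smith", "Jack Johnson", "Mary Jackson"),
  stringsAsFactors = FALSE
)

# cor() on the whole data frame fails because the Donor column is character
err <- tryCatch(cor(rs1), error = function(e) conditionMessage(e))
print(err)

# Flag numeric columns and keep only those
is_num  <- vapply(rs1, is.numeric, logical(1))
rs1_num <- rs1[, is_num, drop = FALSE]
str(rs1_num)
```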

I have two questions.

1) Is there another package or method that would do this better?

2) Is there a way to "normalize" the donor names list, i.e., get a list of unique donor names, assign each one an ID number, and then replace the character name in the data frame with that ID number?


Solution

For number 2:

# If donor is a factor, then
as.numeric(donor)
# will transform your factor to numeric (its integer level codes).

# If it isn't, transform it to a factor and then to numeric:
as.numeric(as.factor(donor))
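The question also asks for a list of unique donor names, each with an ID. Below is a minimal sketch that builds that list from the factor levels and adds the IDs as a new column, so the original names are kept for reference. It assumes the column is called Donor, as in the question's table. The names donor_factor, donor_ids and DonorID are hypothetical. The fourth row, a repeat donation from Steve Smith, is made up purely to show that repeated donors share an ID.

```r
rs1 <- data.frame(
  donation = c(25.00, 20.00, 50.00, 10.00),   # last row is made-up illustration data
  Donor    = c("Steve Smith", "Jack Johnson", "Mary Jackson", "Steve Smith"),
  stringsAsFactors = FALSE
)

# The factor levels are the unique donor names (sorted alphabetically by default)
donor_factor <- factor(rs1$Donor)

# Lookup table: one row per unique donor, ID = position of that level
donor_ids <- data.frame(
  DonorID = seq_along(levels(donor_factor)),
  Donor   = levels(donor_factor),
  stringsAsFactors = FALSE
)

# The factor's integer codes line up with DonorID in the lookup table
rs1$DonorID <- as.integer(donor_factor)
print(donor_ids)
print(rs1)
```

Keeping the lookup table makes it easy to map IDs back to names later, e.g. donor_ids$Donor[rs1$DonorID].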

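However, I'm not sure that converting the donor names to numbers and then running cor on them makes sense at all: the ID numbers are arbitrary labels, so any correlation computed from them depends on how the IDs happen to be assigned.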
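One way to see why correlating with ID numbers is questionable: two equally valid ways of numbering the same names can give quite different correlations with the donation amounts. This is a small sketch on the three sample rows from the question. The variable names are hypothetical.

```r
donation <- c(25.00, 20.00, 50.00)
donor    <- c("Steve Smith", "Jack Johnson", "Mary Jackson")

# Two equally valid ID assignments for the same names
ids_alpha <- as.numeric(factor(donor))                           # alphabetical levels
ids_first <- as.numeric(factor(donor, levels = unique(donor)))   # order of first appearance

r_alpha <- cor(donation, ids_alpha)
r_first <- cor(donation, ids_first)
cat("alphabetical IDs:     ", r_alpha, "\n")
cat("first-appearance IDs: ", r_first, "\n")
```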

HTH

OTHER TIPS

How about rs1 <- transform(rs1, Donor = as.numeric(factor(Donor)))? (Warning: I haven't thought enough about what you're doing to know whether that makes sense, so I'm only answering question #2, not question #1.) In R versions before 4.0.0, Donor would typically already be a factor, because read.table and read.csv converted character columns to factors by default, which made the factor() part redundant. Since R 4.0.0 the default is stringsAsFactors = FALSE, so Donor will usually be a character vector and the factor() call is needed.
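Below is a minimal sketch of the transform() call on a data frame whose Donor column is character, which is the default since R 4.0.0. It also shows why the factor() step matters: as.numeric() applied directly to character names just produces NA. The toy rows are the question's sample rows, and the name bad is hypothetical.

```r
rs1 <- data.frame(
  donation = c(25.00, 20.00, 50.00),
  Donor    = c("Steve Smith", "Jack Johnson", "Mary Jackson"),
  stringsAsFactors = FALSE   # the default for data.frame()/read.csv() since R 4.0.0
)

# Without factor(), as.numeric() on a character column gives NA (with a warning)
bad <- suppressWarnings(as.numeric(rs1$Donor))

# With factor(), each distinct name becomes an integer code, then a double
rs1 <- transform(rs1, Donor = as.numeric(factor(Donor)))
str(rs1)
```

Note that this overwrites the names. If you still need them, store the IDs in a new column instead of replacing Donor.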

Licensed under: CC-BY-SA with attribution