Question

Suppose I have the following data frame:

tmp <- data.frame(
  code = c("11","111","112"),
  label = c("sector a","industry a1","industry a2"),
  sector = c("11","11","11"),
  industry = c("NA","111","112")
)

such that:

> tmp
  code       label sector industry
1   11    sector a     11       NA
2  111 industry a1     11      111
3  112 industry a2     11      112

I want to create a variable holding the label of each row's sector. In this simple example all industries are in the same sector, so

> tmp$sector.alpha <- c(rep("sector a",3))

works to generate:

> tmp
  code       label sector industry sector.alpha
1   11    sector a     11       NA     sector a
2  111 industry a1     11      111     sector a
3  112 industry a2     11      112     sector a

But suppose a more complicated case with two or more sectors and any number of industries per sector.

How do I generate the correct labels?


Solution

For example:

library(plyr)  # ddply() comes from the plyr package
ddply(tmp, .(sector), transform, sector.alpha = label[1])  # first label within each sector
  code       label sector industry sector.alpha
1   11    sector a     11       NA     sector a
2  111 industry a1     11      111     sector a
3  112 industry a2     11      112     sector a
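If you would rather not depend on plyr, here is a minimal base-R sketch of the same "first label per sector" idea using ave(), which applies a function within groups and writes the results back in the original row order (ddply, by contrast, returns rows sorted by sector). Like label[1] above, it assumes the sector's own row comes first within each sector; stringsAsFactors = FALSE is added so label is a plain character vector on any R version.

```r
tmp <- data.frame(
  code = c("11", "111", "112"),
  label = c("sector a", "industry a1", "industry a2"),
  sector = c("11", "11", "11"),
  industry = c("NA", "111", "112"),
  stringsAsFactors = FALSE
)

# ave() splits label by sector, applies FUN to each piece, and puts the
# result back at the original row positions (rows are not re-sorted)
tmp$sector.alpha <- ave(tmp$label, tmp$sector, FUN = function(v) v[1])
print(tmp)
```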

Changing your data slightly to introduce more sectors:

tmp <- data.frame(
  code = c("11","111","112","121"),
  label = c("sector a","industry a1","industry a2","indstry 14"),
  sector = c("11","11","12","12"),
  industry = c("NA","111","112","212")
)

library(plyr)
ddply(tmp,.(sector),transform,sector.alpha=label[1])

  code       label sector industry sector.alpha
1   11    sector a     11       NA     sector a
2  111 industry a1     11      111     sector a
3  112 industry a2     12      112  industry a2
4  121  indstry 14     12      212  industry a2
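Note what happens to sector 12: there is no row with code "12", so label[1] falls back to the first industry's label ("industry a2"). A small sketch, assuming the sector's own row is the one whose code equals its sector code, that lists such label-less sectors and reorders rows so the sector row (when present) comes first within each sector:

```r
tmp <- data.frame(
  code = c("11","111","112","121"),
  label = c("sector a","industry a1","industry a2","indstry 14"),
  sector = c("11","11","12","12"),
  industry = c("NA","111","112","212"),
  stringsAsFactors = FALSE
)

# Sector codes that never appear in the code column have no label row
missing.sectors <- setdiff(unique(tmp$sector), tmp$code)
print(missing.sectors)  # [1] "12"

# Sort by sector, then put rows with code == sector first (FALSE sorts
# before TRUE), so label[1] picks the sector's own label when it exists
tmp.sorted <- tmp[order(tmp$sector, tmp$code != tmp$sector), ]
```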

Other suggestions

A numeric variable can be converted into a categorical variable with multiple categories using the cut function. See ?cut for details. Try the following code:

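x <- sample(0:100, 10)  # draws 10 distinct integers from 0 to 100 (without replacement)

# include.lowest = TRUE so that a value of 0 falls in "a" instead of becoming NA
x.cat <- cut(x, breaks = c(0, 40, 50, 60, 70, 80, 100), labels = c("a", "b", "c", "d", "e", "f"), include.lowest = TRUE)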

cut divides the range of the variable into the intervals defined by breaks and gives each value the label of the interval it falls in. You can do the same for a column of a data frame:

x <- sample(0:100, 10)
y <- sample(200:300, 10)
dat <- data.frame(x, y)
dat$cat <- cut(dat$x, breaks = c(0, 40, 50, 60, 70, 80, 100), labels = c("a", "b", "c", "d", "e", "f"), include.lowest = TRUE)
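By default cut uses right-closed intervals, (0,40], (40,50], ..., (80,100], so a value equal to the lowest break (0, which sample(0:100, 10) can draw) matches no interval and becomes NA. A quick check with fixed values instead of random ones shows the effect of include.lowest:

```r
breaks <- c(0, 40, 50, 60, 70, 80, 100)
labs <- c("a", "b", "c", "d", "e", "f")
x <- c(0, 40, 41, 100)

# Default: 0 lies outside (0,40], so it becomes NA; 40 is in (0,40] -> "a"
default.cut <- cut(x, breaks = breaks, labels = labs)
print(default.cut)  # NA a b f

# include.lowest = TRUE closes the first interval to [0,40], so 0 -> "a"
closed.cut <- cut(x, breaks = breaks, labels = labs, include.lowest = TRUE)
print(closed.cut)   # a a b f
```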

This also works:

tmp$sector.alpha <- tmp[match(tmp$sector, tmp$code), "label"]  # label of the row whose code equals this row's sector
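To see why this lookup works, here is a short sketch on the four-row data from the solution above. match() returns, for each sector code, the position of its first occurrence in the code column, or NA when it is absent. Unlike the label[1] approach, the result does not depend on row order and rows are not re-sorted; sector 12, which has no row of its own, gets NA instead of a borrowed industry label.

```r
tmp <- data.frame(
  code = c("11","111","112","121"),
  label = c("sector a","industry a1","industry a2","indstry 14"),
  sector = c("11","11","12","12"),
  industry = c("NA","111","112","212"),
  stringsAsFactors = FALSE
)

# Position of each row's sector code within tmp$code (NA if not found)
idx <- match(tmp$sector, tmp$code)
print(idx)  # [1]  1  1 NA NA

# Indexing label by those positions yields the sector labels
tmp$sector.alpha <- tmp$label[idx]  # c("sector a", "sector a", NA, NA)
```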
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow