Question

I used to fiddle with R and now it all seems to have escaped me . . .

I have a table with a few hundred columns and about 100k rows. One of those columns contains strings that sometimes have commas in them (e.g. chicken,goat,cow or just chicken). I need a script, with (I believe) a for loop, that creates a new column (I know the code that creates the column should not be inside the loop), counts the number of commas in each string, and adds one, so I can find out how many entries each row of that column holds. An example:

col
chicken
chicken,goat
cow,chicken,goat
cow

I want a script to create an additional column in the table that would look like . . .

col2
1
2
3
1

Solution

A loop is not needed here, I think. Using the stringr package...

library(stringr)
# str_count() is vectorised, so no sapply() is needed
dat$aninum <- str_count(dat$ani, ",") + 1

which gives

               ani aninum
1          chicken      1
2     chicken,goat      2
3 cow,chicken,goat      3
4              cow      1
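
If adding a package is not an option, the same comma-count-plus-one idea can be sketched in base R. This is only an illustration: the helper name count_entries is made up here, and it assumes that the column can be coerced to character and that an empty string should count as zero entries rather than one.

```r
# Hypothetical helper (not from the original answers): number of
# comma-separated entries per string, using base R only.
count_entries <- function(x) {
  x <- as.character(x)  # also copes with factor columns
  # Number of commas = length before minus length after deleting commas
  n_commas <- nchar(x) - nchar(gsub(",", "", x, fixed = TRUE))
  # Assumption: an empty string holds zero entries, not one
  ifelse(nzchar(x), n_commas + 1L, 0L)
}

dat <- data.frame(ani = c("chicken", "chicken,goat", "cow,chicken,goat", "cow"),
                  stringsAsFactors = FALSE)
dat$aninum <- count_entries(dat$ani)
```

Because nchar() and gsub() are both vectorised, this runs over the whole column at once, with no explicit loop.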

Other tips

I would use count.fields (from base R):

mydf$col2 <- count.fields(file = textConnection(as.character(mydf$col)), 
                          sep = ",")
mydf
#                col col2
# 1          chicken    1
# 2     chicken,goat    2
# 3 cow,chicken,goat    3
# 4              cow    1

Update: Accounting for blank lines

count.fields has a logical argument blank.lines.skip, which defaults to TRUE. So, to capture information for empty lines, just set that to FALSE.

Example:

mydf <- data.frame(col = c("chicken", "", "chicken,goat", "cow,chicken,goat", "cow"))

count.fields(file = textConnection(as.character(mydf$col)), 
             sep = ",", blank.lines.skip=FALSE)
# [1] 1 0 2 3 1

You could use ?strsplit:

df <- data.frame(col=c("chicken", "chicken,goat", "cow,chicken,goat", "cow"), stringsAsFactors=FALSE)
df$col2 <- sapply(strsplit(df$col, ","), length)
df
#                col col2
# 1          chicken    1
# 2     chicken,goat    2
# 3 cow,chicken,goat    3
# 4              cow    1
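
A slightly more compact variant of the strsplit approach, sketched here under a few assumptions, uses lengths() (available in base R 3.2.0 and later) instead of sapply(..., length), and fixed = TRUE so the comma is matched literally rather than as a regular expression. The example vector, including the empty string and the trailing comma, is made up to show two edge cases: an empty string counts as zero entries, and strsplit drops a trailing empty field, so "cow," counts as one entry, whereas counting commas and adding one would give two.

```r
# Illustrative data (not from the original question)
x <- c("chicken", "", "chicken,goat", "cow,")

# One split per string, then the number of pieces in each
n <- lengths(strsplit(x, ",", fixed = TRUE))
n
```

Which behaviour is wanted for trailing commas depends on the data, so it is worth checking a few such rows before settling on one method.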
Licensed under: CC-BY-SA with attribution