Question

I am having trouble converting a classic wide input data set with reshape.

My input data :

   df <- read.table(textConnection(" Ville POP1999 POP2010 PARC1999 PARC2010
    1 Paris 1800000 2200000 150 253
    2 Itxassou 1000 1800 0 NA
    "))

which results in this data.frame:

     Ville POP1999 POP2010 PARC1999 PARC2010
1    Paris 1800000 2200000      150      253
2 Itxassou    1000    1800        0       NA

My input looks like this, and I want to use colsplit (from the reshape2 package) with a regex to split the data frame like this:

     Ville    Date    Population Parc 
1    Paris    1999    1800000    150
2    Paris    2010    2200000    253
3    Itxassou 1999    1000       0
4    Itxassou 2010    1800       NA

Do you think it is possible to do this in one line with reshape (version 1 or 2) and the colsplit function?

My id is "Ville" + "Date", so I think it is difficult to split first with colsplit and then reuse the resulting id column with melt :/

Do you have any ideas?

Update 1 :

To make the problem harder, imagine we now have thousands of columns, and the columns are in mixed order. I tried to use grep and reshape, but with no result so far (see the comments on @kohske's great answer).

Update 2 :

@kohske solved the problem by adding this code:

cn <- grep("[0-9]", names(df), value = TRUE)  # names of all columns that contain a digit
reshape(df, varying = cn, direction = "long", sep = "")

Solution

You can use stats::reshape:

> reshape(df, 2:5, direction = "long", sep = "")
          Ville time     POP PARC id
1.1999    Paris 1999 1800000  150  1
2.1999 Itxassou 1999    1000    0  2
1.2010    Paris 2010 2200000  253  1
2.2010 Itxassou 2010    1800   NA  2
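
As a minimal sketch of how the call above could be pushed closer to the layout asked for in the question, the time and id columns can be named through the `timevar` and `idvar` arguments of `reshape`, and the value columns renamed afterwards. The object name `long` and the new column names `Population` and `Parc` are taken from the question or chosen for illustration; selecting the varying columns with `grep` follows Update 2, so the order of the columns in the input should not matter.

```r
# Input data from the question; the header row has one field fewer than
# the data rows, so the first column is used as row names.
df <- read.table(text = "Ville POP1999 POP2010 PARC1999 PARC2010
1 Paris 1800000 2200000 150 253
2 Itxassou 1000 1800 0 NA", header = TRUE)

# All columns whose name contains a digit are treated as time-varying.
cn <- grep("[0-9]", names(df), value = TRUE)

# sep = "" lets reshape split each name between the letters and the digits.
long <- reshape(df, varying = cn, direction = "long", sep = "",
                timevar = "Date", idvar = "Ville")

# Rename the value columns (illustrative names) and drop the row names.
names(long)[names(long) == "POP"]  <- "Population"
names(long)[names(long) == "PARC"] <- "Parc"
rownames(long) <- NULL
long
```

Because `idvar` points at an existing column here, no extra `id` column is added to the result.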

OTHER TIPS

Here is a reshape2 solution (stringr is loaded as well, for perl()):

library("reshape2")
library("stringr")

df2 <- melt(df, id.var=c("Ville"))
df2 <- cbind(df2, 
             colsplit(df2$variable, pattern=perl("(?=\\d)"), c("var", "Date")))
dcast(df2, Ville + Date ~ var)

The tricky part is the (Perl) regular expression, which is a lookahead for a digit. The variable column (which holds the original column headings) is split before the first digit. The results of this are

     Ville Date PARC     POP
1 Itxassou 1999    0    1000
2 Itxassou 2010   NA    1800
3    Paris 1999  150 1800000
4    Paris 2010  253 2200000

You can rename the PARC and POP columns; those names come from the original column names.
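
To illustrate what "split before the first digit" means, here is a small base R sketch that does the same split with `sub` instead of a lookahead. It is not the code used inside colsplit; the vector `nms` and the result names `var` and `Date` are just illustrative.

```r
# Column headings as they appear in the melted 'variable' column.
nms <- c("POP1999", "POP2010", "PARC1999", "PARC2010")

# Keep everything before the first digit.
var <- sub("[0-9].*$", "", nms)

# Drop everything before the first digit and convert the rest to integer.
Date <- as.integer(sub("^[^0-9]*", "", nms))

data.frame(var, Date)
```

This assumes every heading is a run of non-digits followed by the year, as in the question's data.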

Licensed under: CC-BY-SA with attribution