Reusing a colsplit regex to split columns with melt and the reshape package
Question
I am having trouble converting a typical wide input table with reshape.
My input data:
df <- read.table(textConnection(" Ville POP1999 POP2010 PARC1999 PARC2010
1 Paris 1800000 2200000 150 253
2 Itxassou 1000 1800 0 NA
"))
which gives this data.frame:
Ville POP1999 POP2010 PARC1999 PARC2010
1 Paris 1800000 2200000 150 253
2 Itxassou 1000 1800 0 NA
I have this kind of input, and I want to use colsplit (reshape2 package) with a regex to split my data frame like this:
Ville Date Population Parc
1 Paris 1999 1800000 150
2 Paris 2010 2200000 253
3 Itxassou 1999 1000 0
4 Itxassou 2010 1800 NA
Do you think it is possible to do this in one line with reshape or reshape2 and the colsplit function?
My id is "Ville" + "Date", so I think it is difficult to split first with colsplit and then reuse the resulting id column with melt.
Do you have any idea how to do this?
Update 1:
To make the problem harder, imagine there are now thousands of columns, and the columns are in mixed order. I tried to use grep and reshape, but without success so far (see the comments on @kohske's great answer).
Update 2:
@kohske solved the problem by adding this code:
cn <- grep("[0-9]", names(df), value = TRUE)
reshape(df, varying = cn, direction = "long", sep = "")
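For the mixed-column case, here is a minimal sketch in base R. The data frame `mixed` is made up for illustration, and sorting `cn` is my own precaution, not part of @kohske's code: it puts each variable's years in the same order before reshape guesses the variable names and times from the column names.

```r
# Hypothetical input: same data as above, but with the columns shuffled
mixed <- data.frame(
  Ville    = c("Paris", "Itxassou"),
  PARC2010 = c(253, NA),
  POP1999  = c(1800000, 1000),
  PARC1999 = c(150, 0),
  POP2010  = c(2200000, 1800)
)

# Pick every column whose name contains a digit, then sort the names so
# that the years appear in the same order for each variable (assumption:
# all year suffixes have the same number of digits)
cn <- sort(grep("[0-9]", names(mixed), value = TRUE))

# With sep = "", reshape splits each name between the letters and the digits
long <- reshape(mixed, varying = cn, direction = "long", sep = "")
long
```

Selecting the columns by name rather than by position means the call does not change when more columns are added.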
Solution
You can use stats::reshape:
> reshape(df, 2:5, direction = "long", sep = "")
Ville time POP PARC id
1.1999 Paris 1999 1800000 150 1
2.1999 Itxassou 1999 1000 0 2
1.2010 Paris 2010 2200000 253 1
2.2010 Itxassou 2010 1800 NA 2
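To get from that output to the exact layout asked for in the question, a little post-processing should be enough. This is only a sketch: the object names `wide` and `long` are mine, and the data frame is rebuilt by hand so the example is self-contained.

```r
# Same data as in the question, built directly as a data frame
wide <- data.frame(
  Ville    = c("Paris", "Itxassou"),
  POP1999  = c(1800000, 1000),
  POP2010  = c(2200000, 1800),
  PARC1999 = c(150, 0),
  PARC2010 = c(253, NA)
)

# timevar names the new time column "Date" instead of the default "time"
long <- reshape(wide, varying = 2:5, direction = "long", sep = "",
                timevar = "Date")

# Order rows by original row (id) and year, then keep the wanted columns
long <- long[order(long$id, long$Date), c("Ville", "Date", "POP", "PARC")]

# Rename the value columns and drop the "1.1999"-style row names
names(long)[names(long) == "POP"]  <- "Population"
names(long)[names(long) == "PARC"] <- "Parc"
rownames(long) <- NULL
long
```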
Other tips
Here is a pure reshape2 solution:
library("reshape2")
library("stringr")
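# Melt to long form: one row per (Ville, original column name)
df2 <- melt(df, id.vars = "Ville")
# Split each column name just before its first digit, e.g. "POP1999" -> "POP", "1999"
# (perl() comes from stringr; newer stringr versions deprecate it in favour of regex())
df2 <- cbind(df2,
             colsplit(df2$variable, pattern = perl("(?=\\d)"), c("var", "Date")))
# Cast back to wide so that each var becomes its own column
dcast(df2, Ville + Date ~ var)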
The tricky part is the (Perl) regular expression, which is a lookahead for a digit. The variable column (which holds the original column headings) is split before the first digit. The result is:
Ville Date PARC POP
1 Itxassou 1999 0 1000
2 Itxassou 2010 NA 1800
3 Paris 1999 150 1800000
4 Paris 2010 253 2200000
You can rename the PARC and POP columns; those names come from the original column names.
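If the lookahead is hard to read, the same split of a column heading into a stem and a year can be sketched with two plain sub() calls in base R. This is an alternative illustration, not what colsplit does internally, and it assumes every heading is a run of non-digits followed by a run of digits.

```r
# Column headings as they appear in the melted "variable" column
vars <- c("POP1999", "POP2010", "PARC1999", "PARC2010")

# Stem: drop the trailing digits
stem <- sub("[0-9]+$", "", vars)

# Year: drop the leading non-digits and convert to integer
year <- as.integer(sub("^[^0-9]+", "", vars))

data.frame(variable = vars, var = stem, Date = year)
```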