Question

I am new to R and I have been trying to pivot a data frame I read from a CSV file. The original CSV contain 5,000 item numbers, in my sample I used the first five. My end result using pivot should present each item number as many times as the payments done and the payments type. For example, the original table look like this:

ITEM NUMBER P1      P2      P3      P4      PType1  PType2      PType3  PType4
697884      270     255     170     0       CASH    CA      VI  
697885      100     1160    310     580     CASH    AX      VI          CA
697886      1515    1455    1765    970     CASH    AX      VI          CA
697887      0       0       0       0               
697888      1755    3610    1950    0       AX          VI      CA

By using pivot I want to get a table like this:

ITEM NUMBER Payment    PaymentType  
697884           270         CASH
697884           255         CA
697884           170         VI

...(The next item)

My current Data Frame contain 9 variables where the item number is a NUM, the payment amount is int and the payment type is Factor. Thank you!

structure(list(ITEM.NUMBER = 697884:697888, Payment1 = c(270L, 
100L, 1515L, 0L, 1755L), Payment2 = c(255L, 1160L, 1455L, 0L, 
3610L), Payment3 = c(170L, 310L, 1765L, 0L, 1950L), Payment4 = c(0L, 
580L, 970L, 0L, 0L), PaymentType1 = structure(c(3L, 3L, 3L, 1L, 
2L), .Label = c("", "AX", "CASH"), class = "factor"), PaymentType2 = structure(c(3L, 
2L, 2L, 1L, 4L), .Label = c("", "AX", "CA", "VI"), class = "factor"), 
    PaymentType3 = structure(c(3L, 3L, 3L, 1L, 2L), .Label = c("", 
    "CA", "VI"), class = "factor"), PaymentType4 = structure(c(1L, 
    2L, 2L, 1L, 1L), .Label = c("", "CA"), class = "factor")), .Names = c("ITEM.NUMBER", 
"Payment1", "Payment2", "Payment3", "Payment4", "PaymentType1", 
"PaymentType2", "PaymentType3", "PaymentType4"), row.names = c(NA, 
-5L), class = "data.frame")
Was it helpful?

Solution

You can use reshape from base R. Assuming your data are called "mydf":

reshape(mydf, direction = "long", idvar="ITEM.NUMBER", 
        varying=2:ncol(mydf), sep = "")
#          ITEM.NUMBER time Payment PaymentType
# 697884.1      697884    1     270        CASH
# 697885.1      697885    1     100        CASH
# 697886.1      697886    1    1515        CASH
# 697887.1      697887    1       0            
# 697888.1      697888    1    1755          AX
# 697884.2      697884    2     255          CA
# 697885.2      697885    2    1160          AX
# 697886.2      697886    2    1455          AX
# 697887.2      697887    2       0            
# 697888.2      697888    2    3610          VI
# 697884.3      697884    3     170          VI
# 697885.3      697885    3     310          VI
# 697886.3      697886    3    1765          VI
# 697887.3      697887    3       0            
# 697888.3      697888    3    1950          CA
# 697884.4      697884    4       0            
# 697885.4      697885    4     580          CA
# 697886.4      697886    4     970          CA
# 697887.4      697887    4       0            
# 697888.4      697888    4       0

If you want to order by the "ITEM.NUMBER", you can use order:

out <- reshape(mydf, direction = "long", idvar="ITEM.NUMBER", 
               varying=2:ncol(mydf), sep = "")
out[order(out$ITEM.NUMBER), ]

Update

For completeness, here's the reshape2 approach that I came up with:

First, melt the data (as indicated in the comments):

mydfL <- melt(mydf, id.vars="ITEM.NUMBER")
head(mydfL)
#   ITEM.NUMBER variable value
# 1      697884 Payment1   270
# 2      697885 Payment1   100
# 3      697886 Payment1  1515
# 4      697887 Payment1     0
# 5      697888 Payment1  1755
# 6      697884 Payment2   255

Second, split up the "variable" column. There might be better ways to do this, but this is what came to my mind.

mydfL <- cbind(mydfL, do.call(rbind, strsplit(
  as.character(mydfL$variable), split = "(?<=[a-zA-Z])(?=[0-9])", perl = T)))
head(mydfL)
#   ITEM.NUMBER variable value       1 2
# 1      697884 Payment1   270 Payment 1
# 2      697885 Payment1   100 Payment 1
# 3      697886 Payment1  1515 Payment 1
# 4      697887 Payment1     0 Payment 1
# 5      697888 Payment1  1755 Payment 1
# 6      697884 Payment2   255 Payment 2

Third, use dcast to get your output. Since some columns are named "1" and "2", you'll need to use backticks (`) to quote them and get R to recognize them as column names and not values.

dcast(mydfL, ITEM.NUMBER + `2` ~ `1`, value.var="value")
#    ITEM.NUMBER 2 Payment PaymentType
# 1       697884 1     270        CASH
# 2       697884 2     255          CA
# 3       697884 3     170          VI
# 4       697884 4       0            
# 5       697885 1     100        CASH
# 6       697885 2    1160          AX
# 7       697885 3     310          VI
# 8       697885 4     580          CA
# 9       697886 1    1515        CASH
# 10      697886 2    1455          AX
# 11      697886 3    1765          VI
# 12      697886 4     970          CA
# 13      697887 1       0            
# 14      697887 2       0            
# 15      697887 3       0            
# 16      697887 4       0            
# 17      697888 1    1755          AX
# 18      697888 2    3610          VI
# 19      697888 3    1950          CA
# 20      697888 4       0            
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top