Question
I have a data frame in the following format:
Symbol Date Time Profit
Banknifty 4/1/2010 9:55:00 -1.18%
Banknifty 4/1/2010 12:30:00 -2.84%
Banknifty 4/1/2010 12:45:00 7.17%
Banknifty 5/1/2010 11:40:00 -7.11%
Zeel 26/6/2012 13:50:00 24.75%
Zeel 27/6/2012 15:15:00 -1.90%
Zeel 28/6/2012 9:45:00 37.58%
Zeel 28/6/2012 14:55:00 23.95%
Zeel 29/6/2012 14:20:00 -4.65%
Zeel 29/6/2012 14:30:00 -6.01%
Zeel 29/6/2012 14:55:00 -12.23%
Zeel 29/6/2012 15:15:00 35.13%
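What I want is to convert this data frame into one that has the dates as row names, the symbols as column names, and the sum of the profit percentages for each day as the values, like this:
Date Banknifty Zeel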
4/1/2010 3.15% 0
5/1/2010 -7.11% 0
26/6/2012 0 24.75%
27/6/2012 0 -1.90%
28/6/2012 0 61.53%
29/6/2012 0 12.24%
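How can I achieve this in R? With dplyr's mutate, or with one of the apply functions?
I am a beginner in R programming. Thanks in advance.
The data in R is: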
structure(list(Symbol = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("BANKNIFTY", "ZEEL"), class = "factor"),
Date = structure(c(5L, 5L, 5L, 6L, 1L, 2L, 3L, 3L, 4L, 4L,
4L, 4L), .Label = c("26/6/2012", "27/6/2012", "28/6/2012",
"29/6/2012", "4/1/2010", "5/1/2010"), class = "factor"),
Time = structure(c(10L, 2L, 3L, 1L, 4L, 8L, 9L, 7L, 5L, 6L,
7L, 8L), .Label = c("11:40:00", "12:30:00", "12:45:00", "13:50:00",
"14:20:00", "14:30:00", "14:55:00", "15:15:00", "9:45:00",
"9:55:00"), class = "factor"), Profit = structure(c(1L, 4L,
12L, 7L, 9L, 2L, 11L, 8L, 5L, 6L, 3L, 10L), .Label = c("-1.18%",
"-1.90%", "-12.23%", "-2.84%", "-4.65%", "-6.01%", "-7.11%",
"23.95%", "24.75%", "35.13%", "37.58%", "7.17%"), class = "factor")), .Names = c("Symbol",
"Date", "Time", "Profit"), class = "data.frame", row.names = c(NA,
-12L))
Solution
One fast way is to use data.table:
require(data.table)
data <- data.table(data)
# Remove the percentage from your file and convert the field to numeric.
data[, Profit := as.numeric(gsub("%", "", Profit))]
data
## Symbol Date Time Profit
## 1: BANKNIFTY 4/1/2010 9:55:00 -1.18
## 2: BANKNIFTY 4/1/2010 12:30:00 -2.84
## 3: BANKNIFTY 4/1/2010 12:45:00 7.17
## 4: BANKNIFTY 5/1/2010 11:40:00 -7.11
## 5: ZEEL 26/6/2012 13:50:00 24.75
## 6: ZEEL 27/6/2012 15:15:00 -1.90
## 7: ZEEL 28/6/2012 9:45:00 37.58
## 8: ZEEL 28/6/2012 14:55:00 23.95
## 9: ZEEL 29/6/2012 14:20:00 -4.65
## 10: ZEEL 29/6/2012 14:30:00 -6.01
## 11: ZEEL 29/6/2012 14:55:00 -12.23
## 12: ZEEL 29/6/2012 15:15:00 35.13
# Melt the data so that we can easily dcast afterwards.
molten_data <- melt(data[, list(Symbol, Date, Profit)], id.vars = c("Symbol", "Date"))
# Create a summary by date and Symbol.
dcast(molten_data, Date ~ variable + Symbol, fun.aggregate = sum)
## Date Profit_BANKNIFTY Profit_ZEEL
## 1: 26/6/2012 0.00 24.75
## 2: 27/6/2012 0.00 -1.90
## 3: 28/6/2012 0.00 61.53
## 4: 29/6/2012 0.00 12.24
## 5: 4/1/2010 3.15 0.00
## 6: 5/1/2010 -7.11 0.00
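Note that the rows above are ordered by the text of the Date factor, so 26/6/2012 sorts before 4/1/2010. If chronological order matters, a base R variant along the following lines might help. This is only a minimal sketch, not part of the original answer: the `trades` data frame is a hypothetical subset of the question's rows, typed in by hand, and `xtabs` is swapped in for the melt/dcast step because it sums the left-hand side over each Date/Symbol combination and fills missing combinations with 0.

```r
# Hypothetical sample: a few rows copied by hand from the question's data.
trades <- data.frame(
  Symbol = c("BANKNIFTY", "BANKNIFTY", "BANKNIFTY", "BANKNIFTY",
             "ZEEL", "ZEEL", "ZEEL"),
  Date   = c("4/1/2010", "4/1/2010", "4/1/2010", "5/1/2010",
             "26/6/2012", "28/6/2012", "28/6/2012"),
  Profit = c("-1.18%", "-2.84%", "7.17%", "-7.11%",
             "24.75%", "37.58%", "23.95%"),
  stringsAsFactors = FALSE
)

# Strip the percent sign and convert the profit to numeric.
trades$Profit <- as.numeric(sub("%", "", trades$Profit, fixed = TRUE))

# Parse day/month/year strings into real dates so rows sort chronologically.
trades$Date <- as.Date(trades$Date, format = "%d/%m/%Y")

# Sum Profit for every Date x Symbol pair; absent pairs come out as 0.
daily <- xtabs(Profit ~ Date + Symbol, data = trades)

# Turn the contingency table into a data frame with dates as row names.
daily_df <- as.data.frame.matrix(daily)
print(daily_df)
```

The row names come out in ISO format (for example 2010-01-04) because they are derived from Date values rather than from the original strings.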