sometimes ago I asked the following question:
I have a list of deals with trading day and market value. Every
(Trading)day new positions come into the list but the old one never
disappear (when positions expire the value stays just constant). The
list looks like as follows:
Deal Trade_Date MktValue Desired_Col
Deal1 31.08.2012 10 +10
Deal2 31.08.2012 21 +21
Deal1 03.09.2012 12 +2
Deal2 03.09.2012 19 -2
Deal3 03.09.2012 2 +2
I would like for each deal to get the difference to the previous trade
date (Desidered_Col in the above example).
And the following solution was provided to me by Roland:
df <- read.table(text="Deal Trade_Date MktValue Desidered_Col Deal1
31.08.2012 10 +10 Deal2 31.08.2012 21 +21 Deal1 03.09.2012 12 +2 Deal2 03.09.2012 19 -2 Deal3 03.09.2012 2 +2",header=TRUE)
library(data.table) dt <- as.data.table(df)
diff.padded <- function(x) c(x[1],diff(x))
dt[,Desidered_Col2:=diff.padded(MktValue),by=Deal]
Deal Trade_Date MktValue Desired_Col Desired_Col2
1: Deal1 31.08.2012 10 10 10
2: Deal2 31.08.2012 21 21 21
3: Deal1 03.09.2012 12 2 2
4: Deal2 03.09.2012 19 -2 -2
5: Deal3 03.09.2012 2 2 2
The solution works perfectly with data.table.
However given the size of my table I decided to try to work with an ffdf object. Hence I have now my data in a ffdf file and I am trying to reproduce the same solution unfortunately without success.
Do you have any advise how I can reproduce that in a ffdf?
Thanks for your help.
here is the full code I am running:
# Load needed packages
library(RODBC)
library(data.table)
library(ETLUtils)
library(RSQLite)
library(ffbase)
calendar <- read.csv("Trading_Calendar.csv",sep=";",stringsAsFactors=FALSE)
calendar$STICHTAG <- as.Date(calendar$STICHTAG,"%d.%m.%Y")
ST_a=Sys.Date()-2
rd_a=as.Date("13.11.2012","%d.%m.%Y")
ST=paste("'",as.character(format(ST_a,"%d.%m.%Y")),"'",sep="")
rd=paste("'",as.character(format(rd_a,"%d.%m.%Y")),"'",sep="")
gc(TRUE)
st.strom <- calendar[calendar$STICHTAG>=rd_a & calendar$STICHTAG<=ST_a & calendar$BR_Strom==1,"STICHTAG"]
st.strom <- format(st.strom,"%d.%m.%Y")
st.strom.s <- paste("('",do.call(paste, c(as.list(as.character(st.strom)), sep="','")),"')",sep="")
started.at=proc.time()
Sys.sleep(1)
memory.limit(size=4095)
query <- paste("select * from is_bewertung_data where commodity in ('CASH','COAL','CO2','ELEC','GCERT')
and stichtag in ",st.strom.s,sep="")
deals.strom <- read.odbc.ffdf(query = query,odbcConnect.args=list(dsn="dsn",uid="id",pwd="pwd"),
first.rows = 100000, next.rows = 500000, VERBOSE=TRUE)
result <- ffdfdply(deals.strom, deals.strom$DEALID, FUN=function(x){
x <- split(x, x$DEALID)
x <- lapply(x, FUN=function(onlyonedeal){
onlyonedeal$Desidered_Col2 <- c(NA, -diff(onlyonedeal$STICHTAG))
onlyonedeal
})
x <- do.call(rbind, x)
x
})
cat("Finished in",timetaken(started.at),"\n")
here the result of str(deals.strom[1:5,]):
'data.frame': 5 obs. of 39 variables:
$ ABBREVIATION : Factor w/ 33553 levels " C 251"," TÜV EE Donaustrom",..: 1893 1892 1894 1895 1896
$ TRADEDATE : POSIXct, format: "2007-06-19" "2007-06-19" "2007-06-19" ...
$ BOOK : Factor w/ 30 levels "CR_RIR_RISKRED",..: 10 10 10 10 10
$ CONTRACT : Factor w/ 20 levels "Base","DNULL",..: 1 5 5 1 1
$ BUYSELL : Factor w/ 2 levels "BUY","SELL": 2 1 2 1 1
$ RATE : num 54.2 57.2 57.3 54.2 55.1
$ AMOUNT : num 474792 501072 501773 474792 964476
$ CUR : Factor w/ 2 levels "EUR","USD": 1 1 1 1 1
$ VOLUME : num 8760 8760 8760 8760 17520
$ UNIT : Factor w/ 2 levels "MWH","t": 1 1 1 1 1
$ STARTDATE : POSIXct, format: "2010-01-01" "2010-01-01" "2010-01-01" ...
$ ENDDATE : POSIXct, format: "2011-01-01" "2011-01-01" "2011-01-01" ...
$ BROKERAGE : num 0 0 0 0 175
$ DV : num 85078 -98218 98919 -85078 -185048
$ REALIZED : num 85078 -98218 98919 -85078 -185048
$ PV : num 0 0 0 0 0
$ DV_DAY : num 0 0 0 0 0
$ DV_MONTH : num 0 0 0 0 0
$ DV_YEAR : num 0 0 0 0 0
$ TRADER : Factor w/ 16 levels "Adolf Plentz",..: 7 7 7 7 12
$ ACTIVE : Factor w/ 2 levels "LONGTERM","SHORTTERM": 2 2 2 2 2
$ STATUS : Factor w/ 2 levels "GCPTY","INT": 1 1 2 2 1
$ PV_MIN : num 0 0 0 0 0
$ PV_PLUS : num 0 0 0 0 0
$ VERTRAGSPARTY : Factor w/ 21 levels "EDL_G059","EDL_G097",..: 10 10 3 3 10
$ GESELLSCHAFT : Factor w/ 1 level "24/7 Trading": 1 1 1 1 1
$ COMMODITY : Factor w/ 5 levels "CASH","CO2","COAL",..: 4 4 4 4 4
$ TO_BE_DELIVERED: num 0 0 0 0 0
$ ACCOUNT : Factor w/ 8 levels "CR_RISKRED","HO_COAL",..: 5 5 5 5 5
$ VERW_PREIS : num 0 0 0 0 0
$ PV_ND : num 0 0 0 0 0
$ BILANZIERUNG : Factor w/ 2 levels "JA","NEIN": 1 1 1 1 1
$ MOTIV : Factor w/ 8 levels "Emissionszertifikate",..: 4 4 4 4 4
$ STICHTAG : POSIXct, format: "2012-11-13" "2012-11-13" "2012-11-13" ...
$ DEALID : Factor w/ 59704 levels "FUX.E.EEX.K.20090622.002",..: 7175 7103 12584 12500 17985
$ COUNTERPARTY : Factor w/ 174 levels "24sieben GmbH",..: 171 171 53 53 141
$ COMMODITY2 : Factor w/ 8 levels "CASH","CER","COAL",..: 4 4 4 4 4
$ MARKTGEBIET : Factor w/ 3 levels "Kohle","Strom",..: 2 2 2 2 2
$ INSTRUMENT : Factor w/ 88 levels "-","Elektrizität FUX EEX Base Apr11 EEXFUT",..: 1 1 1 1 1
my solution after Jan hint, not working:
test <- as.ffdf(deals.strom[,c("DEALID","STICHTAG","PV")])
test <- transform(test,chg=c(NA,diff(PV)),chg2=c(NA,-diff(PV)))
fdd <- as.ff(!duplicated(test$DEALID))
test[fdd,c("chg","chg2")] <- test[fdd,"PV"]
I get the following error msg:error: is.null(rownames(x)) is not TRUE. Somehow I cannot manage to subset the ffdf.