Question

I have a dataset mydat with the following variables:

 MNES    IV
 0.84  0.40
 0.89  0.34
 0.91  0.31
 0.93  0.29
 0.95  0.26
 0.98  0.23
 0.99  0.22
 1.00  0.22
 1.02  0.20
 1.04  0.18
 1.07  0.18

And I need to fit cubic splines to these elements, where MNES is the object (X) and IV is the image (Y).

I have successfully accomplished what I need through PROC IML but I am afraid this is not the most efficient solution.

Specifically, my intended output dataset is:

 mnes    iv
 0.333  0.40
 0.332  0.40  <- for mnes out of sample MNES range, copy first IV;
 0.336  0.40
 ...    ...
 0.834  0.40
 0.837  0.40
 0.840  0.40
 0.842  INTERPOLATION
 0.845  INTERPOLATION
 0.848  INTERPOLATION
 ...
 1.066  INTERPOLATION
 1.069  INTERPOLATION 
 1.072  INTERPOLATION
 1.074  0.18
 1.077  0.18  <- for mnes out of sample MNES range, copy last IV;
 1.080  0.18
 ...    ...
 3.000  0.18

The necessary specifics are the following:

  • I always have 1001 points for MNES, ranging from 1/3 (0.333...) to 3 (thus, each step is (3-1/3)/1000).
  • The interpolation for IV should only be used for the points between the minimum and maximum MNES.
  • For the points where MNES is greater than the maximum MNES in the sample, IV should be equal to the IV of the maximum MNES and likewise for the minimum MNES (it is always sorted by MNES).

My worry for efficiency is due to the fact that I have to solve this problem roughly 2 million times and right now it (the code below, using PROC IML) takes roughly 5 hours for 100k different input datasets.

My question is: What alternatives do I have if I wish to fit cubic splines given an input data set such as the one above and output it to a specific grid of objects? And what solution would be the most efficient?

  • With PROC IML I can do exactly this with the splinev function, but I am concerned that using PROC IML is not the most efficient way;
  • With PROC EXPAND, given that this is not a time series, it does not seem adequate. Additionally, I do not know how to specify the grid of objects which I need through PROC EXPAND;
  • With PROC TRANSREG, I do not understand how to input a dataset into the knots and I do not understand whether it will output a dataset with the corresponding interpolation;
  • With the MSPLINT function, it seems doable but I do not know how to input a data set to its arguments.

I have attached the code I am using below for this purpose and an explanation of what I am doing. Reading what is below is not necessary for answering the question but it could be useful for someone solving this sort of problem with PROC IML or wanting a better understanding of what I am saying.


I am replicating a methodology (Buss and Vilkov (2012)) which, among other things, applies cubic splines to these elements, where MNES is the object (X) and IV is the image (Y).

The following code is heavily based on the Model Free Implied Volatility (MFIV) MATLAB code by Vilkov for Buss and Vilkov (2012), available on his website.

The interpolation is a means to calculate a figure for stock return volatility under the risk-neutral measure, by computing OTM put and call prices. I am using this for my master's thesis. Additionally, since my version of PROC IML does not have functions for Black-Scholes option pricing, I defined my own.

proc iml;
    * Define BlackScholes call and put function;
    * Built-in not available in SAS/IML 9.3;
    * Reference http://www.lexjansen.com/wuss/1999/WUSS99039.pdf ;

    start blackcall(x,t,s,r,v,d);
        d1 = (log(s/x) + ((r-d) + 0.5#(v##2)) # t) / (v # sqrt(t));
        d2 = d1 - v # sqrt(t);
        bcall = s # exp(-d*t) # probnorm(d1) - x # exp(-r*t) # probnorm(d2);
        return (bcall);
    finish blackcall;

    start blackput(x,t,s,r,v,d);
        d1 = (log(s/x) + ((r-d) + 0.5#(v##2)) # t) / (v # sqrt(t));
        d2 = d1 - v # sqrt(t);
        bput = -s # exp(-d*t) # probnorm(-d1) + x # exp(-r*t) # probnorm(-d2);
        return (bput);
    finish blackput;

    store module=(blackcall blackput);
quit;

proc iml;
    * Specify necessary input parameters;
    currdate = "&currdate"d;
    currpermno = &currpermno;
    currsecid = &currsecid;
    rate = &currrate / 100;
    mat = &currdays / 365;
    * Use input dataset and convert to matrix;
    use optday;
    read all var{mnes impl_volatility};
    mydata = mnes || impl_volatility;

    * Load BlackScholes call and Put function;
    load module=(blackcall blackput);

    * Define parameters;
    k = 2;
    m = 500;

    * Define auxiliary variables according to Buss and Vilkov;
    u = (1+k)##(1/m);
    a = 2 * (u-1);

    * Define moneyness (ki) and implied volatility (vi) grids;
    mi = (-m:m);
    mi = mi`;
    ki = u##mi;

    * Preallocation of vi with 2*m+1 ones (1001 in the base case);
    vi = J(2*m+1,1,1);

    * Define IV below minimum MNES equal to the IV of the minimum MNES;
    h = loc(ki<=mydata[1,1]);
    vi[h,1] = mydata[1,2];

    * Define IV above maximum MNES equal to the IV of the maximum MNES;
    h = loc(ki>=mydata[nrow(mydata),1]);
    vi[h,1] = mydata[nrow(mydata),2];

    * Define MNES grid where there are IV from data;
    * (equal to where ki still has ones resulting from the preallocation);
    grid = ki[loc(vi=1),];

    * Call splinec to interpolate based on available data and obtain coefficients;
    * Use coefficients to create spline on grid and save on smoothFit;
    * Save smoothFit in correct vi elements;
    call splinec(fitted,coeff,endSlopes,mydata);
    smoothFit = splinev(coeff,grid);
    vi[loc(vi=1),1] = smoothFit[,2];

    * Define elements of mi corresponding to OTM calls (MNES >=1) and OTM puts (MNES <1); 
    ic = mi[loc(ki>=1)];
    ip = mi[loc(ki<1)];

    * Calculate call and put prices based on call and put module;
    calls = blackcall(ki[loc(ki>=1),1],mat,1,rate,vi[loc(ki>=1),1],0);
    puts = blackput(ki[loc(ki<1),1],mat,1,rate,vi[loc(ki<1),1],0);

    * Complete volatility calculation based on Buss and Vilkov;
    b1 = sum((1-(log(1+k)/m)#ic)#calls/u##ic);
    b2 = sum((1-(log(1+k)/m)#ip)#puts/u##ip);
    stddev = sqrt(a*(b1+b2)/mat);

    * Append to voldata dataset;
    edit voldata;
    append var{currdate currsecid currpermno mat stddev};
    close voldata;
quit;
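
For reference, the quantity accumulated at the end of the PROC IML step above can be written out as below. This is only a transcription of the code, not a restatement of the paper: $T$ stands for mat, $a = 2(u-1)$ with $u = (1+k)^{1/m}$, and $C_i$, $P_i$ denote the call and put prices returned by blackcall and blackput at strike $K_i = u^i$ with volatility $v_i$, spot 1 and zero dividend yield.

```latex
\[
\hat{\sigma}
  = \sqrt{\frac{a}{T}\left[
      \sum_{i:\,K_i \ge 1} \left(1 - \frac{\ln(1+k)}{m}\, i\right) \frac{C_i}{u^{i}}
      \;+\;
      \sum_{i:\,K_i < 1} \left(1 - \frac{\ln(1+k)}{m}\, i\right) \frac{P_i}{u^{i}}
    \right]}
\]
```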

Solution

OK. I'm going to do this for two data sets, since you have many of them. You will have to adapt it to your inputs, but it should give you better performance.

  1. Create some inputs
  2. Get the first and last values from each input data set.
  3. Create a list of all MNES values.
  4. Merge each input to the MNES list and set the upper and lower values.
  5. Append the Inputs together
  6. Run PROC EXPAND with a BY statement to single pass all the input values and create the splines.

The trick is to "trick" EXPAND into thinking MNES is a daily time series. I do this by making it an integer (date values are integers behind the scenes in SAS). With no gaps, ETS procedures will assume a "daily" frequency.

After this is done, run a Data Step to call the Black-Scholes (BLKSHPTPRC, BLKSHCLPRC) functions and complete your analysis.

/*Sample Data*/
data input1;
input MNES    IV;
/*Make MNES an integer; ROUND guards against floating-point error*/
MNES = round(MNES * 1000);
datalines;
 0.84  0.40
 0.89  0.34
 0.91  0.31
 0.93  0.29
 0.95  0.26
 0.98  0.23
 0.99  0.22
 1.00  0.22
 1.02  0.20
 1.04  0.18
 1.07  0.18
 ;
run;

data input2;
input MNES    IV;
MNES = round(MNES * 1000);
datalines;
 0.80  0.40
 0.9  0.34
 0.91  0.31
 0.93  0.29
 0.95  0.26
 0.98  0.23
 1.02  0.19
 1.04  0.18
 1.07  0.16
 ;
run;

/*Get the first and last values from the input data*/
data _null_;
set input1 end=last;
if _n_ = 1 then do;
    call symput("first1",mnes);
    call symput("first1_v",iv);
end;
if last then do;
    call symput("last1",mnes);
    call symput("last1_v",iv);
end;
run;

data _null_;
set input2 end=last;
if _n_ = 1 then do;
    call symput("first2",mnes);
    call symput("first2_v",iv);
end;
if last then do;
    call symput("last2",mnes);
    call symput("last2_v",iv);
end;
run;

/*A list of the MNES values*/
data points;
do mnes=333 to 3000;
    output;
end;
run;

/*Join Inputs to the values and set the lower and upper values*/
data input1;
merge points input1;
by mnes;
if mnes < &first1 then
    iv = &first1_v;
if mnes > &last1 then
    iv = &last1_v;

run;
data input2;
merge points input2;
by mnes;
if mnes < &first2 then
    iv = &first2_v;
if mnes > &last2 then
    iv = &last2_v;

run;

/*Append the data sets together, keep a value 
  so you can tell them apart*/
data toSpline;
set input1(in=ds1)
    input2(in=ds2);
if ds1 then
    Set=1;
else if ds2 then
    Set=2;
run;

/*PROC Expand for the spline.  The integer values
  for MNES makes it think these are "daily" data*/
proc expand data=toSpline out=outSpline method=spline;
by set;
id mnes;
run;
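
The follow-up pricing step mentioned above might look roughly like the sketch below. It assumes the outSpline data set produced by PROC EXPAND. The rate and mat values are hypothetical placeholders for the risk-free rate and time to maturity, the spot price is normalised to 1 as in the question's PROC IML code, and no dividend yield is applied. BLKSHCLPRC and BLKSHPTPRC take their arguments in the order exercise price, time to maturity, share price, risk-free rate, volatility.

```sas
/* Sketch only: price OTM options on the splined grid */
data prices;
    set outSpline;
    k    = mnes / 1000;   /* undo the integer scaling of MNES */
    rate = 0.02;          /* hypothetical risk-free rate */
    mat  = 30 / 365;      /* hypothetical time to maturity in years */
    /* OTM calls for moneyness >= 1, OTM puts below 1 */
    if k >= 1 then price = blkshclprc(k, mat, 1, rate, iv);
    else           price = blkshptprc(k, mat, 1, rate, iv);
run;
```

In practice rate and mat would come from each input set rather than being hard-coded, for example by carrying them as variables alongside the BY variable.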

OTHER TIPS

Here is the solution I came up with. Sadly, I cannot yet conclude whether this is more efficient than the PROC IML solution: for a single dataset, both take pretty much the same running time.

MSPLINT: 
real time: 1.42 seconds
cpu time: 0.23 seconds

PROC IML: 
real time: 1.02 seconds
cpu time: 0.26 seconds

The biggest disadvantage of this solution compared to the one above by @DomPazz is that I cannot process the data with BY groups, which would certainly make it a lot faster. I am still wondering whether I can solve this without resorting to a macro loop, but I am all out of ideas.

I keep the approach proposed by @DomPazz of storing the first and last values in macro variables, but I then use a DATA step that either copies the first or last value or applies the interpolation, depending on which value of MNES it is stepping through. The interpolation is done with the MSPLINT function. Its syntax is as follows:

MSPLINT(X, n, X1 <, X2, ..., Xn>, Y1 <,Y2, ..., Yn> <, D1, Dn>)

Where X is the object at which you wish to evaluate the spline, n is the number of knots supplied to the function (i.e. the number of observations in the input data), X1,...,Xn are the objects in the input data (i.e. MNES) and Y1,...,Yn are the images in the input data (i.e. IV). D1 and Dn (optional) are the derivatives you wish to maintain for interpolation objects X < X1 and X>Xn.
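
As a minimal illustration of the argument order, a call with three made-up knots might look like the sketch below. The knot values (1,1), (2,4), (3,9) are purely illustrative and are not taken from the data in this thread.

```sas
/* Sketch: evaluate the spline through three hypothetical knots at x = 2.5 */
data _null_;
    /* msplint(X, n, X1, X2, X3, Y1, Y2, Y3) */
    y = msplint(2.5, 3, 1, 2, 3, 1, 4, 9);
    put y=;   /* writes the interpolated value to the log */
run;
```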

An interesting note is that by specifying D1 and Dn as 0 you can have the points beyond the grid equal to the last observation inside the interpolated area. However, this forces the spline to flatten to a derivative of zero at both ends, potentially generating an unnatural pattern in the data. I opted not to set these to zero and instead to define the points outside the interpolation area separately.

So, I use PROC SQL to define the lists of elements of MNES and IV in macro variables, divided by commas, so that I can input them in the MSPLINT function. I also define the number of observations through PROC SQL.

MNES, as I commented in the answer above, was not well defined in my explanation. It needs to be defined as the variable u to the power of elements from -500 to 500. This is just a detail but it will allow you to understand where MNES comes from in the example below.
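
Written out, using the same k, m and u as in the question's PROC IML code, the grid is geometric rather than evenly spaced. With k = 2 and m = 500 it still has 1001 points running from 1/3 to 3:

```latex
\[
u = (1+k)^{1/m}, \qquad K_i = u^{\,i}, \quad i = -m, \dots, m
\]
\[
k = 2,\; m = 500:\qquad
K_{-500} = \tfrac{1}{3}, \quad K_{0} = 1, \quad K_{500} = 3, \qquad
\frac{K_{i+1}}{K_i} = 3^{1/500} \approx 1.0022
\]
```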

So, here is the solution, including example data.

* Set model inputs;
%let m = 500;
%let k = 2;
%let u = (1+&k) ** (1/&m);

/*Sample Data*/    
data input1;
    input MNES 13.10 IV 8.6;
    cards;
0.8444984010 0.400535
0.8901469633 0.347988
0.9129712444 0.318596
0.9357955255 0.291456
0.9586198066 0.264852
0.9814440877 0.236231
0.9928562283 0.224858
1.0042683688 0.220035
1.0270926499 0.201118
1.0499169310 0.189373
1.0727412121 0.185628
    ;
run;

data _null_;
    set input1 end=last;
    if _n_ = 1 then do;
        call symput("first1",MNES);
        call symput("first1_v",IV);
    end;
    if last then do;
        call symput("last1",MNES);
        call symput("last1_v",IV);
    end;
run;

proc sql noprint;
    select MNES into:mneslist
    separated by ','
    from input1;
    select IV into:IVlist
    separated by ','
    from input1;
    select count(*) into:countlist
    from input1;
quit;

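data splined;
    do grid=-500 to 500;
        mnes = (&u) ** grid;
        /* Flat extrapolation outside the sample range, spline inside it */
        if mnes < &first1 then IV = &first1_v;
        else if mnes > &last1 then IV = &last1_v;
        else IV = msplint(mnes, &countlist, &mneslist, &IVlist);
        /* One observation per grid point; without OUTPUT only a single row is written */
        output;
    end;
run;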
Licensed under: CC-BY-SA with attribution