Question

I am new to SAS and have this basic problem. I have a list of NYSE trading dates in table A as follows -

trading_date
1st March 2012
2nd March 2012
3rd March 2012
4th March 2012
5th March 2012
6th March 2012

I have another table B that has share price information as -

Date          ID    Ret Price
1st March 2012  1   …   …
3rd March 2012  1   …   …
4th March 2012  1   …   …
5th March 2012  1   …   …
6th March 2012  1   …   …
1st March 2012  2   …   …
3rd March 2012  2   …   …
4th March 2012  2   …   …

... has numeric data related to price and returns.

Now I need to join the NYSE Data table to the above table to get the following table -

Date         ID    Ret  Price
1st March 2012  1   …   …
2nd March 2012  1   0   0
3rd March 2012  1   …   …
4th March 2012  1   …   …
5th March 2012  1   …   …
6th March 2012  1   …   …
1st March 2012  2   …   …
2nd March 2012  2   0   0
3rd March 2012  2   …   …
4th March 2012  2   …   …

i.e. a simple left join. The zeros will actually appear as . in SAS to indicate missing values, but you get the idea. However, if I use the following command -

proc sql;
create table joined as
select table_a.trading_date, table_b.* from table_a LEFT OUTER join table_b on table_a.trading_date=table_b.date;
quit;

The join happens only for the first ID (i.e. ID=1) while for the rest of the IDs, the same data is maintained. But I need to insert the trade dates for all IDs.

How can I get the final data without running a do-while loop over all IDs? I have 1000 IDs, and looping and joining 1000 times is not an option due to limited memory.


Solution

Joe is right that you also need to take ID into consideration, but his solution cannot produce 2nd March 2012, because no ID has a trade on that day. You can do everything in a single SQL step (which will take a bit longer): cross join the trading dates with the distinct IDs, then left join table_b onto that grid.

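proc sql;
   create table final as
   select d.trading_date, d.ID, t.Price, t.Ret
   from
   (
      /* every trading date paired with every distinct ID (cross join) */
      select a.trading_date, i.ID
      from table_a as a, (select distinct ID from table_b) as i
   ) as d
   /* attach the observed rows; unmatched date/ID pairs get missing values */
   left join table_b as t
   on t.Date = d.trading_date and t.ID = d.ID
   order by d.ID, d.trading_date;
quit;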
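
One quick way to sanity-check the result, sketched under the assumption that table_b has at most one row per ID and date: the row count of final should equal the number of distinct IDs times the number of distinct trading dates. The column aliases n_rows, n_ids, and n_dates are only illustrative.

```sas
proc sql;
   /* n_rows should equal n_ids * n_dates if the date/ID grid is complete */
   select count(*)                     as n_rows,
          count(distinct ID)           as n_ids,
          count(distinct trading_date) as n_dates
   from final;
quit;
```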

Other tips

Your left join doesn't work since it doesn't take ID into account. SAS (or rather SQL) doesn't know that it should repeat by ID.

The easiest way to get the full combination is PROC FREQ with SPARSE, assuming someone has a trade on every valid trading day.

proc freq data=table_b noprint;
tables id*date/sparse out=table_all(keep=id date);
run;

Then join that to the original table_b by id and date.
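
That join could look roughly like the sketch below. It assumes table_all holds one row per id/date combination with the variables id and date (the date column name as in table_b in the question); the output name table_full is made up for illustration. Dates on which no ID traded at all will still be absent, since they never appear in table_b.

```sas
proc sql;
   /* table_all: every id/date combination; table_b: the observed rows */
   create table table_full as
   select a.id, a.date, b.ret, b.price
   from table_all as a
   left join table_b as b
     on a.id = b.id and a.date = b.date
   order by a.id, a.date;
quit;
```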

Alternatively, you can use PROC MEANS, which also carries your numeric variables through (it cannot carry character variables this way unless you can use them as class variables).

Using table_b as created in Anton's answer (with the ret and price variables):

proc means data=table_b noprint completetypes nway;
class id date;
var ret price;
output out=table_allmeans sum=;
run;

This will output missing for missing rows and values for present rows, and will have a _FREQ_ variable that allows you to differentiate whether a row is really present in the trading dataset or not.
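
A minimal sketch of using _FREQ_ that way, assuming the table_allmeans output from the step above; the names table_flagged and traded are hypothetical:

```sas
data table_flagged;
   set table_allmeans;
   /* _FREQ_ is 0 for id/date combinations that COMPLETETYPES filled in */
   traded = (_freq_ > 0);
   drop _type_ _freq_;
run;
```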

I suspect there is something off with your data, because your query looks fine and worked on the test data I generated along the lines you described:

data table_a;
    format trading_date date9.;
    do trading_date= "01MAR2012"d to "06MAR2012"d;
        output;
    end;
run;

data table_b;
    format date date9.;
    ret = 0;
    price = 0;
    do date= "01MAR2012"d to "06MAR2012"d;
        do ID = 1 to 4;
            if ranuni(123) < 0.3 then
                output;
        end;
    end;
run;

Below is what I get after running your query copied verbatim:

trading_date  date       ret  price  ID
01MAR2012     01MAR2012  0    0      3
02MAR2012     02MAR2012  0    0      2
03MAR2012     03MAR2012  0    0      1
03MAR2012     03MAR2012  0    0      2
04MAR2012     04MAR2012  0    0      2
05MAR2012     05MAR2012  0    0      3
06MAR2012     .          .    .      .

It is worth checking the format of your dates: are they numeric? If they are character, are they formatted the same way in both tables? If they are numeric, are they dates, or datetimes with some odd format applied?
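
As a rough sketch of how to check this: PROC CONTENTS lists each variable's type and format. The second step shows one possible fix under a purely hypothetical assumption, namely that table_b.date turns out to be a datetime while table_a.trading_date is a plain date; the output name joined_fixed is made up.

```sas
/* List variable names, types, and formats for both tables */
proc contents data=table_a; run;
proc contents data=table_b; run;

/* Hypothetical case: table_b.date is a datetime, so compare on its date part */
proc sql;
   create table joined_fixed as
   select a.trading_date, b.*
   from table_a as a
   left join table_b as b
     on a.trading_date = datepart(b.date);
quit;
```

If the dates are stored as character strings instead, they would need converting with INPUT and a matching informat before they can be compared.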

Licensed under: CC-BY-SA with attribution