Question

Currently I have a table like below:

U_ID SPOUSEDOB   FCHILDDOB   SCHILDDOB   ChangeDate
1    20/01/1980  01/01/1900  01/01/1900  01/01/2000
2    20/01/1950  20/01/1970  01/01/1900  01/01/2000
3    20/01/1960  20/01/1990  20/01/1995  01/01/2000
1    20/01/1980  20/01/1995  01/01/1900  01/01/2005
1    20/01/1980  20/01/1995  20/01/2006  01/01/2010

The date 01/01/1900 means there is no spouse/child. I want to convert this table into the following:

Member_ID  U_ID  Relation DOB         ChangeDate
1          1     Spouse   20/01/1980  01/01/2000
2          2     Spouse   20/01/1950  01/01/2000
3          2     Child    20/01/1970  01/01/2000
4          3     Spouse   20/01/1960  01/01/2000
5          3     Child    20/01/1990  01/01/2000
6          3     Child    20/01/1995  01/01/2000
7          1     Child    20/01/1995  01/01/2005
8          1     Child    20/01/2006  01/01/2010

But this second table still does not make it easy to answer a question such as: how many children did user 1 have at specific points in time, say 01/01/2006 and 01/01/2011? The answers would be 1 and 2. I am also finding it difficult to convert table 1 into table 2; I'm stuck on how to create a new row for the same user_id. Any ideas on how to improve this design, or how to do the conversion? Help is really appreciated. Thank you in advance.


Solution

Here's a simple SAS datastep. You can adjust it to use VNAME() to define relation (depending on how your other variables are named); for example,

relation = vname(DOBs[_t]);

Then use SUBSTR or whatever to shorten it to the proper text. Other than that it should be sufficiently flexible to handle any number of relations in the initial HAVE dataset.

data want;
set have;
array DOBs SPOUSEDOB   FCHILDDOB   SCHILDDOB;
do _t = 1 to dim(DOBs);
  if DOBs[_t] ne '01JAN1900'd then do;
    relation=ifc(_t=1,'Spouse','Child'); *this could also be done using VNAME() to be more flexible;
    DOB=DOBs[_t];
    output;
  end;
end;
keep relation DOB ChangeDate U_ID;
format DOB Changedate Date9.;
run;

proc sort data=want;
by u_id descending relation dob changedate;
run;


data final;
set want;
by u_id descending relation dob changedate;
if first.dob;
run;
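For readers who do not use SAS, the same reshape-and-deduplicate logic can be sketched in Python. This is only an illustration under assumptions: the sample rows are copied from the question, the helper names (to_long, number_members) are hypothetical, and the final numbering by ChangeDate, U_ID, DOB is a guess at how the Member_ID column in the question was assigned.

```python
from datetime import date

NO_PERSON = date(1900, 1, 1)  # sentinel meaning "no spouse/child"

# Sample rows from the question:
# (U_ID, SPOUSEDOB, FCHILDDOB, SCHILDDOB, ChangeDate)
have = [
    (1, date(1980, 1, 20), NO_PERSON, NO_PERSON, date(2000, 1, 1)),
    (2, date(1950, 1, 20), date(1970, 1, 20), NO_PERSON, date(2000, 1, 1)),
    (3, date(1960, 1, 20), date(1990, 1, 20), date(1995, 1, 20), date(2000, 1, 1)),
    (1, date(1980, 1, 20), date(1995, 1, 20), NO_PERSON, date(2005, 1, 1)),
    (1, date(1980, 1, 20), date(1995, 1, 20), date(2006, 1, 20), date(2010, 1, 1)),
]


def to_long(rows):
    """Return one row per (U_ID, relation, DOB) with its earliest ChangeDate."""
    earliest = {}
    for u_id, spouse, fchild, schild, change in rows:
        for relation, dob in (("Spouse", spouse), ("Child", fchild), ("Child", schild)):
            if dob == NO_PERSON:
                continue  # skip the "nobody here" placeholder
            key = (u_id, relation, dob)
            # Same effect as sorting by changedate and keeping first.dob
            if key not in earliest or change < earliest[key]:
                earliest[key] = change
    return [(u, r, d, c) for (u, r, d), c in earliest.items()]


def number_members(long_rows):
    """Assign Member_ID in ChangeDate, U_ID, DOB order (an assumed ordering)."""
    ordered = sorted(long_rows, key=lambda r: (r[3], r[0], r[2]))
    return [(i, *row) for i, row in enumerate(ordered, start=1)]


final = number_members(to_long(have))
for row in final:
    print(row)
```

With the sample data this yields eight rows, one per distinct family member, each carrying the date on which that member first appeared.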

Then to process it to select only people born as of a certain date you can use the query fthiella posted if you prefer to use SQL, or you can filter in a SAS proc, like:

proc means data=final;
where dob le '01JAN2006'd;
class relation;
var (whatever);
run;

Or use ChangeDate if that is what you want to filter on rather than actual DOB.
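The choice between filtering on DOB and filtering on ChangeDate can be sketched in Python as follows. This is a minimal illustration, not part of the SAS answer: the rows are the long-format records for user 1 from the question, and count_children with its by_change_date flag is a hypothetical helper.

```python
from datetime import date

# Long-format rows for user 1: (U_ID, Relation, DOB, ChangeDate)
final = [
    (1, "Spouse", date(1980, 1, 20), date(2000, 1, 1)),
    (1, "Child", date(1995, 1, 20), date(2005, 1, 1)),
    (1, "Child", date(2006, 1, 20), date(2010, 1, 1)),
]


def count_children(rows, u_id, as_of, by_change_date=True):
    """Count children of u_id known (or born) on or before as_of."""
    column = 3 if by_change_date else 2  # 3 = ChangeDate, 2 = DOB
    return sum(
        1
        for row in rows
        if row[0] == u_id and row[1] == "Child" and row[column] <= as_of
    )


print(count_children(final, 1, date(2006, 1, 1)))
print(count_children(final, 1, date(2011, 1, 1)))
print(count_children(final, 1, date(2006, 1, 1), by_change_date=False))
```

For the sample data both filters agree, but they can differ whenever a child is recorded some time after being born, so it is worth deciding which of the two dates the question is really about.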

Other tips

This will convert your table from the first format to the second:

SELECT
  U_ID,
  'Spouse' Relation,
  Spousedob DOB,
  MIN(STR_TO_DATE(ChangeDate, '%d/%m/%Y')) ChangeDate
FROM
  yourtable
WHERE
  Spousedob != '01/01/1900'
GROUP BY U_ID
UNION ALL
SELECT
  U_ID,
  'Child' Relation,
  FCHILDDOB DOB,
  MIN(STR_TO_DATE(ChangeDate, '%d/%m/%Y')) ChangeDate
FROM
  yourtable
WHERE FCHILDDOB != '01/01/1900'
GROUP BY U_ID
UNION ALL
SELECT
  U_ID,
  'Child' Relation,
  SCHILDDOB DOB,
  MIN(STR_TO_DATE(ChangeDate, '%d/%m/%Y')) ChangeDate
FROM yourtable
WHERE SCHILDDOB != '01/01/1900'
GROUP BY U_ID
ORDER BY ChangeDate, U_ID

But to answer your question directly from the original table, you could use this query:

SELECT (FCHILDDOB!='01/01/1900')+(SCHILDDOB!='01/01/1900')
FROM yourtable
WHERE
  (U_ID, STR_TO_DATE(ChangeDate, '%d/%m/%Y')) IN (
    SELECT U_ID, MAX(STR_TO_DATE(ChangeDate, '%d/%m/%Y'))
    FROM yourtable
    WHERE
      U_ID=1 AND STR_TO_DATE(ChangeDate, '%d/%m/%Y')<'2011-01-01'
    GROUP BY U_ID)

(I'm considering that your dates are stored as varchar, and I'm converting to date using STR_TO_DATE)
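The idea behind that query (take the latest snapshot for the user before the cutoff date, then count the child columns that are not the placeholder) can be sketched in Python. This is an illustration under assumptions: dates are kept as dd/mm/yyyy strings as in the question, parse and children_as_of are hypothetical helpers, and datetime.strptime plays the role of STR_TO_DATE. Parsing before comparing matters, because dd/mm/yyyy strings do not sort chronologically.

```python
from datetime import date, datetime

NO_PERSON = "01/01/1900"  # placeholder string meaning "no spouse/child"

# Rows as stored in the question, with dates as dd/mm/yyyy strings:
# (U_ID, SPOUSEDOB, FCHILDDOB, SCHILDDOB, ChangeDate)
yourtable = [
    (1, "20/01/1980", "01/01/1900", "01/01/1900", "01/01/2000"),
    (2, "20/01/1950", "20/01/1970", "01/01/1900", "01/01/2000"),
    (3, "20/01/1960", "20/01/1990", "20/01/1995", "01/01/2000"),
    (1, "20/01/1980", "20/01/1995", "01/01/1900", "01/01/2005"),
    (1, "20/01/1980", "20/01/1995", "20/01/2006", "01/01/2010"),
]


def parse(text):
    """Convert a dd/mm/yyyy string to a date so comparisons are chronological."""
    return datetime.strptime(text, "%d/%m/%Y").date()


def children_as_of(rows, u_id, cutoff):
    """Count children in the latest row for u_id with ChangeDate before cutoff."""
    candidates = [r for r in rows if r[0] == u_id and parse(r[4]) < cutoff]
    if not candidates:
        return 0  # no snapshot exists yet for this user
    latest = max(candidates, key=lambda r: parse(r[4]))
    # Columns 2 and 3 are FCHILDDOB and SCHILDDOB
    return sum(1 for dob in latest[2:4] if dob != NO_PERSON)


print(children_as_of(yourtable, 1, date(2006, 1, 1)))
print(children_as_of(yourtable, 1, date(2011, 1, 1)))
```

This approach works on the original wide table without converting it first, at the cost of hard-coding which columns hold children.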

Edit

You could create a table yourtable2 with columns (Member_ID auto_increment, U_ID, Relation, DOB, ChangeDate) and then insert all of your data from yourtable to yourtable2 with this command:

INSERT INTO yourtable2 (U_ID, Relation, DOB, ChangeDate)
SELECT ... -- the select query above
ORDER BY ChangeDate, DOB, U_ID

Then to count the number of children you could use:

SELECT COUNT(*)
FROM   yourtable2
WHERE  Relation='Child'
       AND U_ID=1
       AND ChangeDate <= '2011-01-01'

Please see fiddle here.

This does not work because I don't understand the relations in your starting table. But it might help you find another solution using plain old SAS data step code:

data have;
   input U_ID SPOUSEDOB :ddmmyy10. FCHILDDOB :ddmmyy10.  
         SCHILDDOB :ddmmyy10. ChangeDate :ddmmyy10.;
datalines;
1    20/01/1980  01/01/1900  01/01/1900  01/01/2000
2    20/01/1950  20/01/1970  01/01/1900  01/01/2000
3    20/01/1960  20/01/1990  20/01/1995  01/01/2000
1    20/01/1980  20/01/1995  01/01/1900  01/01/2005
1    20/01/1980  20/01/1995  20/01/2006  01/01/2010
run;
data want(keep=Member_ID U_ID Relation DOB ChangeDate);
   attrib Member_ID  length=8;
   attrib U_ID       length=8;
   attrib Relation   length=$6;
   attrib DOB        length=8 format=ddmmyy10.;
   attrib ChangeDate length=8 format=ddmmyy10.;
   retain Member_ID 0;

   set have;

   if _n_ = 1 or U_ID ne 1 then do;
      Member_ID + 1;
      Relation = 'Spouse';
      DOB = SPOUSEDOB;
      output;
      end;

   if FCHILDDOB ne mdy(1,1,1900) then do;
      Member_ID + 1;
      Relation = 'Child';
      DOB = FCHILDDOB;
      output;
      end;
   if SCHILDDOB ne mdy(1,1,1900) then do;
      Member_ID + 1;
      Relation = 'Child';
      DOB = SCHILDDOB;
      output;
      end;
  run;
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow