SAS DATA: How to remove observations that only occur once

https://stackoverflow.com/questions/13215848

sas
datastep

30-07-2021
|

Pregunta

In SAS, suppose I have a dataset named "person_groups". It has two variables, named "person" and "group". This dataset simply assigns each person to a group.

How can I remove from this dataset all persons who have no one else in their group? In other words, how can I remove all singleton groups?

[I'd be happy with a proc sql solution or a data step solution--either is fine.]

Side note: I'm new to SAS. I have been using C++ and MATLAB for many years. I feel like I can't understand how to do anything with the SAS DATA step. It seems extremely clunky, bizarre, and inelegant. Frankly, I'm growing very frustrated. Anyone out there have hope for the weary? :)

Solución

Here's a way that uses a data step. This method requires a sort.

data person_groups;
 input person $ group $;
 datalines;
John Grp1
Mary Grp3
Joe Grp2
Jane Grp3
Frank Grp1
;

Proc Sort data=person_groups;
 by group;
run;

Data person_groups;
 set person_groups;
 by group;
 if first.group and last.group then delete;
run;

Otros consejos

Here is a PROC SQL solution:

proc sql;
   delete from person_groups
   where group in (
      select group
      from person_groups
      group by 1
      having count(*) = 1
      );
quit;

As you can see, PROC SQL mostly follows standard ANSI syntax, so your existing SQL skills should be quite portable. And hang in there with respect to SAS Data Step programming; the language is extremely rich, full featured, and by no means "inelegant". You may have seen inelegant code, but that's more likely the fault of the programmer. Take a few minutes and read the SAS Concepts manual.

Licenciado bajo: CC-BY-SA con atribución

No afiliado a StackOverflow