You can use first.variable and last.variable, the automatic variables that SAS creates during BY-group processing. They give you more control over which row is treated as the duplicate. See the SAS documentation on BY-group processing in the DATA step for details.
data uscpi_dedupedByYear;
    set uscpi_sorted;
    by year;
    if first.year;      /* keep only the first occurrence of each distinct year */
    /* if last.year; */ /* alternative: keep only the last occurrence of each distinct year */
run;
A lot depends on how your input dataset is sorted. For example, if it is sorted by year and month, then if first.year; keeps only the earliest month in each year. However, if it is sorted by year and descending month, then if first.year; keeps the last month in each year.
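As a rough sketch of the descending case, assuming uscpi_sorted contains a numeric month variable alongside year (the output names uscpi_desc and uscpi_lastMonthPerYear are made up for illustration):

```sas
/* Re-sort so that, within each year, the latest month comes first. */
proc sort data=uscpi_sorted out=uscpi_desc;
    by year descending month;
run;

/* first.year is now true on the row holding the latest month of each year. */
data uscpi_lastMonthPerYear;
    set uscpi_desc;
    by year descending month;
    if first.year;
run;
```

With the original ascending sort, using if last.year; in place of if first.year; should select the same rows, provided each year and month combination appears only once.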
This behaviour differs from the NODUPKEY option of PROC SORT, which does not let you choose between the first and the last row of each BY group.
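For comparison, a minimal NODUPKEY sketch on the same data might look like this (the output name uscpi_nodupkey is hypothetical):

```sas
/* Keeps one observation per year: the first one PROC SORT encounters */
/* for each BY value. There is no option here to keep the last one.   */
proc sort data=uscpi_sorted out=uscpi_nodupkey nodupkey;
    by year;
run;
```

Which observation counts as "first" here depends on the order of the input and on how PROC SORT handles ties, so the DATA step approach is the safer choice when the surviving row matters.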