Question

I have a very large table that contains an ID field and a datetime field. The table is ordered by the ID field, and INDEXED on the datetime field.

I want to quickly find the maximum datetime value but I can't find any good way to do this.

Sample data:

data x;
  do id=1 to 10000000;
    created_datetime = datetime() + (ranuni(1)*100000);
    output;
  end;
  format created_datetime datetime22.;
run;

proc sql noprint;
  create index created_datetime on x(created_datetime);
quit;

Attempt #1: PROC SQL and the max() function

For some reason I thought this would return the result instantly, but what actually happens was counter-intuitive (to me at least). The max() function doesn't use an index; it can't. WHERE clauses and the like can make use of indexes, but max() can't. Even if you force the use of an index, SAS still processes every row in the table, just in the order in which the index returns them.

option msglevel=i;
proc sql noprint;
  select max(created_datetime) from x(idxname=created_datetime);
quit;
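
For contrast, here is a minimal sketch of a query where the index can help, because the condition sits in a WHERE clause. The cutoff literal is a made-up value for illustration only (nothing in the question supplies such a bound), and the result equals the table maximum only if the cutoff is at or below it. SAS may still choose a full scan if it estimates that the WHERE clause matches a large share of the rows; with msglevel=i the log reports whether the index was selected.

```sas
options msglevel=i;  /* write index-usage INFO lines to the log */

proc sql noprint;
  /* Hypothetical cutoff: replace with a datetime known to sit just below the maximum */
  select max(created_datetime) format=datetime22.
    into :max_dt trimmed
  from x
  where created_datetime >= '01JAN2030:00:00:00'dt;
quit;

%put NOTE: max_dt=&max_dt;
```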

Attempt #2: By-group processing

The step below easily returns the first row (the minimum value) using the index:

data min; 
  set x;
  by created_datetime;
  output;
  stop;
run;

But I can't use the descending keyword to work backwards through the list to get the last row:

data max;
  set x;
  by descending created_datetime;
  output;
  stop;
run;

SAS also doesn't seem to support descending indexes, so I can't use that approach either.

Attempt #3: Use metadata about the index and a WHERE statement

I looked in SASHELP.VINDEX hoping that maybe the max values might be stored in the metadata somewhere that I could then use in a where statement. No luck there.

EDIT:

Attempt #4: PROC SQL with inobs or outobs

@DomPazz's answer below inspired me to revisit some other SQL-based solutions. I thought perhaps the ORDER BY clause in PROC SQL might interact with the inobs or outobs options to achieve my goal. It didn't work, though. The ordering appears to be applied to the output of the query and in no way affects the order in which the rows are actually read in.

/* Uncomment options as necessary */
proc sql noprint /*inobs=1 outobs=1*/;
  create table temp as 
  select created_datetime
  from x
  order by created_datetime desc;
quit;

Help!


Solution

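The CENTILES option of PROC CONTENTS gives you the index centiles; the last one should be the 100% mark, i.e. the maximum value. This is only reliable if the centiles are current, which requires the index to have been recreated or refreshed with the UPDATECENTILES option in effect after any additions/deletions to the data. (Here have is a placeholder for your table name.)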

proc contents data=have centiles;
run;

You can grab that with ODS OUTPUT if you want it as a dataset (the output table name is "INDEXES"):

ods output indexes=temp;
proc contents data=have centiles;
run;

See Michael Raithel's papers, in particular The Basics Of Using SAS Indexes, for more details.
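
If the centiles might be stale, one way to bring them up to date without rebuilding the index is the INDEX CENTILES statement in PROC DATASETS. This is a minimal sketch, assuming the table x in WORK and the index created_datetime from the question rather than the placeholder name have:

```sas
proc datasets library=work nolist;
  modify x;
  /* Recompute the centiles for the existing index now */
  index centiles created_datetime / refresh;
quit;

/* Then list the centiles; the 100% entry is the largest indexed value */
proc contents data=x centiles;
run;
```

The same statement also accepts updatecentiles=always, which should keep the centiles current whenever the data changes, at the cost of some extra work on each update.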

OTHER TIPS

What I could do, since it is a datetime value, is create a new field containing the same value multiplied by -1 and then index the new field. Reading that index in ascending order returns the largest original value first. Ugly, but it would work.

Pros:-

  • Solves the issue.
  • Simple, and easily explained with some comments.

Cons:-

  • Wastes space with an additional field and index.
  • Additional processing overhead associated with maintaining the index.
  • Only works with numeric fields guaranteed to be >= 0.
  • It's an ugly hack.
  • Probably lots of others...

Unless someone can think of a nicer approach I'll probably end up going with this.
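
A rough sketch of that workaround, using the sample table x from the question. The names neg_created_datetime and max_row are made up for illustration. Note that rebuilding x with a DATA step drops its existing indexes, so the original created_datetime index would need to be recreated if it is still wanted.

```sas
/* Add the negated copy of the datetime (hypothetical column name) */
data x;
  set x;
  neg_created_datetime = -created_datetime;
run;

/* Index the negated column; a simple index takes the column's name */
proc sql noprint;
  create index neg_created_datetime on x(neg_created_datetime);
quit;

/* The first row in ascending index order holds the largest created_datetime */
data max_row;
  set x;
  by neg_created_datetime;
  output;
  stop;
run;
```

Missing values would sort first in this order, so this assumes created_datetime is never missing, as in the sample data.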

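SELECT DISTINCT can use the index (see the INFO line in the log below). With the descending ORDER BY, the maximum is the first row of the result. On your example here, this is slower than your other methods, but for a really big table it might be faster.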

proc sql noprint;
create table temp as 
    select distinct(created_datetime) as max
    from x
    order by max desc;
quit;

18003  proc sql noprint;
18004  create table temp as
18005      select distinct(created_datetime) as max
18006      from x
18007      order by max desc;
INFO: Index created_datetime of SQL table WORK.X selected for SQL SELECT DISTINCT/UNIQUE
      optimization.
NOTE: SAS threaded sort was used.
NOTE: Table WORK.TEMP created, with 9999865 rows and 1 columns.

18008  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           2.97 seconds
      cpu time            4.54 seconds
Licensed under: CC-BY-SA with attribution