Question

Newbie PL/SQL question:

In order to calculate predicted values for a multivariate linear regression analysis, I'd like to multiply each of the regression parameters in Table A by the corresponding variable value for all of the records in Table B, then sum the products for each record in Table B.

Table A contains a single row of parameter values (numerical constants) with n columns, one for each parameter, while Table B contains 100,000+ records with n columns, one for each regression variable.

Is there an efficient way to perform these calculations? The simplest approach would be to join the columns in Table A to Table B, which would result in a joined table with n columns containing duplicate parameter values for all 100,000+ records. However, this seems wasteful of processing time and memory.

Or is there a way to declare global constants from the parameter values in Table A (like macro variables in SAS) and then perform the calculations in Table B using the global constant values?

Any help is much appreciated!

Thanks, Robert


Solution

In SQL, one way to do this is with a join and aggregation. This assumes the parameters are stored one per row, in columns named col and coefficient:

select t.id,
       max(t.A)*max(case when p.col = 'A' then p.coefficient end),
       max(t.B)*max(case when p.col = 'B' then p.coefficient end),
       . . .
from data t cross join
     parameters p
group by t.id

You can also do it with an inline query in the select statement:

select t.A*(select max(coefficient) from parameters where col = 'A'),
       . . .
from data t

Assuming that you don't have too much data (thousands of rows rather than millions), either approach should perform reasonably.

By the way, if the parameters were stored in a single row, then a simple join and multiplication would suffice.
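
Since the question describes that single-row layout, here is a minimal sketch of this case. The table names table_a and table_b, the column names (b0, b1, b2, id, x1, x2) and the sample values are all assumptions for illustration, with b0 standing in for an intercept if the model has one. Cross joining against a one-row table does not increase the number of rows, and adding the products gives one predicted value per record.

```sql
-- Hypothetical layout: table_a holds a single row of coefficients,
-- table_b holds one row per observation.
create table table_a (b0 number, b1 number, b2 number);
create table table_b (id number, x1 number, x2 number);

insert into table_a values (1.0, 0.5, 2.0);
insert into table_b values (1, 10, 3);
insert into table_b values (2, 4, 0);

-- One predicted value per record: intercept plus the sum of products.
select b.id,
       a.b0 + a.b1 * b.x1 + a.b2 * b.x2 as predicted
from table_b b cross join
     table_a a
order by b.id;
-- id 1: 1.0 + 0.5*10 + 2.0*3 = 12
-- id 2: 1.0 + 0.5*4  + 2.0*0 = 3
```

Only the n coefficient values are read from table_a; the result set has the same number of rows as table_b.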

There is another approach, similar to the first, but perhaps clearer:

select t.id,
       t.A*p.Acoefficient,
       t.B*p.Bcoefficient,
       . . .
from data t cross join
     (select max(case when p.col = 'A' then coefficient end) as Acoefficient,
             max(case when p.col = 'B' then coefficient end) as Bcoefficient,
             ...
      from parameters p
     ) p

I'm adding this because it is probably how I would actually code the solution.
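
The question asks for a single predicted value per record, so the products would be added together rather than returned as separate columns. Here is a minimal sketch of that variant for two variables, assuming the data and parameters layouts used in the queries above. The sample rows and the predicted alias are made up for illustration.

```sql
-- Assumed layouts: one parameter per row, one observation per row.
create table parameters (col varchar2(10), coefficient number);
create table data (id number, A number, B number);

insert into parameters values ('A', 0.5);
insert into parameters values ('B', 2.0);
insert into data values (1, 10, 3);
insert into data values (2, 4, 0);

-- Pivot the parameter rows into a single row in the subquery,
-- then cross join that one row to every data row and sum the products.
select t.id,
       t.A * p.Acoefficient + t.B * p.Bcoefficient as predicted
from data t cross join
     (select max(case when col = 'A' then coefficient end) as Acoefficient,
             max(case when col = 'B' then coefficient end) as Bcoefficient
      from parameters
     ) p
order by t.id;
-- id 1: 10*0.5 + 3*2.0 = 11
-- id 2: 4*0.5  + 0*2.0 = 2
```

If a coefficient row is missing from parameters, the corresponding max(...) is null and so is the whole sum, which makes the gap visible rather than silently treating it as zero.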

Licensed under: CC-BY-SA with attribution