Bootstrapping Stepwise Regression in Stata

Question

By using _b as a short-cut, the first iteration defined which coefficients simulate would store in all subsequent iterations. That is fine for most simulation programs, since they estimate a fixed set of coefficients, but it is not what you want in combination with sw, where the selected variables can differ from one replication to the next. So I adapted the program to explicitly list the coefficients to be stored; a coefficient is set to missing when its variable is not selected.
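As a rough sketch of what such an explicit list looks like, the loop below builds one named expression per candidate regressor, in the form simulate expects. The variable names are placeholders chosen for illustration; they are not from the original question.

```stata
// hypothetical regressor names, for illustration only
local indepvar weight length foreign

// start from an empty list, then add one expression per regressor
local res
foreach var of local indepvar {
    local res "`res' `var'=r(`var')"
}

// show the expression list that would be handed to simulate
display "`res'"
```

Because every candidate regressor appears in the list, simulate collects the same set of results in each replication, whichever variables sw happens to select.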

I also changed your programs so that they run faster, by avoiding mkmat and svmat and replacing those computations with predict and generate. I made them follow the convention in the Stata community that a command replaces the dataset in memory only after the user explicitly asks for it by specifying the clear option. Finally, I made sure that the names of variables and scalars created in the program do not conflict with names already present in memory, by using tempvar and tempname. These temporary objects are also deleted automatically when the program ends.

clear all
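program define sw_pbs, rclass
    // clear is required: simulate replaces the data in memory
    syntax varlist, clear [reps(integer 100)]

    // build one named expression per candidate regressor, so simulate
    // stores the same set of results in every replication
    gettoken depvar indepvar : varlist
    foreach var of local indepvar {
        local res "`res' `var'=r(`var')"
    }

    simulate `res', reps(`reps') : sw_pbs_simulator `varlist'
end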

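program define sw_pbs_simulator, rclass
    syntax varlist
    tempname rmse b
    tempvar yhat y
    gettoken depvar indepvar : varlist

    // fit the full model and keep its fitted values and residual SD
    reg `depvar' `indepvar'
    scalar `rmse' = e(rmse)
    predict double `yhat' if e(sample)

    // draw a new outcome from the fitted model (parametric bootstrap)
    gen double `y' = `yhat' + rnormal(0, `rmse')

    // run stepwise selection on the simulated outcome
    sw reg `y' `indepvar', pr(0.10) pe(0.05)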

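    // return one scalar per candidate regressor: its coefficient if
    // stepwise kept it, missing otherwise
    matrix `b' = e(b)
    local in : colnames `b'
    // the constant is not among the results collected by simulate, so skip it
    local cons _cons
    local in : list in - cons
    local out : list indepvar - in
    foreach var of local in {
        return scalar `var' = _b[`var']
    }
    foreach var of local out {
        return scalar `var' = .
    }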
end
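
Since a coefficient is missing whenever sw did not select the variable, the share of non-missing values per variable gives a simple selection frequency. The sketch below uses a small made-up dataset as a stand-in for what sw_pbs leaves in memory; the variable names and numbers are purely hypothetical. On real data you would first run something like sw_pbs y x1 x2, clear reps(200).

```stata
// hypothetical stand-in for the dataset left in memory by sw_pbs:
// one row per replication, missing = variable not selected
clear
input weight length
2.1   .
1.9 -80
.   -75
2.3   .
end

// share of replications in which each regressor was selected
foreach var of varlist weight length {
    quietly count if !missing(`var')
    local share_`var' = 100 * r(N) / _N
    display "`var': selected in `share_`var''% of replications"
}
```

Keep in mind that summarize on these variables describes the coefficients only over the replications in which the variable was selected, because missing values are ignored.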