Matching variable names with their corresponding values over different databases

Question 1

Maybe this example helps:

clear all
set more off

/*
load two example MS Excel files with var names only and accumulate var names in a local.
files are named varfile.xls and varfile2.xls
*/

foreach i in "" "2" {

    import excel "/home/roberto/Desktop/stata_tests/varfile`i'.xls", firstrow clear

    * get var names
    quietly ds

    * save var names in local
    local myvars `myvars' `r(varlist)'
}

* load database that contains vars and values
sysuse auto, clear

* do pca
pca `myvars'

/*
varfile.xls contains variables "weight" and "price"
varfile2.xls contains variables "mpg" and "length"
*/

ds does the trick here: it picks up the names of the variables read from each MS Excel sheet and leaves them in r(varlist). See help ds and help saved results (or help stored results). Afterwards, we load a "complete" database and use the accumulated variable names with pca.
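If some of the names collected from the spreadsheets might not exist in the complete database, it can help to check them before calling pca. A minimal sketch, assuming the auto dataset stands in for the complete database; the contents of myvars (including the made-up name notthere) are hypothetical:

```stata
sysuse auto, clear

* hypothetical list, standing in for the names accumulated from the Excel files
local myvars weight price mpg length notthere

local found
local notfound
foreach v of local myvars {
    * exact rules out matches on abbreviated variable names
    capture confirm variable `v', exact
    if _rc == 0 {
        local found `found' `v'
    }
    else {
        local notfound `notfound' `v'
    }
}

display "found:     `found'"
display "not found: `notfound'"
```

Only the names in found would then be passed on to pca.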

The MS Excel files hold only variable names, in their first row (screenshot not reproduced here).

This, I think, answers the specific question you pose.

Edit

Looking closer at your code, I'm not sure the problem is related to matching variable names in the complete database, but rather some problem with the way you set up preserve and restore. Instead of using that set of commands, try simply loading the complete database when you need it (with use).

What do you have before the preserve? Where does your error appear? Please post more code. A reproducible example would help.

Edit 2

My conjecture now is that you have nothing in memory before the preserve, so when you restore, you're just wiping the slate clean; you are restoring a blank database. Therefore, trying pca <somevar> gives you:

no variables defined
r(111);

preserve preserves the data as it is just before the command is issued.
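A small sketch of the intended order, assuming the auto dataset plays the role of the complete database: the data are loaded first, preserve takes its snapshot, and restore brings that same snapshot back.

```stata
sysuse auto, clear       // the data must be in memory BEFORE preserve
preserve
    keep price mpg       // temporary changes
    display c(k)         // number of variables now in memory
restore                  // the data as they were at preserve come back
display c(k)

pca price mpg weight length
```

If nothing is gained from the temporary changes, dropping preserve and restore and simply reloading the data (with use or sysuse) before pca does the same job.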

Question 2

Personal comment: There is too much code here for me to want to try and absorb what you are trying to do. I comment only on some details of technique.

This block of code

gen January = 1 if Month == 1
gen February = 1 if Month == 2
gen March = 1 if Month == 3
gen April = 1 if Month == 4
gen May = 1 if Month == 5
gen June = 1 if Month == 6
gen July = 1 if Month == 7
gen August = 1 if Month == 8
gen September = 1 if Month == 9
gen October = 1 if Month == 10
gen November = 1 if Month == 11
gen December = 1 if Month == 12
replace January = 0 if January == .
replace February = 0 if February == .
replace March = 0 if March == .
replace April = 0 if April == .
replace May = 0 if May == .
replace June = 0 if June == .
replace July = 0 if July == .
replace August = 0 if August == .
replace September = 0 if September == .
replace October = 0 if October == .
replace November = 0 if November == .
replace December = 0 if December == .

can be rewritten like this

tokenize "`c(Months)'"
forval j = 1/12 { 
    gen ``j'' = Month == `j' 
}

The month names January to December are wired into c(Months).
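To see how the pieces fit together, here is a self-contained sketch on made-up data. The Month variable built here is a hypothetical stand-in for yours; tokenize puts the twelve names into the macros `1' to `12', and ``j'' looks up the j-th of them.

```stata
clear
set obs 24
gen Month = mod(_n - 1, 12) + 1    // hypothetical month variable, values 1 to 12

display "`c(Months)'"              // the twelve month names, space-separated
tokenize "`c(Months)'"
display "`3'"                      // March

forval j = 1/12 {
    gen byte ``j'' = Month == `j'
}

* each observation should fall in exactly one month
egen nmonths = rowtotal(January-December)
assert nmonths == 1
```

Because Month == `j' evaluates to 1 or 0, no replace step is needed afterwards.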

sum `var', meanonly
local mu =r(mean)
reg `var' January  February March April May June July August September October November December, nocons
predict double `var'SA, residual
replace `var'SA=`var'SA+`mu'
egen sd = sd(`var'SA)
replace `var'SA=`var'SA/sd
drop sd

can be shortened to

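reg `var' January-December, nocons
predict double `var'SA, residual
sum `var', meanonly
local mu = r(mean)
* SD of the residuals, as in your code (adding the mean does not change it)
sum `var'SA
replace `var'SA = (`var'SA + `mu') / r(sd)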

Note that it is not a good idea to create an entire variable holding just the SD. That cancels out any time savings from using summarize, meanonly.
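On that last point: summarize, meanonly skips the variance calculation, so r(sd) is only there after a plain summarize. A short sketch, with price from the auto data standing in for your variable:

```stata
sysuse auto, clear

summarize price, meanonly
display r(mean)
display r(sd)            // missing: meanonly does not compute the SD

quietly summarize price
display r(sd)            // available after a plain summarize
```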

I don't comment here on what you are trying to do statistically, adding the mean and then dividing by the SD.

Question 3

@Roberto Ferrer is addressing your main problem, which hinges on comparing variable names across files. I add a detail on the use of local macros and wildcard syntax.

local x ""
foreach var of varlist *SA {
    local x `x' `var'
}

is a long way to get

unab x : *SA
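A quick check that the two give the same list, sketched on the auto data with the wildcard t* standing in for *SA (the macro name y is just for the comparison):

```stata
sysuse auto, clear

* the loop
local x
foreach var of varlist t* {
    local x `x' `var'
}

* the one-liner
unab y : t*

display "`x'"
display "`y'"
```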