Question

I'm doing binary classification of financial data with SVM and MLP. My input data has 21 features, so I used dimensionality reduction methods to reduce the dimension of the data. Some dimensionality reduction methods, such as stepwise regression, report the best features, so I use those features in my classification model. Other methods, such as PCA, transform the data into a new space, and I then use, for instance, the best 60% of the reported columns (features). The critical problem is in the phase of using the final model. For example, I used financial data from the past year and from two years ago to model today's financial position. Now I want to use past and current data to predict next year. My question is: should I apply PCA to the new input data before feeding it to my trained classification model? How should I use PCA (for example) on this data? Must I use it as before (pca(newdata…)), or are there results from the earlier PCA that I must reuse in this phase?
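To see why calling `pca` again on the new data is a problem, compare the components fitted on a large training set with those fitted on a small batch of new rows. A minimal sketch with random stand-in data; `Xtrain` and `Xnew` are hypothetical placeholders, not the asker's data:

```matlab
rng(4);
Xtrain = randn(500, 21);          % hypothetical training data, 21 features
Xnew   = randn(8, 21);            % hypothetical batch of new observations

coeffTrain = pca(Xtrain);         % principal directions learned from training data
coeffNew   = pca(Xnew);           % a fresh PCA fitted on the new batch only

% With 8 centered rows, at most 7 components exist, so the refitted "space"
% does not even have as many columns as the one the classifier was trained on
fprintf('Training components: %d, refitted components: %d\n', ...
        size(coeffTrain, 2), size(coeffNew, 2));
```

Even with a larger new batch, the refitted directions (and their signs and order) would generally differ from `coeffTrain`, so the classifier would receive inputs expressed in a different coordinate system.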

More information:

This is my system structure: I have a hybrid classification method in which an optimization algorithm selects the best features (inputs) of my model and the best parameters of my classification method. For a classifier like MLP, the optimization takes a long time with 21 features (in addition, I repeat every iteration of my optimization algorithm 12 times (cross section)). So I want to reduce the number of features with dimensionality reduction techniques (like PCA, NLPCA, or supervised methods like LDA/FDA) before feeding them to the classification method. For example, I'm using this form of the PCA call:

[coeff,score,latent,tsquared,explained,mu] = pca(_)
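For reference, with the default options (centering on, no variable weights) the `coeff` and `mu` outputs are enough to reproduce `score`, which is why they are the results worth keeping for later. A minimal check, assuming random stand-in data `X` (hypothetical) with 21 features:

```matlab
rng(0);
X = randn(200, 21);   % hypothetical stand-in: 200 observations, 21 features

% Default pca: centers the data, no variable weights
[coeff, score, latent, tsquared, explained, mu] = pca(X);

% score is the centered data projected onto the principal directions
scoreAgain = bsxfun(@minus, X, mu) * coeff;

% explained is in descending order, so the first columns carry the most variance
fprintf('Max reconstruction error: %g\n', max(abs(score(:) - scoreAgain(:))));
fprintf('Variance explained by first 10 PCs: %.1f%%\n', sum(explained(1:10)));
```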

After that, I use the first 10 columns of the `score` output (pca sorts them by explained variance) as inputs to my classification and optimization model. In the final phase, I find the best model parameters together with the best combination of inputs. For example, my raw data has 21 features. After the first phase (PCA) I choose 10 features, and after optimizing my classification model, the final model uses the 5 best of those features. Now I want to use this model with new data. What must I do?
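Because the PCA step keeps columns 1 to 10 and the optimizer then keeps 5 of those, the final model depends on a fixed set of component indices that must be reused for new data. A sketch of that bookkeeping, where `chosenWithin10` is a hypothetical optimizer result (not taken from the question) and `Xtrain`/`Xnew` are random stand-ins:

```matlab
rng(1);
Xtrain = randn(150, 21);   % hypothetical training data, 21 features
Xnew   = randn(5, 21);     % hypothetical new observations, same 21 features

% PCA is fitted once, on the training data only
[coeff, ~, ~, ~, ~, mu] = pca(Xtrain);

first10 = 1:10;                      % columns kept after the PCA step
chosenWithin10 = [1 2 4 7 9];        % hypothetical picks made by the optimizer
finalCols = first10(chosenWithin10); % indices into the full score matrix

% Project the new data with the TRAINING coeff and mu, never a fresh pca(Xnew)
newScore  = bsxfun(@minus, Xnew, mu) * coeff;
newInputs = newScore(:, finalCols);  % 5 columns, in the same order as in training
```

Only the selected columns of `coeff` are needed, so `bsxfun(@minus, Xnew, mu) * coeff(:, finalCols)` gives the same five columns with less work.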

Thank you so much for your help.


Solution

You should follow these steps:

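  1. With your training data, create a PCA model (coefficients, mean and, if used, scaling)
  2. Train your classifier on the principal-component scores of your training data
  3. Apply that same PCA model to your new data (do not fit a new PCA on the new data)
  4. Feed the resulting scores, restricted to the same components you selected during training, to the classifier to test it or make predictions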
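The four steps could look like this end to end, assuming the Statistics and Machine Learning Toolbox, an SVM trained with `fitcsvm` as the classifier, and synthetic data; `Xtrain`, `ytrain`, `Xnew` and the choice of `k = 10` components are illustrative assumptions:

```matlab
rng(2);
% Hypothetical data: 21 features, binary labels
Xtrain = randn(300, 21);
ytrain = double(Xtrain(:, 1) + 0.5 * Xtrain(:, 2) > 0);
Xnew   = randn(20, 21);
k = 10;                                   % number of components kept

% Step 1: fit PCA on the training data only
[coeff, scoreTrain, ~, ~, ~, mu] = pca(Xtrain);

% Step 2: train the classifier on the training scores
svmModel = fitcsvm(scoreTrain(:, 1:k), ytrain);

% Step 3: project the new data with the training coeff and mu
scoreNew = bsxfun(@minus, Xnew, mu) * coeff(:, 1:k);

% Step 4: predict (or, if labels are known, evaluate) on the projected data
yNew = predict(svmModel, scoreNew);
```

If the features are standardized with variance weights, as in the answer's step 1 and step 3 snippets, the same flow applies with `W`, `mu` and `we` in place of `coeff` and `mu`.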

Here are some code snippets for steps 1 and 3 (2 and 4 depend on your classifier):

% Step 1. Generate a PCA data model

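[W, Y] = pca(data, 'VariableWeights', 'variance', 'Centered', true);
%# W is not orthonormal when variable weights are used; rescale it so that Y = zscore(data)*W
W = diag(std(data))\W;
%# Keep the training mean (mu) and standard deviation (we) for future data
[~, mu, we] = zscore(data);
%# Avoid division by zero for constant features
we(we==0) = 1;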


% Step 3. Apply the previous data model to new data (one observation per row)

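%# New coordinates as principal components, using the training mu, we and W
x = newDataVector;              % one or more rows with the same 21 features as the training data
x = bsxfun(@minus, x, mu);      % center with the training mean
x = bsxfun(@rdivide, x, we);    % scale with the training standard deviation
newDataVector_PCA = x*W;        % project onto the training principal directions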
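A quick way to confirm that the saved `W`, `mu` and `we` are applied correctly is to push the training data itself through the step 3 transform and compare with `Y` from step 1. A self-contained sketch with random stand-in data (`data` here is hypothetical):

```matlab
rng(3);
data = randn(100, 21) * diag(1:21);   % hypothetical features on different scales

% Step 1 as above: variance-weighted PCA, then orthonormal coefficients
[W, Y] = pca(data, 'VariableWeights', 'variance', 'Centered', true);
W = diag(std(data)) \ W;
[~, mu, we] = zscore(data);
we(we == 0) = 1;

% Step 3 applied to the training rows themselves
x = bsxfun(@rdivide, bsxfun(@minus, data, mu), we);
Ycheck = x * W;

fprintf('Max difference from step-1 scores: %g\n', max(abs(Y(:) - Ycheck(:))));
```

A difference near machine precision indicates the stored parameters reproduce the training projection, so new rows transformed the same way land in the same component space the classifier was trained on.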
Licensed under: CC-BY-SA with attribution