Question

I have to remove outliers from my modeling data. I have tried many methods, but one outlier keeps troubling me even after applying them.

Can anyone please help me with this?

I have used winsorizing and the outliers and extremevalues packages, among others, but still could not remove the outliers.

The data has 50,000 customers and 32 attributes.

The data contains both numeric and non-numeric variables.

I am not able to attach the data set here.

Please help me.

Extra information:

I am very worried, since this is for my dissertation and I have no idea how to deal with outliers.

If you know anything that works, please post it.

The data is available on the net, but I cannot post it here, sorry.

My supervisor needs a plot with no outliers, and also the complete data records for the observations flagged as outliers. I don't know how to do this for all combinations of variables: pick out the outliers and then plot the data without them.

I have no idea how to do it. I can't post pictures or screenshots of the data since my reputation is below 10.


Solution

Without more information about your data and your results so far, you will only get very general answers. For instance, there is a chapter on outlier detection in Y. Zhao's R and Data Mining that may be useful.

If your dataset is this one, most of the variables are qualitative: it may be sufficient to look at each variable separately, and consider rare classes as outliers. A few more algorithms are listed in this article.
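As a rough illustration of the "rare classes as outliers" idea, here is a minimal R sketch. The toy data frame, the helper name rare_levels, and the 1% threshold are all assumptions made for the example, not something taken from your data; pick a threshold that makes sense for your variables.

```r
# Hypothetical toy data standing in for the real customer table.
set.seed(1)
df <- data.frame(
  region = factor(sample(c("north", "south", "east"), 1000, replace = TRUE)),
  plan   = factor(c(rep("basic", 995), rep("legacy", 5)))
)

# Return the levels of x whose relative frequency is below min_share.
# (rare_levels and min_share are illustrative names, not from any package.)
rare_levels <- function(x, min_share = 0.01) {
  shares <- prop.table(table(x))
  names(shares)[shares < min_share]
}

# Look at each qualitative (factor or character) column separately.
is_qual <- vapply(df, function(col) is.factor(col) || is.character(col), logical(1))
rare <- lapply(df[is_qual], rare_levels)

# Mark rows that contain at least one rare class in any qualitative column.
flagged <- Reduce(`|`, Map(function(col, lv) col %in% lv, df[is_qual], rare))

rare           # rare levels found per column
df[flagged, ]  # the full records of the flagged rows
```

Printing df[flagged, ] gives the complete rows for the flagged observations, and df[!flagged, ] is the data you would plot without them.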

It could also be that there are no outliers to worry about.

Other tips

Your data is multivariate, so you can use cov.mcd and cov.mve to obtain the minimum covariance determinant and minimum volume ellipsoid estimators. Then calculate Mahalanobis distances using one of these covariance estimates. Squared Mahalanobis distances above a critical value can be considered large, and the corresponding observations can be labeled as outliers. For the critical value, use a quantile of the chi-square distribution with p degrees of freedom, where p is the number of variables.

Edit: cov.mcd and cov.mve are defined in the MASS package.
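The steps above could look roughly like this in R. This is only a sketch on simulated numeric data: the matrix X, the 10 shifted rows, and the 0.975 quantile are assumptions chosen for illustration, and the approach applies to the numeric columns only.

```r
library(MASS)  # provides cov.mcd() and cov.mve()

set.seed(42)
# Hypothetical numeric data: 500 regular rows plus 10 shifted rows.
p <- 3
clean   <- matrix(rnorm(500 * p), ncol = p)
shifted <- matrix(rnorm(10 * p, mean = 8), ncol = p)
X <- rbind(clean, shifted)

# Robust location and scatter from the minimum covariance determinant.
# cov.mve(X) could be used here in the same way.
fit <- cov.mcd(X)

# mahalanobis() returns squared distances.
d2 <- mahalanobis(X, center = fit$center, cov = fit$cov)

# Critical value: upper quantile of the chi-square distribution with p
# degrees of freedom (0.975 is a common choice, assumed here).
cutoff <- qchisq(0.975, df = p)
outlier_idx <- which(d2 > cutoff)

X_outliers <- X[outlier_idx, , drop = FALSE]   # full rows flagged as outliers
X_clean    <- X[-outlier_idx, , drop = FALSE]  # data to plot without them
```

With a 0.975 quantile, a small share of perfectly ordinary rows will also exceed the cutoff, so it is worth inspecting X_outliers before dropping anything.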

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow