Question

I have the following series of commands:

my_data = read.csv(file='r-stats.out', sep='\t', skip=1)
histsub = subset(my_data, my_data[,10] != "Invalid")
hist(as.numeric(histsub[,10]))

r-stats.out is a file with 10 columns. Column 10 (the one I am trying to plot) contains either numbers ranging from -2000 to 10000 or the word "Invalid", which I try to filter out first. For some reason, my histogram only covers the range 0 to 2500, IGNORING everything else. Why? What is happening? I ran

print(histsub)

and everything looks okay: those numbers are in histsub, but not on the plot. Please help.
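The symptom can be reproduced without the original file. This is a minimal sketch on made-up data (the values, the seed, and the count of 2500 distinct numbers are assumptions chosen to mirror the question): when column 10 is a factor, filtering out "Invalid" keeps the factor type, and as.numeric() then returns level codes between 1 and the number of levels rather than the numbers themselves.

```r
# Hypothetical stand-in for column 10: 2500 distinct integers in
# -2000..10000 plus some "Invalid" entries, stored as a factor
# (what read.csv produced by default before R 4.0.0).
set.seed(1)
nums <- sample(-2000:10000, 2500)
col10 <- factor(c(as.character(nums), rep("Invalid", 40)))

# Filtering keeps the factor type and all 2501 levels.
histsub <- col10[col10 != "Invalid"]

# as.numeric() on a factor gives the internal level codes, not the values.
codes <- as.numeric(histsub)

print(range(nums))       # the real data span roughly -2000..10000
print(range(codes))      # the codes only span 1..2501
print(nlevels(histsub))
```

A histogram of codes would therefore only run from about 0 to 2500, matching what the question describes.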

EDIT: Here are a few lines of the printed my_data and histsub.

my_data:

38    629345  1  633201  0   -41 Invalid    0   g    0     -37
39    633201  0  628727  0  4496     323    0   g    0    4629
40    628727  0  631371  1  7835     202    0   g    0 Invalid
41    631371  1  625871  1  7317     112    0   g    0    7379
42    625871  1  633427  1  1351     348    0   g    0    1321

histsub:

38    629345  1  633201  0  -41 Invalid    0   g    0   -37
39    633201  0  628727  0 4496     323    0   g    0  4629
41    631371  1  625871  1 7317     112    0   g    0  7379
42    625871  1  633427  1 1351     348    0   g    0  1321

Solution

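Try my_data[,10]=as.numeric(as.character(my_data[,10])). The as.character() step matters because the column is a factor: converting the labels to text first makes as.numeric() parse the numbers instead of returning level codes. All the "Invalid" entries then become NA (R warns "NAs introduced by coercion"), and hist() ignores NA values anyway, so no subset is needed.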
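A small sketch of the difference, assuming a five-element factor standing in for column 10 (the values are taken from the printed rows above; everything else is illustrative). The exact level codes depend on the locale's sort order, so only the converted values are meaningful here.

```r
# Hypothetical stand-in for column 10, read as a factor.
col10 <- factor(c("-37", "4629", "Invalid", "7379", "1321"))

# Wrong: as.numeric() on a factor returns the level codes (1..5 here).
codes <- as.numeric(col10)

# Right: go through character first; "Invalid" becomes NA.
# suppressWarnings() hides the expected "NAs introduced by coercion" warning.
values <- suppressWarnings(as.numeric(as.character(col10)))

# hist() drops non-finite values, so the NA needs no explicit filtering.
# plot = FALSE just returns the breaks and counts without drawing.
h <- hist(values, plot = FALSE)

print(codes)
print(values)
print(h$breaks)
```

Calling hist(values) without plot = FALSE draws the same histogram, now spanning the real data range.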

Other tips

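That implies the column is not numeric. Because it contains the string "Invalid", read.csv reads it as text and, with stringsAsFactors = TRUE (the default before R 4.0.0), converts it to a factor. as.numeric() on a factor returns the internal level codes, and there are probably ~2500 unique values, which gives the 0 to 2500 range. Try using the argument stringsAsFactors = FALSE in read.csv.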
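A minimal sketch of the stringsAsFactors = FALSE route, assuming a tiny hypothetical tab-separated sample in place of r-stats.out (a preamble line to skip, a header with made-up column names, and a last column mixing numbers and "Invalid"). It reuses the question's subset-then-convert steps, using the last column instead of column 10.

```r
# Hypothetical sample mimicking r-stats.out; column names are made up.
raw <- paste(
  "some preamble line",
  "a\tb\tc\tvalue",
  "38\t629345\t-41\t-37",
  "39\t633201\t4496\t4629",
  "40\t628727\t7835\tInvalid",
  "41\t631371\t7317\t7379",
  sep = "\n"
)

my_data <- read.csv(text = raw, sep = "\t", skip = 1,
                    stringsAsFactors = FALSE)

# The last column plays the role of column 10 in the question.
col <- ncol(my_data)

# With stringsAsFactors = FALSE the column is character, not factor,
# so as.numeric() parses the text itself.
histsub <- subset(my_data, my_data[, col] != "Invalid")
vals <- as.numeric(histsub[, col])

print(class(my_data[, col]))
print(vals)
```

Since R 4.0.0, FALSE is already the default, so the original code would behave as expected there without the extra argument.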

License: CC-BY-SA with attribution
Not affiliated with StackOverflow