I must be misunderstanding how read.csv works in R. I have read the help file, but still do not understand how a csv file containing:

40900,-,-,-,241.75,0
40905,244,245.79,241.25,244,22114
40906,244,246.79,243.6,245.5,18024
40907,246,248.5,246,247,60859

read into R using: euk<-data.matrix(read.csv("path\to\csv.csv"))

produces this as a result (using tail):

         Date Open High Low  Close Volume
[2713,] 15329  490  404 369 240.75  62763
[2714,] 15330  495  409 378 242.50 127534
[2715,] 15331    1    1   1 241.75      0
[2716,] 15336  504  425 385 244.00  22114
[2717,] 15337  504  432 396 245.50  18024
[2718,] 15338  512  442 405 247.00  60859

It must be something obvious that I do not understand. Please be kind in your responses, I am trying to learn.

Thanks!


Solution

The issue is not with read.csv but with data.matrix. read.csv imports any column that contains non-numeric characters as a factor (or, in R 4.0.0 and later, where stringsAsFactors defaults to FALSE, as a character vector, which data.matrix then converts to a factor itself). The '-' entries in the first row of your dataset are characters, so those columns become factors. When you pass the result of read.csv to data.matrix, it replaces each factor level with its internal integer code, as the help page states.
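To see the effect in isolation, here is a minimal sketch on a toy column. The vector `open_raw` is made up for illustration and only mimics the Open column from the question:

```r
# Toy column mixing "-" with numbers, like the Open column in the question.
open_raw <- c("-", "244", "244", "246")

# A factor stores each distinct string once (its levels) plus integer codes.
open_fac <- factor(open_raw)
levels(open_fac)        # the distinct strings
as.integer(open_fac)    # the internal codes, not the prices

# Converting via character recovers the real values; "-" becomes NA.
open_num <- suppressWarnings(as.numeric(as.character(open_fac)))
open_num
```

The integer codes are what data.matrix returns for such a column, which is why the Open, High and Low values in the question look like small unrelated integers.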

Basically, you need to ensure that the columns of your data are numeric before you pass the data.frame into data.matrix.

This should work in your case (assuming the only characters are '-'):

euk <- data.matrix(read.csv("path/to/csv.csv", na.strings = "-", colClasses = 'numeric'))
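As a self-contained check, the same call can be tried on the four sample rows from the question. This is only a sketch: the header line is assumed from the column names shown in the question's output, and `text =` is used in place of a real file path:

```r
# Sample rows from the question, with an assumed header line.
csv_text <- "Date,Open,High,Low,Close,Volume
40900,-,-,-,241.75,0
40905,244,245.79,241.25,244,22114
40906,244,246.79,243.6,245.5,18024
40907,246,248.5,246,247,60859"

# "-" is read as NA, and every column is forced to numeric.
df <- read.csv(text = csv_text, na.strings = "-", colClasses = "numeric")
euk <- data.matrix(df)
euk
```

If a column held some other non-numeric string, colClasses = "numeric" would make read.csv stop with an error instead of silently producing factor codes.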

Other tips

I'm no R expert, but you may consider using scan() instead, e.g.:

> data = scan("foo.csv", what = list(x = numeric(), y = numeric()), sep = ",")

Where foo.csv has two columns, x and y, and is comma delimited. I hope that helps.
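For the six-column data in the question, a scan() call might look like the sketch below. The column names are assumed from the question's output, only two sample rows are used, and `text =` replaces the file name; with a real file that has a header line you would also need skip = 1:

```r
# Two sample rows from the question (no header line).
csv_text <- "40900,-,-,-,241.75,0
40905,244,245.79,241.25,244,22114"

# One numeric() entry per column; "-" is treated as NA.
cols <- scan(text = csv_text, sep = ",", na.strings = "-", quiet = TRUE,
             what = list(Date = numeric(), Open = numeric(),
                         High = numeric(), Low = numeric(),
                         Close = numeric(), Volume = numeric()))

# scan() returns a list of vectors; bind them into a numeric matrix.
euk <- do.call(cbind, cols)
euk
```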

I took a cut/paste of your data, put it in a file and I get this using 'R'

> c<-data.matrix(read.csv("c:/DOCUME~1/Philip/LOCALS~1/Temp/x.csv",header=F))
> c
        V1 V2 V3 V4     V5    V6
[1,] 40900  1  1  1 241.75     0
[2,] 40905  2  2  2 244.00 22114
[3,] 40906  2  3  3 245.50 18024
[4,] 40907  3  4  4 247.00 60859
> 

There must be more in your data file than you posted: for one thing, a header line. And the output you show seems to start with row 2713. I would check:

The format of the header line, or get rid of it and add it manually later.
That each row has exactly 6 values.
That the filename uses forward slashes and has no embedded spaces (use the 8.3 representation as shown in my filename).
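One way to check the field count per row is count.fields(). The sketch below writes the sample rows to a temporary file purely for illustration; with real data you would pass your own path:

```r
# Write the sample rows to a temporary file (stand-in for the real CSV).
tmp <- tempfile(fileext = ".csv")
writeLines(c("40900,-,-,-,241.75,0",
             "40905,244,245.79,241.25,244,22114",
             "40906,244,246.79,243.6,245.5,18024",
             "40907,246,248.5,246,247,60859"), tmp)

# Number of comma-separated fields on each line.
n_fields <- count.fields(tmp, sep = ",")
n_fields

# Line numbers of any rows that do not have exactly 6 values.
bad_rows <- which(n_fields != 6)
bad_rows
```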

Also, if you generated your csv file from MS Excel, the internal representation for a date is a number.
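If the first column does hold Excel date serials, a sketch of converting them to R dates follows. It assumes Excel's default Windows 1900 date system, for which the usual origin in R is "1899-12-30":

```r
# First-column values from the sample rows, assumed to be Excel date serials.
excel_serial <- c(40900, 40905, 40906, 40907)

# Excel (1900 date system) counts days from 1899-12-30.
dates <- as.Date(excel_serial, origin = "1899-12-30")
dates

# R stores a Date as days since 1970-01-01, hence a different number.
as.numeric(dates)
```

Under that assumption, 40900 maps to R's day number 15331, which matches the Date value shown next to the 241.75 close in the question's output.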

Licensed under: CC-BY-SA with attribution