First of all, there seems to be a problem with the file encoding. The downloaded file is apparently Latin-1 encoded and this is not recognized correctly, which is why it shows L�cke
and not Lücke.
Declaring the encoding fixes this:
encoding = "latin1"
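To illustrate what goes wrong, here is a minimal sketch that builds the word "Lücke" from its Latin-1 bytes and converts it to UTF-8. The variable names are made up for illustration, and the byte values are just the Latin-1 codes of that word, not data taken from your file.

```r
# Bytes of "Lücke" as a Latin-1 file would store them (0xfc is the u-umlaut).
latin1_bytes <- as.raw(c(0x4c, 0xfc, 0x63, 0x6b, 0x65))
raw_string <- rawToChar(latin1_bytes)

# Convert from Latin-1 to UTF-8; without this step the 0xfc byte
# is not valid UTF-8 and is displayed as a replacement character.
fixed <- iconv(raw_string, from = "latin1", to = "UTF-8")

# Inspect the Unicode code points of the converted string.
code_points <- utf8ToInt(fixed)
print(code_points)
```

The code point 252 in the output is U+00FC, the u-umlaut, which shows that the byte was interpreted correctly.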
Secondly, your example does not seem to be reproducible: the variable strs
is not declared in it. From what I understood (maybe I am wrong), you want to skip the first 28 lines and leave the last one out, so in total:
nrows = length( readLines( file ) ) - 29
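The same arithmetic can be tried on a toy input. This is only a sketch: the lines below are invented stand-ins (two header lines and one footer line instead of 28 and 1), and n_header and n_footer are hypothetical helper names.

```r
# Invented sample: 2 header lines, 2 data lines, 1 footer line.
lines <- c(
  "Station;Example",
  "Unit;m3/s",
  "01.01.1976 00:00:00;0,691",
  "02.01.1976 00:00:00;0,799",
  "footer"
)
n_header <- 2
n_footer <- 1

# Skip the header and stop reading before the footer.
toy <- read.csv2(
  text = lines,
  header = FALSE,
  skip = n_header,
  nrows = length(lines) - n_header - n_footer,
  col.names = c("Datum", "Abfluss"),
  stringsAsFactors = FALSE
)
str(toy)
```

Because the footer is never read, every remaining value in the second column parses as a number, so the column comes out numeric.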
Finally, you bumped into this common R issue: "How to convert a factor to an integer/numeric without loss of information?". The entire column is interpreted as a character
vector because not all elements could be parsed as numeric.
And when a character vector is added to a data.frame, it is by default converted to a factor
column (in R versions before 4.0.0; since then the default is stringsAsFactors = FALSE). Although it is not necessary if you specify the correct range of lines, you can avoid this with
stringsAsFactors = FALSE
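For reference, the factor pitfall can be reproduced in a few lines. This is a minimal sketch with made-up values: calling as.numeric() directly on a factor returns the internal level codes, so the usual workaround is to go through as.character() first.

```r
# A numeric column that was accidentally read as a factor.
f <- factor(c("0.691", "0.799", "0.814"))

# Wrong: returns the internal level codes, not the values.
codes <- as.numeric(f)

# Right: convert to character first, then to numeric.
values <- as.numeric(as.character(f))

print(codes)
print(values)
```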
So in total:
f <- readLines("Q-Tagesmittel-204586.csv")
df <- read.csv2(
  text = f,
  header = FALSE,
  sep = ";",
  quote = "\"",
  dec = ",",
  skip = 28,
  col.names = c("Datum", "Abfluss"),
  nrows = length(f) - 29,
  encoding = "latin1",
  stringsAsFactors = FALSE
)
Oh, and in case you want to convert the Datum
column to a date-time object as a next step, one way to achieve this would be
df$Datum <- strptime( df$Datum, "%d.%m.%Y %H:%M:%S" )
str(df)
'data.frame': 12784 obs. of 2 variables:
$ Datum : POSIXlt, format: "1976-01-01" "1976-01-02" "1976-01-03" "1976-01-04" ...
$ Abfluss: num 0.691 0.799 0.814 0.813 0.795 0.823 0.828 0.831 0.815 0.829 ...
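If you would rather not keep a POSIXlt column (it is stored as a list of components), a possible variation is to convert it to POSIXct or to a plain Date. The sketch below uses two made-up timestamps in the same format; the explicit tz = "UTC" is an assumption added here so that the result does not depend on the local time zone.

```r
# Made-up timestamps in the same format as the Datum column.
x <- c("01.01.1976 00:00:00", "02.01.1976 00:00:00")

# Parse into POSIXlt, as in the answer above.
lt <- strptime(x, "%d.%m.%Y %H:%M:%S", tz = "UTC")

# Compact alternative: seconds since the epoch.
ct <- as.POSIXct(lt)

# Date only, if the time of day is always midnight.
d <- as.Date(ct, tz = "UTC")
print(d)
```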