Using cbind to create a columnar .csv file but 'X' always appears in row one

https://stackoverflow.com/questions/23109526

04-07-2023

Question

I have a script that works perfectly, except that after my R cbind operation an 'X' appears in front of the numerical values in the first row.

Here is my script:

library(ncdf)
library(Kendall)
library(forecast)
library(zoo)
setwd("/home/cohara/RainfallData")

files=list.files(pattern="*.nc")

j=81
for (i in seq(1,9))
{
        file<-open.ncdf(sprintf("/home/cohara/RainfallData/%s.nc",i))
        year<-get.var.ncdf(file,"time")
        data<-get.var.ncdf(file,"var61")
        fit<-lm(data~year)              #least squares regression
        mean=rollmean(data,4,fill=NA)
        kendall<-Kendall(data,year)
        write.table(kendall[[2]],file="/home/cohara/RainfallAnalysis/Kendall_p-value_for_10%_increase_over_81_-_89_years.csv",append=TRUE,quote=FALSE,row.names=FALSE,col.names=FALSE)
        write.table(kendall[[1]],file="/home/cohara/RainfallAnalysis/Kendall_tau_for_10%_increase_over_81_-_89_years.csv",append=TRUE,quote=FALSE,row.names=FALSE,col.names=FALSE)
        png(sprintf("./10 percent increase over %s years.png",j))
        par(family="serif",mar=c(4,6,4,1),oma=c(1,1,1,1))
        plot(year,data,pch="*",col=4,ylab="Precipitation (mm)",main=(sprintf("10 percent increase over %s years",j)),cex.lab=1.5,cex.main=2,ylim=c(800,1400),abline(fit,col="red",lty=1.5))
        par(new=T)
        plot(year,mean,type="l",xlab="year",ylab="Precipitation (mm)",cex.lab=1.5,ylim=c(800,1400),lty=1.5)
        legend("bottomright",legend=c("Kendall tau = ",kendall[[1]]))
        legend("bottomleft",legend=c("Kendall 2-tailed p-value = ",kendall[[2]]))
        legend(x="topright",c("4 year moving average","Simple linear trend"),lty=1.5,col=c("black","red"),cex=1.2)
        legend("topleft",c("Annual total"),pch="*",col="blue",cex=1.2)
        dev.off()
        j=j+1
}
tmp<-read.csv("/home/cohara/RainfallAnalysis/Kendall_p-value_for_10%_increase_over_81_to_89_years.csv")
tmp2<-read.csv("/home/cohara/RainfallAnalysis/Kendall_p-value_for_10%_increase_over_81_-_89_years.csv")
tmp<-cbind(tmp,tmp2)
tmp3<-read.csv("/home/cohara/RainfallAnalysis/Kendall_tau_for_10%_increase_over_81_to_89_years.csv")
tmp4<-read.csv("/home/cohara/RainfallAnalysis/Kendall_tau_for_10%_increase_over_81_-_89_years.csv")
tmp3<-cbind(tmp3,tmp4)
write.table(tmp,"/home/cohara/RainfallAnalysis/Kendall_p-value_for_10%_increase_over_81_to_89_years.csv",sep="\t",row.names=FALSE)
write.table(tmp3,"/home/cohara/RainfallAnalysis/Kendall_tau_for_10%_increase_over_81_to_89_years.csv",sep="\t",row.names=FALSE)

The output looks like this, from the .csv files created:

X0.0190228056162596 X0.000701081415172666
0.0395622998    0.00531819
0.0126547674    0.0108218994
0.0077754743    0.0015568719
0.0001407317    0.002680057
0.0096391216    0.012719159
0.0107234037    0.0092436085
0.0503448173    0.0103918528
0.0167525802    0.0025036721

I want to be able to use Excel functions on the data, so, for simplicity, I don't want row names (I'll be running this loop maybe a hundred times), but I need column names because otherwise the first set of values is cut off.

Can anyone tell me where the 'X' is coming from and how to get rid of it?

Thanks in advance, Ciara

Solution

Here is what I think is going on. Start by running these small examples:

df1 <- read.csv(text = "0.0190228056162596, 0.000701081415172666
0.0395622998,    0.00531819
0.0126547674,    0.0108218994")

df2 <- read.csv(text = "0.0190228056162596, 0.000701081415172666
0.0395622998,    0.00531819
0.0126547674,    0.0108218994", header = FALSE)

df1
df2
str(df1)
str(df2)
names(df1)
names(df2)

make.names(c(0.0190228056162596, 0.000701081415172666))

Please read ?read.csv, in particular the part about the header argument. As you will find, header = TRUE is the default in read.csv. Thus, if the csv file you read lacks a header, read.csv will still 'assume' that the file has one and use the values in the first row as column names. Another argument in read.csv is check.names, which defaults to TRUE:
If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names. If necessary they are adjusted (by make.names).
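To see the two arguments separately, here is a minimal sketch that reads the same header-less text three ways. The input string is a hypothetical two-row stand-in built from the values in the question, and the object names are only for illustration.

```r
# Hypothetical header-less input (values copied from the question's output).
txt <- "0.0190228056162596,0.000701081415172666
0.0395622998,0.00531819"

# Defaults (header = TRUE, check.names = TRUE):
# the first row is consumed as names and adjusted by make.names.
with_defaults <- read.csv(text = txt)

# check.names = FALSE leaves the names untouched,
# but the first row is still used as the header.
unchecked <- read.csv(text = txt, check.names = FALSE)

# header = FALSE keeps every row as data; columns are named V1, V2.
no_header <- read.csv(text = txt, header = FALSE)

names(with_defaults)
names(unchecked)
names(no_header)
nrow(with_defaults)  # one data row left, the first was used as names
nrow(no_header)      # both rows kept as data
```

Note that check.names = FALSE only removes the 'X'; the first row of values is still lost from the data.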

In your case, it seems that the files you read lack a header and that the first row contains numbers only. By default, read.csv treats this row as a header. make.names then takes the values in the first row (here the numbers 0.0190228056162596 and 0.000701081415172666) and produces the 'syntactically valid variable names' X0.0190228056162596 and X0.000701081415172666, which is not what you want.

Thus, you need to explicitly set header = FALSE to prevent read.csv from converting the first row to (valid) variable names.
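A rough sketch of the whole round trip with header = FALSE might look like this. The temporary files are hypothetical stand-ins for the .csv files in the question, and writing with col.names = FALSE is my assumption about the desired output (no names at all), not something your script already does.

```r
# Hypothetical temporary files standing in for the question's .csv files.
p_file <- tempfile(fileext = ".csv")
out_file <- tempfile(fileext = ".csv")

# Write header-less values, one per line, as the loop in the question does.
write.table(c(0.0190228056162596, 0.0395622998, 0.0126547674), file = p_file,
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# header = FALSE keeps the first value as data; the column is named V1.
a <- read.csv(p_file, header = FALSE)
b <- read.csv(p_file, header = FALSE)
combined <- cbind(a, b)

# col.names = FALSE keeps the automatic names (V1, V1) out of the file.
write.table(combined, out_file, sep = "\t", row.names = FALSE, col.names = FALSE)

# Inspect the result: three tab-separated lines, no 'X' and no header row.
readLines(out_file)
```

Because the first row is no longer consumed as a header, no values are cut off, so column names are not needed just to protect the first row.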

For next time, please provide a minimal, self-contained example.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow