Import with xlsx package in R gives NA, <NA> and empty entries, can´t delete NA values

StackOverflow https://stackoverflow.com/questions/18334583

  •  25-06-2022
  •  | 
  •  

Question

I am importing data from an xlsx (https://www.dropbox.com/s/r5sn5pio5rnprdq/gesammelte%20Daten_1707.xlsx) file with read.xlsx

setwd("C:/***//Kultivierungen//1707_ADH//")
pH <- read.xlsx("gesammelte Daten_1707.xlsx", sheetName="pH")
OD <- read.xlsx("gesammelte Daten_1707.xlsx", sheetName="OD")
Glc <- read.xlsx("gesammelte Daten_1707.xlsx", sheetName="Glucose")
Ac <- read.xlsx("gesammelte Daten_1707.xlsx", sheetName="Acetate")

I want to delete the NA Values with

OD <- OD[rowSums(is.na(OD))==0,]
Glc <- Glc[rowSums(is.na(Glc))==0,]
Ac <- Ac[rowSums(is.na(Ac))==0,]
pH <- pH[rowSums(is.na(pH))==0,]

..which works fine for the OD and pH data, but not for Ac and Glc. The result before deleting the NA Values looks like this:

  time.in.h               SPL1 SPL1_Error               SPL2 SPL2_Error               SPL3 SPL3_Error
1  0.000000               <NA>       <NA>               <NA>       <NA>               <NA>       <NA>
2  1.502222               <NA>       <NA>               <NA>       <NA>               <NA>       <NA>
3  3.687778 0.0602636534839925       0.06 0.0502197112366604       0.09 0.0301318267419962       0.03
4 10.248889                                                                                          
5 16.248333  0.118460019743337       0.06 0.0829220138203356       0.12  0.106614017769003       0.18
6 21.653056 0.0644511581067472       0.03 0.0161127895266868       0.15 0.0483383685800604       0.12
7 29.653333                                                                                          
8 37.652778                                                                                          
9 43.391667  0.342347696879643       0.18  0.271025260029718       0.18  0.727488855869242       0.24

And after deleting the NA Values..:

  time.in.h               SPL1 SPL1_Error               SPL2 SPL2_Error               SPL3 SPL3_Error
3  3.687778 0.0602636534839925       0.06 0.0502197112366604       0.09 0.0301318267419962       0.03
4 10.248889                                                                                          
5 16.248333  0.118460019743337       0.06 0.0829220138203356       0.12  0.106614017769003       0.18
6 21.653056 0.0644511581067472       0.03 0.0161127895266868       0.15 0.0483383685800604       0.12
7 29.653333                                                                                          
8 37.652778                                                                                          
9 43.391667  0.342347696879643       0.18  0.271025260029718       0.18  0.727488855869242       0.24

str() returns the following:

> str(Glc)
'data.frame':   9 obs. of  17 variables:
 $ time.in.h : num  0 1.5 3.69 10.25 16.25 ...
 $ SPL1      : Factor w/ 5 levels "","0.0602636534839925",..: NA NA 2 1 4 3 1 1 5
 $ SPL1_Error: Factor w/ 4 levels "","0.03","0.06",..: NA NA 3 1 3 2 1 1 4
 $ SPL2      : Factor w/ 5 levels "","0.0161127895266868",..: NA NA 3 1 4 2 1 1 5
 $ SPL2_Error: Factor w/ 5 levels "","0.09","0.12",..: NA NA 2 1 3 4 1 1 5

It has worked fine before with a different set of data/xlsx file, I tried to rule out all format-issues in the xlsx file as well, but couldn´t find anything....anyone had this before?

Was it helpful?

Solution

It seems that the empty cells in the Glucose and Acetate-sheet are recognized as text, although I am not sure why (Excel is not really my expertise..).

When I replace the empty cells in a column in the xlsx-file with 0 and then I delete those 0's again read.xlsx does import it as numeric vector instead of a factor and assigns NA to the empty cells. Then, you can use data <- data[rowSums(is.na(data))==0,] to remove the rows that contain NA's.

Can't tell you what exactly is going on here, but the above solution seems to work.

OTHER TIPS

I'm not completely sure what's your problem but here's some fixes:

Do you want to replace NA values with zeros? In that case you could try:

Glc[is.na(Glc)] <- 0

Or do you want to delete the observations with NAs? In that case it seems to be working correctly for the first two observations. But e.g. the fourth observation seems to have something else (zero?) than NA in the xlsx-file.

I had the same problem and worked it out. In Excel, instead of delete use "Clear All" then save your Excel file and import it R. This way R does not show any N/A.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top