Question

I have some data in JSON I am trying to use in R. My problem is I cannot get the data in the right format.

require(RJSONIO)

json <- "[{\"ID\":\"id1\",\"VALUE\":\"15\"},{\"ID\":\"id2\",\"VALUE\":\"10\"}]"
example <- fromJSON(json)

example <- do.call(rbind,example)
example <- as.data.frame(example,stringsAsFactors=FALSE)

> example
   ID VALUE
1 id1    15
2 id2    10

This gets close, but I cannot get the numeric column to convert to numeric. I know I can convert columns manually, but I thought data.frame or as.data.frame scanned the data and made the most appropriate class definitions. Clearly I misunderstood. I am reading in numerous tables - all very different - and I need to have the numeric data treated as such when it's numeric.

Ultimately I am looking to get data tables with numeric columns when the data is numeric.


Solution

read.table uses type.convert to convert data to the appropriate type. You could do the same as a cleaning step after reading in the JSON data.

sapply(example, class)
#          ID       VALUE 
# "character" "character" 
example[] <- lapply(example, type.convert, as.is = TRUE)
sapply(example, class)
#          ID       VALUE 
# "character"   "integer" 
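Since many different tables are being read, the cleaning step could be wrapped in a small helper. This is only a sketch in base R: the name convert_columns and the sample data frame (including the extra RATIO column) are made up for illustration and do not come from any package.

```r
# Hypothetical helper (not part of any package): run type.convert() on
# every column while keeping the data.frame structure.
convert_columns <- function(df) {
  # df[] <- ... replaces the columns in place, so the result stays a
  # data.frame; as.is = TRUE keeps non-numeric text as character
  # instead of turning it into factors.
  df[] <- lapply(df, type.convert, as.is = TRUE)
  df
}

# Illustrative input shaped like the parsed JSON: every column is character.
raw <- data.frame(
  ID    = c("id1", "id2"),
  VALUE = c("15", "10"),
  RATIO = c("0.5", "1.25"),
  stringsAsFactors = FALSE
)

converted <- convert_columns(raw)
classes <- sapply(converted, class)
print(classes)
```

Columns that contain only whole numbers should come back as integer, decimals as numeric, and everything else stays character, so the same call can be applied to each table after parsing.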

Other suggestions

I would recommend the jsonlite package, which converts this input to a data frame by default:

jsonlite::fromJSON(json)

   ID VALUE
1 id1    15
2 id2    10

NOTE: The numeric problem still remains. In this JSON the values are quoted ("15"), so they are strings rather than JSON numbers, and the parser returns a character column. You will still have to convert the numeric columns yourself.
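To see how much depends on the quoting in the input, here is a rough sketch comparing quoted and unquoted values. It assumes jsonlite is installed; the unquoted variant of the JSON is my own illustration and is not the asker's data.

```r
library(jsonlite)

# Values quoted in the JSON text are strings, so VALUE is a character column.
quoted <- fromJSON('[{"ID":"id1","VALUE":"15"},{"ID":"id2","VALUE":"10"}]')

# The same records with unquoted values are parsed as JSON numbers.
unquoted <- fromJSON('[{"ID":"id1","VALUE":15},{"ID":"id2","VALUE":10}]')

quoted_classes <- sapply(quoted, class)
unquoted_is_numeric <- is.numeric(unquoted$VALUE)

# For quoted input, type.convert() is still a reasonable cleanup step.
quoted[] <- lapply(quoted, type.convert, as.is = TRUE)
```

If you control the source of the JSON, emitting unquoted numbers avoids the conversion step. Otherwise the type.convert approach still applies.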

To follow up on Ramnath's suggestion to switch to jsonlite, I benchmarked the two approaches:

##RJSONIO vs. jsonlite for a simple example

require(RJSONIO)
require(jsonlite)
require(microbenchmark)

json <- "{\"ID\":\"id1\",\"VALUE\":\"15\"},{\"ID\":\"id2\",\"VALUE\":\"10\"}"
test <- rep(json,1000)
test <- paste(test,collapse=",")
test <- paste0("[",test,"]")

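# jsonlite: returns a data frame directly
func1 <- function(x) {
  jsonlite::fromJSON(x)
}

# RJSONIO: returns a list of records, so bind them into a data frame
func2 <- function(x) {
  temp <- RJSONIO::fromJSON(x)
  temp <- do.call(rbind, temp)
  as.data.frame(temp, stringsAsFactors = FALSE)
}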

> microbenchmark(func1(test),func2(test))
Unit: milliseconds
       expr       min        lq    median        uq       max neval
func1(test) 204.05228 221.46047 233.93321 246.90815 341.95684   100
func2(test)  21.60289  22.36368  22.70935  23.75409  27.41851   100

At least for now, the older RJSONIO is roughly ten times faster on this simple example (median of about 23 ms versus 234 ms), even with the extra step of transforming the list into a data frame. I know the jsonlite package is still new and focuses on accuracy over performance.

Update including rjson:

require(rjson)

func3 <- function(x){
  temp <- rjson::fromJSON(x)
  temp <- do.call(rbind,lapply(temp,unlist))
  temp <- as.data.frame(temp,stringsAsFactors=FALSE)
}

> microbenchmark(func1(test),func2(test),func3(test))
Unit: milliseconds
       expr       min        lq    median        uq       max neval
func1(test) 205.34603 220.85428 234.79492 249.87628 323.96853   100
func2(test)  21.76972  22.67311  23.11287  23.56642  32.97469   100
func3(test)  14.16942  15.96937  17.29122  20.19562  35.63004   100

> microbenchmark(func1(test),func2(test),func3(test),times=500)
Unit: milliseconds
       expr       min        lq    median        uq       max neval
func1(test) 206.48986 225.70693 241.16301 253.83269 336.88535   500
func2(test)  21.75367  22.53256  23.06782  23.93026 103.70623   500
func3(test)  14.21577  15.61421  16.86046  19.27347  95.13606   500

> identical(func1(test),func2(test)) & identical(func1(test),func3(test))
[1] TRUE

At least on my machine, rjson is only slightly faster than RJSONIO (median of about 17 ms versus 23 ms). I did not test how it scales compared to RJSONIO, which may be where it gets the big performance bump Ramnath suggested.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow