Question

I have some data in JSON that I am trying to use in R. My problem is that I cannot get the data into the right format.

require(RJSONIO)

json <- "[{\"ID\":\"id1\",\"VALUE\":\"15\"},{\"ID\":\"id2\",\"VALUE\":\"10\"}]"
example <- fromJSON(json)

example <- do.call(rbind,example)
example <- as.data.frame(example,stringsAsFactors=FALSE)

> example
   ID VALUE
1 id1    15
2 id2    10

This gets close, but I cannot get the numeric column to convert to numeric. I know I can convert columns manually, but I thought data.frame or as.data.frame scanned the data and assigned the most appropriate classes. Clearly I misunderstood. I am reading in numerous tables, all very different, and I need the numeric data to be treated as such whenever it is numeric.

Ultimately I am looking to get data tables with numeric columns when the data is numeric.

Was it helpful?

Solution

read.table uses type.convert to convert data to the appropriate type. You could do the same as a cleaning step after reading in the JSON data.

sapply(example, class)
#          ID       VALUE 
# "character" "character" 
example[] <- lapply(example, type.convert, as.is = TRUE)
sapply(example, class)
#          ID       VALUE 
# "character"   "integer" 

Other tips

I would recommend using the jsonlite package, which converts this to a data frame by default:

jsonlite::fromJSON(json)

   ID VALUE
1 id1    15
2 id2    10

NOTE: The numeric problem remains, because the values are encoded as strings in this JSON, so you will still have to convert the numeric columns manually.
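For example, the same type.convert step from the accepted answer should work on the jsonlite result as well (a sketch, not part of the original answer):

example <- jsonlite::fromJSON(json)
example[] <- lapply(example, type.convert, as.is = TRUE)
sapply(example, class)
#          ID       VALUE 
# "character"   "integer" 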

Just to follow up on Ramnath's suggestion to transition to jsonlite, I did some benchmarking of the two approaches:

## RJSONIO vs. jsonlite for a simple example

require(RJSONIO)
require(jsonlite)
require(microbenchmark)

json <- "{\"ID\":\"id1\",\"VALUE\":\"15\"},{\"ID\":\"id2\",\"VALUE\":\"10\"}"
test <- rep(json,1000)
test <- paste(test,collapse=",")
test <- paste0("[",test,"]")

func1 <- function(x){
  temp <- jsonlite::fromJSON(x)
}

func2 <- function(x){
  # RJSONIO returns a list of named vectors, so bind and convert manually
  temp <- RJSONIO::fromJSON(x)
  temp <- do.call(rbind,temp)
  temp <- as.data.frame(temp,stringsAsFactors=FALSE)
}

> microbenchmark(func1(test),func2(test))
Unit: milliseconds
       expr       min        lq    median        uq       max neval
func1(test) 204.05228 221.46047 233.93321 246.90815 341.95684   100
func2(test)  21.60289  22.36368  22.70935  23.75409  27.41851   100

At least for now (I know the jsonlite package is still new and focuses on accuracy over performance), the older RJSONIO performs faster for this simple example, even with the extra step of transforming the list into a data frame.

Update including rjson:

require(rjson)

func3 <- function(x){
  # rjson returns a list of lists, so unlist each record before binding
  temp <- rjson::fromJSON(x)
  temp <- do.call(rbind,lapply(temp,unlist))
  temp <- as.data.frame(temp,stringsAsFactors=FALSE)
}

> microbenchmark(func1(test),func2(test),func3(test))
Unit: milliseconds
       expr       min        lq    median        uq       max neval
func1(test) 205.34603 220.85428 234.79492 249.87628 323.96853   100
func2(test)  21.76972  22.67311  23.11287  23.56642  32.97469   100
func3(test)  14.16942  15.96937  17.29122  20.19562  35.63004   100

> microbenchmark(func1(test),func2(test),func3(test),times=500)
Unit: milliseconds
       expr       min        lq    median        uq       max neval
func1(test) 206.48986 225.70693 241.16301 253.83269 336.88535   500
func2(test)  21.75367  22.53256  23.06782  23.93026 103.70623   500
func3(test)  14.21577  15.61421  16.86046  19.27347  95.13606   500

> identical(func1(test),func2(test)) & identical(func1(test),func3(test))
[1] TRUE

At least on my machine, rjson is only slightly faster, although I did not test how it scales compared to RJSONIO, which may be where it gets the big performance bump Ramnath suggested.
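One rough way to check the scaling (a sketch only; the sizes below are arbitrary) would be to rebuild the test string at a few different lengths and benchmark each:

sizes <- c(100, 1000, 10000)
results <- lapply(sizes, function(n) {
  # rebuild the input with n copies of the two-record snippet
  test_n <- paste0("[", paste(rep(json, n), collapse = ","), "]")
  microbenchmark(func1(test_n), func2(test_n), func3(test_n), times = 20)
})
names(results) <- sizes
results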

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow