Question

Suppose I have the following JSON data:

{ "_id" : { "$oid" : "string" }, "titulo" : "string", "id_cv" : 1132, "textos" : [ { "fecha" : { "$date" : 1217376000000 }, "estado" : "string", "texto" : "string", "source_url" : "string" } ] }
{ "_id" : { "$oid" : "string" }, "titulo" : "string", "autores" : ",\"string\",\"string\",\"string\",\"string",5", "id_cv" : 1138, "textos" : [ { "fecha" : { "$date" : 1217548800000 }, "estado" : "string", "texto" : "string", "source_url" : "string" } ] }

I am trying to import the JSON data into R and ultimately transform it into an R data frame.

Suppose I have the following script in R:

library("rjson")
json_file <- "/Users/usr/file/json_data.json"
json_data <- fromJSON(paste(readLines(json_file), collapse=""))

data = unlist(json_data)

title=data[names(data)=="titulo"]
print(title)

text=data[names(data)=="textos.texto"]
print(text)

url=data[names(data)=="textos.source_url"]
print(url)

When I run this script, the output contains only the first line of the JSON data file, although the file has approximately 200 lines. One issue I am aware of is that JavaScript does not 'allow' multi-line strings. I have tried to work around this in various ways:

  1. Add '"' between each 'line' of data.
  2. Add '"' to the end of each 'line' of data.
  3. Add "\" between each 'line' of data.
  4. Add "\" to the end of each 'line' of data.
  5. Convert all multiple lines into one line (replace "\n" with "\n")

All of the above have been attempted using regular expressions.

My question is: how do I manipulate the JSON data so that all the 'lines' of the data are read into R, so that I can unlist them and construct a data frame with the columns 'title', 'text' and 'url' and one row per 'line' of the JSON data?

I have attempted this using both the rjson and RJSONIO packages in R, but I have no preference between them at the moment, since I believe the issue is ultimately with the formatting of the JSON data itself.


Solution

The JSON string itself was indeed not quite correct.

  1. A backslash was missing in one of the strings, so one quotation mark was not properly escaped: "autores" : ",\"string\",\"string\",\"string\",\"string",5" should be "autores" : ",\"string\",\"string\",\"string\",\"string\",5"
  2. The individual {} objects (lines 1 and 2, as you call them) must be combined in an enclosing structure, either an array ([]) or an object ({} with identifiers); otherwise it is not clearly defined how the JSON structure is to be interpreted.
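
Point 2 can also be applied in R itself rather than by editing the file. The following is only a minimal sketch, assuming that every non-empty line of the file holds exactly one complete, valid JSON object (so the escaping problem from point 1 has already been fixed). The temporary file and its two sample records are hypothetical stand-ins for the real data.

```r
library(rjson)

# Hypothetical stand-in for the real file: one JSON object per line.
json_file <- tempfile(fileext = ".json")
writeLines(c(
  '{ "titulo" : "first", "id_cv" : 1132 }',
  '{ "titulo" : "second", "id_cv" : 1138 }'
), json_file)

# Read the file line by line and drop blank lines.
lines <- readLines(json_file)
lines <- lines[nzchar(trimws(lines))]

# Join the objects with commas and wrap them in [ ] so that the
# whole text is one JSON array instead of several loose objects.
json_array <- paste0("[", paste(lines, collapse = ","), "]")

# The parser now returns one list element per original line.
records <- fromJSON(json_array)
length(records)
```

Collapsing the lines with "" instead, as in the question, produces several top-level objects back to back, which is not a single JSON document.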

I modified your JSON string to be composed of two array elements, each one containing a line (=one JSON object):

[
  { "_id" : { "$oid" : "string" },
    "titulo" : "string",
    "id_cv" : 1132,
    "textos" : [ { "fecha" : { "$date" : 1217376000000 },
                   "estado" : "string",
                   "texto" : "string",
                   "source_url" : "string" } ] },

  { "_id" : { "$oid" : "string" },
    "titulo" : "string",
    "autores" : ",\"string\",\"string\",\"string\",\"string\",5",
    "id_cv" : 1138,
    "textos" : [ { "fecha" : { "$date" : 1217548800000 },
                   "estado" : "string",
                   "texto" : "string",
                   "source_url" : "string" } ] }
]

I added newlines for better readability. Newline characters and whitespace (outside individual identifiers or strings) are, or rather should be, ignored by JSON parsers.
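
To get from the corrected array to the data frame asked for in the question, something along these lines may work. It is a sketch rather than a definitive implementation: it assumes R 4.0.0 or later (for the raw string literal r"(...)", which keeps the \" escapes intact), and it assumes every record has exactly one entry in "textos", as in the sample. The names json_text, records, rows and df are illustrative.

```r
library(rjson)

# The corrected JSON array from above, as a raw string (R >= 4.0.0).
json_text <- r"([
  { "_id" : { "$oid" : "string" }, "titulo" : "string", "id_cv" : 1132,
    "textos" : [ { "fecha" : { "$date" : 1217376000000 }, "estado" : "string",
                   "texto" : "string", "source_url" : "string" } ] },
  { "_id" : { "$oid" : "string" }, "titulo" : "string",
    "autores" : ",\"string\",\"string\",\"string\",\"string\",5", "id_cv" : 1138,
    "textos" : [ { "fecha" : { "$date" : 1217548800000 }, "estado" : "string",
                   "texto" : "string", "source_url" : "string" } ] }
])"

# One list element per JSON object in the array.
records <- fromJSON(json_text)

# Build a one-row data frame per record from the fields of interest.
# Assumption: only the first element of "textos" is used.
rows <- lapply(records, function(rec) {
  first_text <- rec[["textos"]][[1]]
  data.frame(
    title = rec[["titulo"]],
    text  = first_text[["texto"]],
    url   = first_text[["source_url"]],
    stringsAsFactors = FALSE
  )
})

# Stack the per-record rows into a single data frame.
df <- do.call(rbind, rows)
print(df)
```

Picking the fields by name from each record avoids the unlist() step in the question, which flattens all records into one vector and makes it harder to keep the values of one record together.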

Other tips

Here I have parsed a JSON string into a data frame. I think this will be useful for you.

http://spring-webservice-2-step-by-step.blogspot.in/2013/10/voltdb-with-r-real-time-analysis.html

License: CC-BY-SA with attribution
Not affiliated with StackOverflow