Question

I have a CSV with 3 fields: name, latitude, longitude. A row looks like this:

Place 1,73.992964,40.739037

What's the correct way to mongoimport latitude and longitude into a loc field? I understand that location-indexed fields need to be a single longitude,latitude array rather than 2 discrete fields for latitude and longitude, but I can't tell whether there is a way to go from those discrete values to the array via mongoimport.

Do I need to first transform the data into a CSV with a single loc column containing the longitude and latitude instead?

Place1,[-73.992964,40.739037]

I'm frequently going to be dealing with CSVs that have latitude and longitude stored in independent columns, so I'm hoping to find a way to do this with mongoimport.


Solution

mongoimport has very limited capabilities, and in situations such as this the official recommendation is to write a custom script that parses your .csv file line by line and creates documents exactly the way you would like them represented.

In order for a geospatial index to be created, the location information must be stored under the same key, as described in the "Some examples:" section at the top of the Geospatial Indexing documentation: http://www.mongodb.org/display/DOCS/Geospatial+Indexing
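
In other words, before a geospatial index can be built, a document for the data in the question would need to look something like this, with a single loc key holding a [longitude, latitude] array (using the question's column order, where the second column is latitude and the third is longitude):

{ "name" : "Place 1", "loc" : [ 40.739037, 73.992964 ] }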

Importing the data directly from the .csv file creates documents like this:

doc1.csv:
place, lat, lon
Place 1,73.992964,40.739037

$ ./mongoimport -d test -c a --type csv --headerline --file doc1.csv 

> db.a.find()
{ "_id" : ObjectId("4f7602d70c873ff911798fd3"), "place" : "Place 1", "lat" : 73.992964, "lon" : 40.739037 }

Unfortunately, it is not possible to create a geospatial index on the document above.

By way of experimentation, I attempted to import a .csv file with data in the second format that you described, without success.

doc2.csv:
place, loc
Place1,[-73.992964,40.739037]

$ ./mongoimport -d test -c b --type csv --headerline --file doc2.csv 

> db.b.find()
{ "_id" : ObjectId("4f7602e40c873ff911798fd4"), "place" : "Place1", "loc" : "[-73.992964", "field2" : "40.739037]" }

As a further experiment, I converted the .csv document into JSON format and imported that, and it seemed to work.

doc3.json:
{name:"Place1" , loc:[-73.992964,40.739037]}

$ ./mongoimport -d test -c c --type json --file doc3.json 

> db.c.find()
{ "_id" : ObjectId("4f7604570c873ff911798fd5"), "name" : "Place1", "loc" : [ -73.992964, 40.739037 ] }

However, if you are writing a script to convert all of your .csv files into .json format, you are probably better off writing a custom script to import your .csv files directly into your collection instead.
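
As a rough sketch of what such a script could look like (written here as a mongo shell script; the places collection name and the [longitude, latitude] ordering are assumptions based on the question, which states the second CSV column is latitude and the third is longitude):

// import_places.js -- run with: mongo test import_places.js
var lines = cat("doc1.csv").split("\n");      // read the whole .csv file as one string
for (var i = 1; i < lines.length; i++) {      // i = 1 skips the header line
    if (lines[i].trim() === "") continue;     // ignore blank lines
    var fields = lines[i].split(",");         // naive split: assumes no commas inside fields
    db.places.insert({
        name: fields[0],
        // geospatial indexes expect [longitude, latitude] order
        loc: [parseFloat(fields[2]), parseFloat(fields[1])]
    });
}
db.places.ensureIndex({loc: "2d"});           // create the geospatial index

A per-file script along these lines avoids the intermediate .json step entirely, though the naive split(",") would need to be replaced with real CSV parsing if any field can contain a comma.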

Other tips

I experienced a similar problem, and I solved it by performing a short preprocessing pass using sed to convert the CSV into a suitable JSON format (also using the new GeoJSON objects):

sed 's/\([^,]*\),\([0-9.-]*\),\([0-9.-]*\)/{ place: \1, location:{ type: "Point", coordinates: [ \3, \2 ] } }/' <data.csv >data.json

An explanation of what's going on:

sed          // Execute the sed command
's/          // Use substitute mode

\([^,]*\)    // Match a string containing anything except a ',' [1]
,            // Match a single ',' (the separator)
\([0-9.-]*\) // Match any combination of numbers, '.' or '-' [2]
,            // Match a single ',' (the separator)
\([0-9.-]*\) // Match any combination of numbers, '.' or '-' [3]

/{ place: \1, location:{ type: "Point", coordinates: [ \3, \2 ] } }/'
// Replace the match with the appropriate JSON format, inserting
// parts of the matched pattern ([1],[2],[3])

<data.csv    // Perform the command on the contents of the data.csv file
>data.json   // Output the results to a data.json file

I found that sed is pretty efficient, and even with a csv file containing ~8 million lines, it only took around a minute to perform this conversion.
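
Applied to the sample row from the question (Place 1,73.992964,40.739037), that substitution produces a line like:

{ place: Place 1, location:{ type: "Point", coordinates: [ 40.739037, 73.992964 ] } }

Note that the third CSV column (\3) is emitted first so the coordinates come out in GeoJSON's [longitude, latitude] order, and that the place name is left unquoted; if your place names need quoting, add the quotes around \1 in the replacement string.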

It is then a simple task to import the newly created JSON file using mongoimport, as shown in Marc's answer.
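
For example, with the file and collection names assumed here (data.json into a collection called data), the import and index creation would look roughly like:

$ ./mongoimport -d test -c data --type json --file data.json

> db.data.ensureIndex({location: "2dsphere"})

Because the sed output stores GeoJSON Point objects under location, the 2dsphere index type (rather than the legacy 2d type) is the appropriate one to build here.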

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow