Question

I'm trying to use mongoimport to upsert data with string values in _id. Since the ids look like integers (even though they're in quotes), mongoimport treats them as integers and creates new records instead of upserting the existing records.

Command I'm running:

mongoimport --host localhost --db database --collection my_collection --type csv --file mydata.csv --headerline --upsert

Example data in mydata.csv:

{ "_id" : "0364", someField: "value" }

The result would be for mongo to insert a record like this: { "_id" : 364, someField: "value" } instead of updating the record with _id "0364".

Does anyone know how to make it treat the _id as strings?

Things that don't work:

  • Surrounding the data with double double quotes ""0364"", double and single quotes "'0364'" or '"0364"'
  • Appending empty string to value: { "_id" : "0364" + "", someField: "value" }
Was it helpful?

Solution

Unfortunately there is not now a way to force number-like strings to be interpreted as strings:

https://jira.mongodb.org/browse/SERVER-3731

You could write a script in Python or some other language with which you're comfortable, along the lines of:

import csv, pymongo

connection = pymongo.Connection()
collection = connection.mydatabase.mycollection
reader = csv.DictReader(open('myfile.csv'))
for line in reader:
    print '_id', line['_id']
    upsert_fields = {
        '_id': line['_id'],
        'my_other_upsert_field': line['my_other_upsert_field']}

    collection.update(upsert_fields, line, upsert=True, safe=True)

OTHER TIPS

Just encountered this same issue and discovered an alternative. You can force Mongo to use string types for non-string values by converting your CSV to JSON and quoting the field. For example, if your CSV looks like this:

key value
123 foo
abc bar

Then you'll get an integer field for key 123 and a string field for key abc. If you convert that to JSON, making sure that all the keys are quoted, and then use --type json when you import, you'll end up with the desired behavior:

{
    "123":"foo",
    "abc":"bar"
}

I was able to prefix the numeric string and that worked for me. Example:

00012345 was imported as 12345 (Type Int) string00012345 was imported as string00012345 (Type String)

My source was a SQL database so I just did

select 'string'+column as name

Of course, you also need to do a bit of post-processing to parse the string, but far less effort than converting a rather large tsv file to json.

I also added +1 to the jira link above for the enhancement.

As an alternative to @Jesse, you can do something similar in the mongo console, e.g.

db.my_collection.find().forEach(function (obj) {
  db.my_collection.remove({_id: obj._id); // remove the old one
  obj._id = '' + obj._id; // change to string
  db.my_collection.save(obj); // resave
});

For non _id fields you can simply do:

db.my_collection.find().forEach(function (obj) {
  obj.someField = '' + obj.someField; // change to string
  db.my_collection.save(obj); // resave
});

I encountered the same issue.

I feel the simplest way is to convert the CSV file to a JSON file using an online tool and then import.

This is the tool I used:

http://www.convertcsv.com/csv-to-json.htm

It lets you wrap the integer values of your CSV file in double quotes for your JSON file.

If you have trouble importing this JSON file and encountering an error, just add --jsonArray to your import command. It will work for sure.

mongoimport --host localhost --db mydb -c mycollection --type json --jsonArray --file <file_path>
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top