Question

I'm trying to import and merge multiple CSVs into MongoDB; however, documents are being replaced rather than merged.

For example, if I have one.csv:

key1, first column, second column

and two.csv:

key1, third column

I would like to end up with:

key1, first column, second column, third column

But instead I'm getting:

key1,third column

Currently I'm using:

mongoimport.exe --type csv --file first.csv --fields key,firstColumn,secondColumn
mongoimport.exe --type csv --file second.csv --fields key,thirdColumn --upsert --upsertFields key

Solution

That's the way mongoimport works: with --upsert, a matching document is replaced as a whole rather than merged field by field. There's an open feature request for merge imports, but for now you'll have to write your own import to get merge behavior.
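As a rough illustration of what "write your own import" could look like, here is a minimal sketch that turns header-less CSV text into one `$set` upsert per row. The function name `buildMergeOps` and its parameters are made up for this example, and the comma splitting is deliberately naive (no quoted fields or embedded commas), so treat it as a starting point rather than a finished importer.

```javascript
// Sketch: build one "$set" upsert per CSV row, keyed on a single field.
// buildMergeOps and its parameters are hypothetical names, not part of mongoimport.
function buildMergeOps(csvText, fieldNames, keyField) {
    return csvText
        .split(/\r?\n/)
        .filter(function (line) { return line.trim() !== ""; })
        .map(function (line) {
            // Naive split: assumes no quoted fields or embedded commas.
            var values = line.split(",").map(function (v) { return v.trim(); });
            var filter = {};
            var set = {};
            fieldNames.forEach(function (name, i) {
                if (name === keyField) {
                    filter[name] = values[i];
                } else if (values[i] !== undefined && values[i] !== "") {
                    set[name] = values[i];
                }
            });
            // $set only touches the listed fields, so columns imported earlier survive.
            return { updateOne: { filter: filter, update: { $set: set }, upsert: true } };
        })
        // Drop rows that carry nothing to set.
        .filter(function (op) {
            return Object.keys(op.updateOne.update.$set).length > 0;
        });
}

var ops = buildMergeOps("key1, third column\n", ["key", "thirdColumn"], "key");
console.log(JSON.stringify(ops));
```

The resulting array has the shape accepted by `bulkWrite`, so in the mongo shell it could be passed as `db.mycoll.bulkWrite(ops)`, assuming the target collection is called `mycoll`.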

Other suggestions

A cross-collection workaround: import the second file into a dummy collection, then run forEach over it and use each document to find and update the matching document in your target collection:

mongoimport.exe --collection mycoll --type csv --file first.csv --fields key,firstColumn,secondColumn
mongoimport.exe --collection dummy --type csv --file second.csv --fields key,third

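// In the mongo shell: copy "third" from each dummy document onto the matching mycoll document
db.dummy.find().forEach(function(doc) {
    db.mycoll.update({key: doc.key}, {$set: {thirdcol: doc.third}});
});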

That's correct: mongoimport --upsert replaces whole documents. You can achieve your goal by importing into a temporary collection and then using the following Gist.

Load the script into the mongo shell and run:

mergeCollections("srcCollectionName", "destCollectionName", {}, ["thirdColl"]); 

I just had a very similar problem. There is a Node module for MongoDB, and jline is my command-line Node tool for stream-processing JSON lines. So:

echo '{"page":"index.html","hour":"2015-09-18T21:00:00Z","visitors":1001}' |\
jline-foreach \
    'beg::dp=require("bluebird").promisifyAll(require("mongodb").MongoClient).connectAsync("mongodb://localhost:27017/nginx")' \
    'dp.then(function(db){
       var updates = {};
       updates["visitors.hour."+record.hour] = record.visitors;
       db.collection("pagestats").update({_id:record.page},{$set:updates},{upsert:true});});' \
    'end::dp.then(function(db){db.close()})'

In your case you'd have to convert from CSV to JSON lines first by piping the file through jline-csv2jl. That converts each CSV line into a dictionary whose keys are taken from the header row.
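To make the conversion concrete, here is a small sketch of the idea in plain JavaScript. It is not jline-csv2jl's code; `csvToJsonLines` is a hypothetical helper that assumes the first line is a header and that fields contain no quotes or embedded commas.

```javascript
// Sketch: convert CSV text with a header row into JSON lines.
// csvToJsonLines is a hypothetical helper, not the jline-csv2jl implementation.
function csvToJsonLines(csvText) {
    var lines = csvText.split(/\r?\n/).filter(function (l) { return l.trim() !== ""; });
    if (lines.length === 0) { return []; }
    // The first non-empty line supplies the field names.
    var header = lines[0].split(",").map(function (h) { return h.trim(); });
    return lines.slice(1).map(function (line) {
        var values = line.split(",").map(function (v) { return v.trim(); });
        var record = {};
        header.forEach(function (name, i) { record[name] = values[i]; });
        // One JSON object per input row; missing trailing values are omitted.
        return JSON.stringify(record);
    });
}

csvToJsonLines("key,thirdColumn\nkey1,third column\n").forEach(function (l) {
    console.log(l);
});
```

Note that the CSV files in the question have no header row, so a header line would need to be added first (or the field names supplied some other way).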

I have added this example to the manual: https://github.com/bitdivine/jline/blob/master/bin/foreach.md

I haven't used jline with promises much but so far it's OK.

Disclaimer: I am the author of jline.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow