Saving entries of Mongo documents into CSV + Formatting ISODate

https://stackoverflow.com/questions/14261053

14-01-2022

Question

I have data in a MongoDB collection called "hello". The documents look like:

{ 
name: ..., 
size: ..., 
timestamp: ISODate("2013-01-09T21:04:12Z"), 
data: { text:..., place:...},
other: ...
}

I would like to export the timestamp and the text from each document into a CSV file, with the timestamp in the first column and the text in the second.

I tried creating a new collection (hello2) where the documents only have the timestamp and the text.

data = db.hello
for i in data:
    try:
        connection.me.hello2.insert(i["data"]["text"], i["timestamp"])
    except:
        print "Unable", sys.exc_info()

I then wanted to use mongoexport:

mongoexport --db me --collection hello2 --csv --out /Dropbox/me/hello2.csv

But this is not working and I do not know how to proceed.

PS: I would also like to store only the time of the ISODate in the CSV File, i.e. just 21:04:12 instead of ISODate("2013-01-09T21:04:12Z")

Thank you for your help.

Solution

You can export directly from the original collection; there is no need for a temporary one:

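# Project only the needed fields; PyMongo 3+ names the argument `projection` (`fields` in 2.x)
for r in db.hello.find({}, projection=['data.text', 'timestamp']):
    print('"%s","%s"' % (r['timestamp'].strftime('%H:%M:%S'), r['data']['text']))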

or to write to a file:

# `output` is the path of the CSV file to create
with open(output, 'w') as fp:
    for r in db.hello.find({}, projection=['data.text', 'timestamp']):
        print('"%s","%s"' % (r['timestamp'].strftime('%H:%M:%S'), r['data']['text']), file=fp)
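A hand-built format string does not escape quotation marks or line breaks inside the text. As a minimal sketch of a more robust variant, the standard csv module can handle the quoting. The helper name export_rows and the sample documents are made up for illustration; the sketch assumes each document carries a datetime timestamp (PyMongo returns ISODate values as datetime.datetime objects) and a data.text field. In real use, the docs argument would be the cursor returned by db.hello.find(...).

```python
import csv
import io
from datetime import datetime


def export_rows(docs, fp):
    """Write one CSV row per document: time of day first, then the text."""
    writer = csv.writer(fp, quoting=csv.QUOTE_ALL)
    for doc in docs:
        # strftime keeps only the HH:MM:SS part of the datetime.
        writer.writerow([doc["timestamp"].strftime("%H:%M:%S"), doc["data"]["text"]])


# Hypothetical sample document standing in for the MongoDB cursor.
sample = [
    {"timestamp": datetime(2013, 1, 9, 21, 4, 12), "data": {"text": 'say "hi"'}},
]

buf = io.StringIO()
export_rows(sample, buf)
csv_text = buf.getvalue()
print(csv_text, end="")
```

When writing to a real file instead of a StringIO buffer, the csv documentation recommends opening it with newline='' so that line endings are not translated twice.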

To filter out duplicates and keep only the most recent entry for each text, split the process into two steps. First, accumulate the data in a dictionary:

recs = {}
for r in db.hello.find({}, projection=['data.text', 'timestamp']):
    text, time = r['data']['text'], r['timestamp']
    if text not in recs or recs[text] < time:
        recs[text] = time

and then output the dictionary content:

for text, time in recs.items():
    print('"%s","%s"' % (time.strftime('%H:%M:%S'), text))
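The two steps can be checked without a database by running them over plain dictionaries. The following is only a sketch: the function name latest_by_text and the sample documents are hypothetical, and sorting the output by text is an added choice to make the result deterministic. Note that the comparison uses the full datetime, so "most recent" is decided across days even though only the time of day is printed.

```python
from datetime import datetime


def latest_by_text(docs):
    """Map each distinct text to the most recent timestamp seen for it."""
    recs = {}
    for doc in docs:
        text, time = doc["data"]["text"], doc["timestamp"]
        if text not in recs or recs[text] < time:
            recs[text] = time
    return recs


# Hypothetical sample documents; the text "a" appears twice.
sample = [
    {"timestamp": datetime(2013, 1, 9, 21, 4, 12), "data": {"text": "a"}},
    {"timestamp": datetime(2013, 1, 9, 22, 30, 0), "data": {"text": "a"}},
    {"timestamp": datetime(2013, 1, 9, 20, 0, 0), "data": {"text": "b"}},
]

latest = latest_by_text(sample)
lines = ['"%s","%s"' % (t.strftime("%H:%M:%S"), text)
         for text, t in sorted(latest.items())]
print("\n".join(lines))
```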

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow