Question

I have a collection in MongoDB into which I insert data both from Perl (using the MongoDB module) and with mongoimport. The problem is that the numeric data types end up inconsistent.

For example, from Perl, I do:

$collection->insert({ _id => 1, value => "record 1" });

I also have a JSON file, imported with mongoimport, that contains this line:

{"_id":2,"value":"record 2"}

Now, if I query the collection, I get the following:

> db.test.find()
{ "_id" : NumberLong(1), "value" : "record 1" }
{ "_id" : 2, "value" : "record 2" }

Is there a way to force the Perl driver to insert the _id as a 32-bit integer, or to force mongoimport to write it as a 64-bit integer (NumberLong)?

Do you have any other suggestions for keeping the _id field consistent?


Solution

The MongoDB Perl module documentation has some information on 64-bit integers: http://search.cpan.org/dist/MongoDB/lib/MongoDB/DataTypes.pod#64-bit_Platforms

The integer size depends on the language and driver you are using. Dynamically typed languages like Perl, PHP, and Python use 64-bit integers when compiled for 64-bit and 32-bit integers when compiled for 32-bit. Statically typed languages may be more specific (in Java, an int is always 32 bits), while others, like C, only guarantee that an int is at least 16 bits and that a long is at least as large as an int and at least 32 bits.
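To check which case applies to your own interpreter, a minimal Perl sketch can read perl's build configuration through the core Config module. It assumes, as described above, that the driver serializes plain Perl integers using perl's native integer size; verify that against the DataTypes documentation for the driver version you actually have installed.

```perl
use strict;
use warnings;
use Config;

# $Config{ivsize} is the size, in bytes, of perl's native integer type (IV):
# 8 on a perl built with 64-bit integers, 4 on a 32-bit integer build.
my $int_bits = $Config{ivsize} * 8;

# Assumption: the driver stores plain Perl integers at the native size, so a
# 64-bit perl produces BSON int64 values, shown by the shell as NumberLong(...).
my $bson_type = $int_bits == 64 ? 'int64 (NumberLong)' : 'int32';

printf "perl native integers: %d-bit, expected BSON type: %s\n",
    $int_bits, $bson_type;
```

The same value can be printed from the command line with perl -V:ivsize.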

The NumberLong _ids you see in the shell are expected, because you are inserting data from a 64-bit Perl, which uses 64-bit integers. The integers written by mongoimport are stored as 32-bit integers (NumberInt), but the shell does not display the type explicitly for these.

For indexing and querying, MongoDB compares numeric _ids by value rather than by type, so they must still be unique: a 32-bit 1 and a 64-bit 1 count as the same key.

For example, trying to insert the same integer _id as both a 32-bit and a 64-bit value causes a duplicate key error:

MongoDB shell version: 2.0.6
> db.ints.insert({ _id: NumberInt(1) });
> db.ints.insert({ _id: NumberLong(1) });
E11000 duplicate key error index: testing.ints.$_id_  dup key: { : 1 }

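Similarly, queries match numeric _ids by value regardless of type. Here Number(2) is a plain JavaScript number (a double), yet it finds the NumberLong document: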

> db.ints.insert({ _id: NumberLong(2) });
> db.ints.find({_id:Number(2)});
{ "_id" : NumberLong(2) }

> db.ints.insert({ _id: NumberLong(3) });
> db.ints.find({_id:Number(3)});
{ "_id" : NumberLong(3) }

If you're concerned about integer sizes differing between drivers or tools such as mongoimport, you could write your own import script in Perl, so that all data goes through the same driver. A CSV equivalent of mongoimport could be put together quickly using Text::CSV_XS.
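A minimal sketch of such a script, assuming line-delimited JSON input like the file in the question (one document per line, mongoimport's default format). Because that file is JSON rather than CSV, this sketch swaps in the core JSON::PP module (bundled with Perl since 5.14) in place of Text::CSV_XS. The function name import_json_lines, the callback wiring, and the sample input are hypothetical; in a real script the callback would make the same driver insert call used in the question, so every document's integers are serialized by one driver.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: read one JSON document per line from $fh and hand each
# decoded hashref to the $insert callback. JSON::PP turns integers into plain
# Perl numbers, so the driver sees them exactly like numbers built in Perl.
sub import_json_lines {
    my ($fh, $insert) = @_;
    my $json  = JSON::PP->new;
    my $count = 0;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*$/;    # skip blank lines
        $insert->($json->decode($line));
        $count++;
    }
    return $count;
}

# Illustrative run against an in-memory file. In real use, open the JSON file
# and pass something like: sub { $collection->insert($_[0]) }
my $data = qq({"_id":2,"value":"record 2"}\n\n{"_id":3,"value":"record 3"}\n);
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @docs;
my $n = import_json_lines($fh, sub { push @docs, $_[0] });
close $fh;
print "imported $n documents\n";
```

Passing the insert as a callback keeps the parsing logic separate from the database connection, so it can be exercised without a running server.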

Licensed under: CC-BY-SA with attribution