So I am parsing a large CSV file and pushing the results into MongoDB.

The file is MaxMind's city database. It has all kinds of fun UTF-8 characters, but I am still getting � (question-mark) symbols in some city names. Here is how I am reading the file:

(using the csv node module)

var fs = require('fs');
var path = require('path');
var csv = require('csv');

csv().from.stream(fs.createReadStream(path.join(__dirname, 'datafiles', 'cities.csv'), {
    flags: 'r',
    encoding: 'utf8'
})).on('record', function (row, index) {
    // ... uninteresting code to add it to mongodb
});

What could I be doing wrong here? I am getting entries like this in MongoDB: Ch�teauguay, Canada
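
One way to confirm that the file is not actually UTF-8 is to look at its raw bytes. The sketch below uses only Node's built-in `Buffer` (`isValidUtf8` is a hypothetical helper name, not part of any library): it decodes a buffer as UTF-8 and encodes it back. Invalid byte sequences are replaced with U+FFFD (the � symbol) during decoding, so the round-tripped bytes no longer match the originals.

```javascript
// Sketch: detect invalid UTF-8 by decoding and re-encoding a Buffer.
// Invalid byte sequences become U+FFFD (the � symbol) when decoded,
// so the re-encoded bytes no longer match the originals.
function isValidUtf8(buf) {
  const roundTripped = Buffer.from(buf.toString('utf8'), 'utf8');
  return roundTripped.equals(buf);
}

// "Châteauguay" as Latin-1 bytes (â = 0xE2) and as UTF-8 bytes (â = 0xC3 0xA2)
const latin1Bytes = Buffer.from('Châteauguay', 'latin1');
const utf8Bytes = Buffer.from('Châteauguay', 'utf8');

console.log(isValidUtf8(latin1Bytes)); // false
console.log(isValidUtf8(utf8Bytes)); // true
```

For a one-off check of the real file, passing `fs.readFileSync(path.join(__dirname, 'datafiles', 'cities.csv'))` to the helper should be enough. A `true` result does not prove the file is UTF-8 (plain ASCII passes too), but `false` shows that it is not.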

EDIT:

I tried using a different library (lazy) to read the file:

var fs = require('fs');
var path = require('path');
var lazy = require('lazy');

lazy(fs.createReadStream(path.join(__dirname, 'datafiles', 'cities.csv'), {
    flags: 'r',
    encoding: 'utf8',
    autoClose: true
  }))
    .lines
    .map(String)
    .skip(1) // skip the header line (use skip(2) if the file has two header lines)
    .map(function (line) {
      console.log(line);
    });

It produces the same bad results:

154252,"PA","03","Capellan�a","",8.3000,-80.5500,,
154220,"AR","01","Villa Espa�a","",-34.7667,-58.2000,,
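
Switching libraries does not help because both snippets still decode the bytes as UTF-8: the stream is opened with `encoding: 'utf8'`, and `String(buffer)` also decodes as UTF-8 by default. A small sketch with Node's `Buffer`, assuming the file stores ñ as the single Latin-1 byte 0xF1:

```javascript
// "Villa España" as the file would store it in Latin-1 (ñ = single byte 0xF1)
const raw = Buffer.from('Villa España', 'latin1');

// String(buf) calls buf.toString(), whose default encoding is 'utf8',
// so both decode the same way and replace the invalid byte with U+FFFD.
const viaString = String(raw);
const viaToString = raw.toString('utf8');

console.log(viaString === viaToString); // true
console.log(viaString); // Villa Espa�a
```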


Solution

It turns out MaxMind encodes its files in Latin-1 (ISO-8859-1), not UTF-8.
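
Latin-1 maps every byte from 0x00 to 0xFF to the Unicode code point with the same value, so decoding it is a simple one-byte-per-character lookup. The sketch below uses Node's built-in `'latin1'` Buffer encoding rather than iconv-lite, purely to show what happens at the byte level:

```javascript
// The Latin-1 bytes of "Châteauguay", as they appear in the file
const fromFile = Buffer.from([0x43, 0x68, 0xe2, 0x74, 0x65, 0x61, 0x75, 0x67, 0x75, 0x61, 0x79]);

// Latin-1 decoding: each byte becomes the code point with the same value
const text = fromFile.toString('latin1');
console.log(text); // Châteauguay
console.log(text.codePointAt(2).toString(16)); // e2

// Re-encoding as UTF-8 turns the single byte 0xE2 into two bytes (0xC3 0xA2)
const asUtf8 = Buffer.from(text, 'utf8');
console.log(fromFile.length, asUtf8.length); // 11 12
```

Once decoded, the value is an ordinary JavaScript string, so the MongoDB driver can store it as UTF-8 (BSON strings are UTF-8) without any further conversion.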

This works:

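var fs = require('fs');
var path = require('path');
var lazy = require('lazy');
var iconv = require('iconv-lite');

lazy(fs.createReadStream(path.join(__dirname, 'datafiles', 'cities.csv')))
  .lines
  .map(function (byteArray) {
    // no encoding is set on the stream, so each line arrives as raw bytes;
    // decode them as Latin-1 instead of UTF-8
    return iconv.decode(byteArray, 'latin1');
  })
  .skip(1) // skip the header line (use skip(2) if the file has two header lines)
  .map(function (line) {
    // WORKS
    console.log(line);
  });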
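
Current Node versions can also do the Latin-1 decoding themselves, which removes the iconv-lite dependency. The sketch below swaps in `fs.createReadStream` with `encoding: 'latin1'` plus the built-in `readline` module in place of lazy; it is an alternative technique, not the code above. `forEachLatin1Line` and the temporary demo file are hypothetical names used only for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const readline = require('readline');

// Sketch: call onLine for every line of a Latin-1 (ISO-8859-1) file,
// skipping the first `skipLines` lines. encoding: 'latin1' makes the
// stream emit already-decoded strings, so no iconv step is needed.
async function forEachLatin1Line(filePath, skipLines, onLine) {
  const input = fs.createReadStream(filePath, { encoding: 'latin1' });
  const rl = readline.createInterface({ input: input, crlfDelay: Infinity });
  let lineNo = 0;
  for await (const line of rl) {
    lineNo += 1;
    if (lineNo > skipLines) onLine(line);
  }
  return lineNo; // total number of lines read, including skipped ones
}

// Demo on a small hypothetical file written with Latin-1 bytes
const demoPath = path.join(os.tmpdir(), 'latin1-demo.csv');
fs.writeFileSync(demoPath, Buffer.from('locId,city\n1,Châteauguay\n2,Villa España\n', 'latin1'));

const seen = [];
const done = forEachLatin1Line(demoPath, 1, function (line) {
  seen.push(line);
}).then(function (count) {
  console.log(seen); // [ '1,Châteauguay', '2,Villa España' ]
  return count;
});
```

For the real data, the call would look like `forEachLatin1Line(path.join(__dirname, 'datafiles', 'cities.csv'), 1, insertRow)`, where `insertRow` is a hypothetical stand-in for the MongoDB code. Since Latin-1 uses exactly one byte per character, a chunk boundary in the stream can never split a character.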
Licensed under: CC-BY-SA with attribution