On Linux, you could use the iconv command, as suggested in "How to remove non UTF-8 characters from text file":
iconv -f utf8 -t utf8 -c file.txt
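Note that the command above writes the cleaned text to standard output and leaves file.txt unchanged. Here is a minimal sketch of saving the result to a separate file. The function name strip_invalid_utf8 and the output file name are hypothetical, and it assumes an iconv that supports -c (glibc and GNU libiconv do). Redirecting straight back onto the input file would truncate it before iconv reads it, so the sketch writes to a different path.

```shell
# Hypothetical helper: copy INPUT to OUTPUT, dropping invalid UTF-8 bytes.
strip_invalid_utf8() {
    input=$1
    output=$2
    # Fail early on a missing or unreadable input file.
    [ -r "$input" ] || return 1
    # -c discards characters that cannot be converted.
    # Some iconv builds exit non-zero whenever they skip input,
    # so the exit status is deliberately ignored here.
    iconv -f UTF-8 -t UTF-8 -c "$input" > "$output" || true
}

# Example: strip_invalid_utf8 file.txt file.clean.txt
```

Comparing the sizes of the two files afterwards (for example with wc -c) gives a rough idea of how many bytes were dropped.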
I'm not familiar with MongoDB, so I have no insight into how to preserve the invalid characters during import.