Question

I am using a JavaScript file that is a concatenation of other JavaScript files.

Unfortunately, the person who concatenated these files did not use the proper encoding when reading them, so the BOM of every single source file was written into the concatenated JavaScript file.

Does anyone know a simple way to search through the concatenated file and remove any/all BOM markers?

A solution using PHP or a bash script for Mac OS X would be great.

Solution

See also: Using awk to remove the Byte-order mark

To remove multiple BOMs from anywhere within a text file you can try something similar. Leave out the ^ anchor, and add the g flag so that every BOM on a line is removed rather than only the first:

perl -e 's/\xef\xbb\xbf//g;' -pi~ file.js

(This edits the file in place, but first saves a backup as file.js~.)
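
To check that a substitution like this actually caught everything, it can help to count the BOMs before and after. The following is only a minimal sketch: count_boms is a hypothetical helper name, not part of any tool mentioned here, and it assumes a grep that supports the -a and -o options (both GNU and BSD grep do).

```shell
# Hypothetical helper: print how many UTF-8 BOMs (bytes EF BB BF) a file contains.
count_boms() {
  # printf with octal escapes yields the three raw BOM bytes.
  bom=$(printf '\357\273\277')
  # LC_ALL=C makes grep compare raw bytes, -a forces text mode, and -o prints
  # each match on its own line, so counting lines counts BOMs, even when
  # several sit on the same line.
  LC_ALL=C grep -a -o "$bom" "$1" | wc -l | tr -d ' '
}
```

Running count_boms file.js before and after the perl command should show the count dropping to 0.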

OTHER TIPS

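I normally do it using vim. Note that nobomb only drops the BOM at the very start of the file, so it does not remove BOMs embedded further in: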

vim -c "set nobomb" -c wq! myfile

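Find files that start with a BOM:

grep -rIlo $'^\xEF\xBB\xBF' ./

Remove the BOM from those files:

grep -rIlo $'^\xEF\xBB\xBF' . | xargs sed --in-place -e 's/\xef\xbb\xbf//'

The same, excluding .svn directories:

grep -rIlo --exclude-dir=".svn" $'^\xEF\xBB\xBF' . | xargs sed --in-place -e 's/\xef\xbb\xbf//'

(The --in-place option and the \x escapes are GNU sed features; the sed that ships with Mac OS X does not support them.)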
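
For systems without GNU sed, such as a stock Mac OS X install, a variant that avoids \x escapes and in-place editing might look like the sketch below. It is an illustration under assumptions, not a tested replacement for the commands above: strip_boms_portable is a hypothetical name, and the function relies on printf octal escapes, mktemp, and sed running in the C locale.

```shell
# Hypothetical helper: remove every UTF-8 BOM (bytes EF BB BF) from one file.
strip_boms_portable() {
  file=$1
  # Build the three BOM bytes with printf octal escapes, so sed never has to
  # interpret \x escapes itself.
  bom=$(printf '\357\273\277')
  tmp=$(mktemp) || return 1
  # LC_ALL=C makes sed treat the input as raw bytes rather than characters.
  if LC_ALL=C sed "s/$bom//g" "$file" > "$tmp"; then
    # Write the result back into the original file, keeping its permissions.
    cat "$tmp" > "$file"
  fi
  rm -f "$tmp"
}
```

It takes a single path, for example strip_boms_portable file.js. Unlike the perl command it keeps no backup, so copy the file first if that matters.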

I also figured out this solution which works entirely in PHP:

// Build the three BOM bytes (EF BB BF), then strip every occurrence.
$packed = pack("CCC",0xef,0xbb,0xbf);
$contents = preg_replace('/'.$packed.'/','',$contents);

I have written a bash script (see here) that works on Mac. I haven't tested it on other systems, but I suspect it should work there as well. The script also supports file names and paths that contain spaces.

Examples

Remove BOM from all files in current directory:

rmbom .

Print all files with a BOM in the current directory:

rmbom . -a

Remove the BOM only from files in the current directory with the extension txt or cs:

rmbom . -e txt -e cs

Print help:

rmbom -h

Licensed under: CC-BY-SA with attribution