Question

Given a directory of filenames consisting of double-byte/full-width numbers and spaces (along with some half-width numbers and underscores), how can I convert all of the numbers and spaces to single-byte characters?

For example, this filename consists of a double-byte number, followed by a double-byte space, followed by some single-byte characters:

2 2_3.ext

and I'd like to change it to all single-byte like so:

2 2_3.ext

I've tried convmv to convert from utf8 to ascii, but the following message appears for all files:

"ascii doesn't cover all needed characters for: filename"


Solution 3

Thanks for your quick replies, bmargulies and bobince. I found a Perl module, Unicode::Japanese, that helped get the job done. Here is a bash script I made (with help from this example) to convert filenames in the current directory from full-width to half-width characters:

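#!/bin/bash
# Rename every file in the current directory whose name contains full-width characters.
for file in *; do
    newfile=$(printf '%s' "$file" | perl -MUnicode::Japanese -e 'print Unicode::Japanese->new(<>)->z2h->get;')
    test "$file" != "$newfile" && mv -- "$file" "$newfile"
done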

OTHER TIPS

You need one of the following: (1) the normalizer added in Java 1.6 (java.text.Normalizer), (2) ICU, or (3), less likely, a product sold by the place I work.

What tools do you have available? There are Unicode normalisation functions in several scripting languages, for example in Python:

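import os
import unicodedata

# NFKC folds compatibility characters (full-width digits, ideographic space) to their plain forms.
for child in os.listdir(u'.'):
    normal = unicodedata.normalize('NFKC', child)
    if normal != child:
        os.rename(child, normal)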
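
Note that NFKC rewrites every compatibility character, not only digits and spaces (full-width letters and ligatures are folded as well). If that is broader than you want, a narrower variant is possible. The following is only a sketch under stated assumptions: it uses Python 3, the names FULLWIDTH_TO_ASCII, to_halfwidth and rename_all are made up for illustration, and it touches only the full-width digits U+FF10 to U+FF19 and the ideographic space U+3000. It also skips a rename when the target name already exists, which the loop above does not check.

```python
import os

# Map only full-width digits (U+FF10..U+FF19) and the ideographic space
# (U+3000) to ASCII; every other character is left untouched.
# (Hypothetical helper names, chosen for this example.)
FULLWIDTH_TO_ASCII = {0xFF10 + i: ord("0") + i for i in range(10)}
FULLWIDTH_TO_ASCII[0x3000] = ord(" ")


def to_halfwidth(name):
    """Return name with full-width digits and spaces replaced by ASCII."""
    return name.translate(FULLWIDTH_TO_ASCII)


def rename_all(directory="."):
    """Rename entries in directory, skipping any that would overwrite a file."""
    for child in os.listdir(directory):
        target = to_halfwidth(child)
        if target == child:
            continue  # nothing to convert in this name
        src = os.path.join(directory, child)
        dst = os.path.join(directory, target)
        if os.path.exists(dst):
            print("skipping %r: %r already exists" % (child, target))
            continue
        os.rename(src, dst)
```

Calling rename_all() with no argument processes the current directory; it may be worth trying to_halfwidth() on a few names first to confirm the mapping does what you expect.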
Licensed under: CC-BY-SA with attribution