Question

I'm writing a library in Java which creates the URL from a list of filenames in this way:

final String domain = "http://www.example.com/";

String filenames[] = {"Normal text","Ich weiß nicht", "L'ho inserito tra i princìpi"};

System.out.println(domain + normalize(filenames[0]));
//Prints  "http://www.example.com/Normal_text"
System.out.println(domain + normalize(filenames[1]));
//Prints  "http://www.example.com/Ich_weib_nicht"
System.out.println(domain + normalize(filenames[2]));
//Prints  "http://www.example.com/L_ho_inserito_tra_i_principi"

Is there a Java library that exposes a normalize method like the one I'm using in the code above?


Solution

Drawing on a previous answer of mine: you can use java.text.Normalizer, which comes close to normalizing Strings the way you describe. For example:

Accent removal:

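import java.text.Normalizer;

String accented = "árvíztűrő tükörfúrógép";
// NFD splits each accented letter into a base letter plus combining marks
String normalized = Normalizer.normalize(accented, Normalizer.Form.NFD);
// Dropping everything outside ASCII removes the combining marks
normalized = normalized.replaceAll("[^\\p{ASCII}]", "");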

System.out.println(normalized);

This prints:

arvizturo tukorfurogep
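
To get closer to the output in the question, the accent removal can be combined with a step that turns spaces and punctuation into underscores. The following is only a sketch: the class name UrlSlug and the choice to map ß to "ss" are my assumptions, not part of any library. The explicit mapping is there because NFD does not decompose ß, so the ASCII filter alone would silently drop it.

```java
import java.text.Normalizer;

// Hypothetical helper class; the name and the exact rules are assumptions.
final class UrlSlug {

    static String normalize(String input) {
        // Assumption: transliterate the German sharp s (U+00DF) by hand,
        // because Unicode normalization leaves it unchanged.
        String s = input.replace("\u00DF", "ss");
        // Split accented letters into base letter + combining marks.
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Remove the combining marks, keeping the base letters.
        s = s.replaceAll("\\p{M}+", "");
        // Collapse every run of non-alphanumeric characters into one underscore.
        return s.replaceAll("[^A-Za-z0-9]+", "_");
    }

    public static void main(String[] args) {
        String domain = "http://www.example.com/";
        String[] filenames = {
            "Normal text",
            "Ich wei\u00DF nicht",
            "L'ho inserito tra i princ\u00ECpi"
        };
        for (String name : filenames) {
            System.out.println(domain + normalize(name));
        }
    }
}
```

With these rules the second filename becomes Ich_weiss_nicht rather than Ich_weib_nicht; change the replacement string if you really want a "b" there.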

OTHER TIPS

If what you actually want is to encode the strings so that they are safe to use in a URL, use URLEncoder:

final String domain = "http://www.example.com/";

String filenames[] = {"Normal text","Ich weiß nicht", "L'ho inserito tra i princìpi"};

System.out.println(domain + URLEncoder.encode(filenames[0], "UTF-8"));
System.out.println(domain + URLEncoder.encode(filenames[1], "UTF-8"));
System.out.println(domain + URLEncoder.encode(filenames[2], "UTF-8"));
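
One caveat: URLEncoder produces form encoding (application/x-www-form-urlencoded), so a space becomes "+" rather than "%20". If the filename ends up in the path part of the URL, a small wrapper can swap that afterwards. This is a sketch; the class name PathEncoder and the method encodeSegment are hypothetical names of mine.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical wrapper; the names are assumptions for illustration.
final class PathEncoder {

    static String encodeSegment(String segment) {
        try {
            // A literal '+' in the input is already encoded as %2B, so every
            // '+' left in the result stands for a space and can be replaced.
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String domain = "http://www.example.com/";
        System.out.println(domain + encodeSegment("Normal text"));
        System.out.println(domain + encodeSegment("Ich wei\u00DF nicht"));
    }
}
```

Unlike the normalization approach, this keeps the original characters recoverable: the server can decode the percent-escapes back to the exact filename.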
Licensed under: CC-BY-SA with attribution