Question

I need to remove all punctuation from words in Java. I tried this:

    System.out.println("do.,it".replaceAll("[^\\w]", ""));
    System.out.println("сказочники".replaceAll("[^\\w]", ""));

But it doesn't work with Cyrillic or other non-Latin scripts. I also tried

    \p{Punct}

but its list is incomplete; for example, „ and » are missing.

Solution

Not sure if Java supports this, but try:

    "сказочники".replaceAll("\\P{wd}+", "")

where \P{wd} stands for any non-word character in any language. It is the opposite of \p{wd}. (In Java source the backslash has to be doubled inside the string literal.)
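
As far as I can tell, wd is not among the property names documented for java.util.regex.Pattern, so the pattern above may be rejected with a PatternSyntaxException. A minimal sketch of the closest documented equivalent, assuming Java 7 or later: compile \W with Pattern.UNICODE_CHARACTER_CLASS so that "non-word character" is interpreted in the Unicode sense. The class and method names are made up for illustration, and the sample strings use Unicode escapes only to keep the source ASCII-safe.

```java
import java.util.regex.Pattern;

public class UnicodeWordFilter {
    // With UNICODE_CHARACTER_CLASS, \w covers letters, marks, digits and
    // connector punctuation (such as '_') from any script, so \W is the rest.
    private static final Pattern NON_WORD =
            Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);

    // Hypothetical helper: drops every run of non-word characters.
    public static String stripNonWord(String input) {
        return NON_WORD.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripNonWord("do.,it"));
        // A Cyrillic word wrapped in a low double quote (U+201E) and a right guillemet (U+00BB).
        System.out.println(stripNonWord(
                "\u201E\u0441\u043A\u0430\u0437\u043E\u0447\u043D\u0438\u043A\u0438\u00BB"));
    }
}
```

Keep in mind that the underscore and digits count as word characters here, so they survive, while spaces are removed.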

OTHER TIPS

Try this regex:

    text = text.replaceAll("[^a-zA-Z0-9\\s]", "");

This removes every character except ASCII letters, digits, and whitespace.

Edit:

Since the text is in a different language, the whitelist above will not work. Suppose you have to remove - + ^ . : ,

Try this:

    text = text.replaceAll("[\\-\\+\\.\\^:,]", "");
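
The first pattern in this answer keeps only ASCII letters, so Cyrillic text is wiped out entirely, and listing every unwanted character by hand does not scale. A sketch of the same whitelist idea with Unicode categories, assuming that letters, numbers, and whitespace are what should survive: \p{L} matches a letter in any script and \p{N} any numeric character. KeepLettersAndDigits and clean are illustrative names, not part of any library.

```java
public class KeepLettersAndDigits {
    // Hypothetical helper: removes everything except letters, numbers and whitespace.
    public static String clean(String text) {
        return text.replaceAll("[^\\p{L}\\p{N}\\s]", "");
    }

    public static void main(String[] args) {
        // Two Cyrillic letters followed by '!', written as escapes to stay ASCII-safe.
        String sample = "\u0434\u0430!";
        // The ASCII-only whitelist deletes the Cyrillic letters as well.
        System.out.println(sample.replaceAll("[^a-zA-Z0-9\\s]", "").isEmpty());
        // The Unicode-aware whitelist keeps the letters and drops only the '!'.
        System.out.println(clean(sample).equals("\u0434\u0430"));
        System.out.println(clean("do.,it 42"));
    }
}
```

Text that uses combining marks (decomposed accents) would also need \p{M} in the character class, otherwise the marks are stripped.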

My solution seems to be:

    System.out.println("сказ очники»»«„“‚‘›‹".replaceAll("[^\\p{L}]", ""));
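
Note that [^\p{L}] also deletes spaces and digits. If only punctuation should go, one possible sketch is to target the Unicode punctuation category \p{P} directly, which, unlike Java's ASCII-only \p{Punct}, includes characters such as „ and ». Symbols like + or ^ belong to \p{S} rather than \p{P}, so they are handled by a separate, optional method here. The class and method names are hypothetical.

```java
public class PunctuationStripper {
    // Removes characters in the Unicode general category P (punctuation) only.
    public static String stripPunctuation(String text) {
        return text.replaceAll("\\p{P}+", "");
    }

    // Also removes category S (math, currency and modifier symbols such as + ^ $).
    public static String stripPunctuationAndSymbols(String text) {
        return text.replaceAll("[\\p{P}\\p{S}]+", "");
    }

    public static void main(String[] args) {
        // Two Cyrillic words wrapped in U+201E and U+00BB; letters and the space survive.
        System.out.println(stripPunctuation(
                "\u201E\u0441\u043A\u0430\u0437 \u043E\u0447\u043D\u0438\u043A\u0438\u00BB"));
        System.out.println(stripPunctuation("1+1=2."));           // only the dot is punctuation
        System.out.println(stripPunctuationAndSymbols("1+1=2.")); // digits only
    }
}
```
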
Licensed under: CC-BY-SA with attribution