Question

I'd like to filter out certain characters, such as the one with Unicode code point U+1F603, from a string in Java. The string contains all kinds of text, and I want to remove only those specific characters. How can I do that? Can anyone help? Thanks.


Solution

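The character U+1F603 lies outside the Basic Multilingual Plane, so in a Java string it is represented by the UTF-16 surrogate pair \uD83D\uDE03.

If your text is in a String variable named yourString, the following code removes every occurrence of that character. Strings are immutable, so replace returns a new string, and the result must be assigned:

yourString = yourString.replace("\uD83D\uDE03", "");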
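The surrogate pair does not have to be worked out by hand. As a minimal sketch (the class name SurrogateDemo and the helper removeCodePoint are made up for illustration), the target string can be built from the code point number with Character.toChars:

```java
public class SurrogateDemo {

    // Hypothetical helper: removes every occurrence of one code point from the text.
    static String removeCodePoint(String text, int codePoint) {
        // Character.toChars returns the UTF-16 char(s) for the code point:
        // one char up to U+FFFF, two chars (a surrogate pair) above that.
        String target = new String(Character.toChars(codePoint));
        // String.replace returns a new string; the original is left unchanged.
        return text.replace(target, "");
    }

    public static void main(String[] args) {
        String target = new String(Character.toChars(0x1F603));
        System.out.println(target.equals("\uD83D\uDE03"));                  // true
        System.out.println(removeCodePoint("happy\uD83D\uDE03", 0x1F603));  // happy
    }
}
```

Passing the code point as an int keeps the call site readable (0x1F603 matches the U+1F603 notation directly).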

OTHER TIPS

tl;dr

"happy😃".replaceAll( "😃" , "" ) 

happy

Java source code supports Unicode

Your Java source code may contain any of the over 140,000 characters defined in Unicode. So you can have a string literal using any such character. No need for escaping.

Before Java 18, you may need to instruct your tooling to save your source code files as UTF-8. In Java 18 and later, the default character encoding is UTF-8 across all platforms, per JEP 400: UTF-8 by Default.

Your target, U+1F603, is 😃, SMILING FACE WITH OPEN MOUTH.

String result = input.replaceAll( "😃" , "" ) ;  // Replacing target character with empty string, effectively a "remove all" operation. 

Example:

System.out.println(
    "happy😃".replaceAll( "😃" , "" ) 
);

See this code run live at IdeOne.com.

happy

Code points

To work with individual characters in Java, use code point integer numbers.

The Unicode Consortium has assigned a permanent identifier number to virtually every known character of every language and script. Currently the count of characters is over 140,000. The assigned numbers range from zero to just over a million. Obviously most of that number range is unassigned, reserved for private use or future use.
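To illustrate why code points matter here, the following sketch (class and method names are hypothetical) contrasts the number of char values in a string with the number of code points. The emoji is written as an escape only so that the example compiles regardless of source file encoding.

```java
public class CodePointDemo {

    // Hypothetical helper: counts Unicode code points rather than UTF-16 char values.
    static int countCodePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE03"; // the letter "a" followed by U+1F603

        System.out.println(s.length());                              // 3 (char values)
        System.out.println(countCodePoints(s));                      // 2 (code points)
        System.out.println(Integer.toHexString(s.codePointAt(1)));   // 1f603
        System.out.println(Character.charCount(s.codePointAt(1)));   // 2
    }
}
```

Because U+1F603 occupies two char values, char-based operations such as charAt see only half of the character, while the code point methods treat it as a single unit.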

You want to remove 😃, U+1F603, SMILING FACE WITH OPEN MOUTH.

Say we also want to remove:

  • 😦, U+1F626, FROWNING FACE WITH OPEN MOUTH
  • 😷, U+1F637, FACE WITH MEDICAL MASK

Make a list of the code points of those characters. (Stream#toList requires Java 16 or later.)

String forbidden = "😃😦😷";
List< Integer > forbiddenCodePoints = forbidden.codePoints().boxed().toList() ;

Get a stream of the code points of the characters in your input string.

String input = "happy 😃 sad 😦";
IntStream codePoints = input.codePoints();

Filter those to eliminate any found in our list of forbidden code point numbers. For the code point int numbers that pass our test, append each to a StringBuilder. Lastly, build a String object from that StringBuilder.

String result =
        codePoints
                .filter( codePoint -> ! forbiddenCodePoints.contains( codePoint ) )
                .collect( StringBuilder :: new , StringBuilder :: appendCodePoint , StringBuilder :: append )
                .toString();

When run.

result = happy  sad 

Here is an alternative, written as a one-liner, using IntStream#anyMatch.

System.out.println(
        "happy 😃 sad 😦"
                .codePoints()
                .filter( codePoint -> ! "😃😦😷".codePoints().anyMatch(  x -> x == codePoint) )
                .collect( StringBuilder :: new , StringBuilder :: appendCodePoint , StringBuilder :: append )
                .toString()
);

When run.

happy  sad 
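
Both versions above scan the forbidden characters once for every code point of the input. If the forbidden set grows large, a Set lookup may be preferable. The following is a sketch under that assumption, not part of the original answer; the class name CodePointFilter and the method removeAll are hypothetical, and the emoji are written as escapes only so that the example compiles regardless of source file encoding.

```java
import java.util.Set;
import java.util.stream.Collectors;

public class CodePointFilter {

    // Hypothetical helper: removes from input every code point that occurs in forbidden.
    static String removeAll(String input, String forbidden) {
        // Collect the forbidden code points into a Set for fast membership tests.
        Set<Integer> forbiddenCodePoints =
                forbidden.codePoints().boxed().collect(Collectors.toSet());

        return input.codePoints()
                .filter(codePoint -> !forbiddenCodePoints.contains(codePoint))
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        String forbidden = "\uD83D\uDE03\uD83D\uDE26\uD83D\uDE37"; // U+1F603, U+1F626, U+1F637
        String input = "happy \uD83D\uDE03 sad \uD83D\uDE26";
        // Brackets make the remaining spaces visible.
        System.out.println("[" + removeAll(input, forbidden) + "]"); // [happy  sad ]
    }
}
```

Collectors.toSet is used instead of Stream#toList so the sketch also runs on Java versions before 16.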
Licensed under: CC-BY-SA with attribution