Question

I've found a howto, http://answers.oreilly.com/topic/214-how-to-match-nonprintable-characters-with-a-regular-expression/ , but none of the codes (\e, \x1b, \x1B) works for me in Java.

EDIT

I am trying to replace the ANSI escape sequences (specifically, color sequences) in the output of a Linux terminal command. In Python the pattern to replace would look like "\x1b[34;01m", which means blue bold text. The same pattern does not work in Java. I tried replacing "[34;01m" on its own and that worked, so the problem is the \x1b part. I am escaping the "[" with Pattern.quote().

EDIT

Map<String,String> escapeMap = new HashMap<String,String>();
escapeMap.put("\\x1b[01;34m", "</span><span style=\"color:blue;font-weight:bold\">");
FileInputStream stream = new FileInputStream(new File("/home/ch00k/gun.output"));
FileChannel fc = stream.getChannel();
MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
String message = Charset.defaultCharset().decode(bb).toString();
stream.close();
String patternString = Pattern.quote(StringUtils.join(escapeMap.keySet(), "|"));
System.out.println(patternString);
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(message);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
    matcher.appendReplacement(sb, escapeMap.get(matcher.group()));
}
matcher.appendTail(sb);
String formattedMessage = sb.toString();
System.out.println(formattedMessage);

EDIT Here is the code I've ended up with:

import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.nio.charset.*;
import java.util.*;
import java.util.regex.*;
import org.apache.commons.lang3.*;

class CreateMessage {
    public static void message() throws IOException {
        FileInputStream stream = new FileInputStream(new File("./gun.output"));
        FileChannel fc = stream.getChannel();
        MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        String message = Charset.defaultCharset().decode(bb).toString();
        stream.close();
        Map<String,String> tokens = new HashMap<String,String>();
        tokens.put("root", "nobody");
        tokens.put(Pattern.quote("[01;34m"), "qwe");
        String patternString = "(" + StringUtils.join(tokens.keySet(), "|") + ")";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(message);
        StringBuffer sb = new StringBuffer();
        while(matcher.find()) {
            System.out.println(tokens.get(matcher.group()));
            matcher.appendReplacement(sb, tokens.get(matcher.group()));
        }
        matcher.appendTail(sb);

        System.out.println(sb.toString());

    }
}

The file gun.output contains the output of ls -la --color=always /. Now the problem is that I get a NullPointerException when I try to match Pattern.quote("[01;34m"). Everything matches fine except the strings that contain [, even though I quote them. The exception is the following:

Exception in thread "main" java.lang.NullPointerException
    at java.util.regex.Matcher.appendReplacement(Matcher.java:699)
    at org.minuteware.jgun.CreateMessage.message(CreateMessage.java:32)
    at org.minuteware.jgun.Main.main(Main.java:23)

EDIT

So, according to http://java.sun.com/developer/technicalArticles/releases/1.4regex/, the escape character should be matched with "\u001B", which indeed works in my case. The problem is that if I use tokens.put("\u001B" + Pattern.quote("[01;34m"), "qwe");, I still get the above-mentioned NPE.


Solution 3

The proper value for the "escape" character in a regexp is \u001B.

OTHER TIPS

quote() is to make a pattern that will match the input string verbatim. Your string has pattern language in it. Look at the output from quote() - you'll see that it's trying to literally find the four characters \x1b.
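The NullPointerException in the question has a related cause: matcher.group() returns the text that was matched (for example "[01;34m"), while the map key is the quoted form "\Q[01;34m\E", so tokens.get(...) returns null. One possible way around this, sketched below, is to keep the literal strings as map keys and apply Pattern.quote() to each key only while building the alternation. The class name QuoteDemo, the method replaceTokens and the "<b>" replacement are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteDemo {
    // Replaces every occurrence of a literal map key with its mapped value.
    static String replaceTokens(String input, Map<String, String> tokens) {
        // Quote each key separately so that '|' keeps its regex meaning.
        StringBuilder alternation = new StringBuilder();
        for (String key : tokens.keySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(key));
        }
        Matcher matcher = Pattern.compile(alternation.toString()).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            // group() is the matched literal text, which is exactly the map key.
            String replacement = tokens.get(matcher.group());
            // quoteReplacement keeps '$' and '\' in the replacement literal.
            matcher.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Quoting the whole string turns the regex escape into literal text.
        System.out.println(Pattern.quote("\\x1b[01;34m")); // prints \Q\x1b[01;34m\E

        Map<String, String> tokens = new LinkedHashMap<>();
        tokens.put("root", "nobody");
        tokens.put("\u001B[01;34m", "<b>"); // key holds the real ESC character
        System.out.println(replaceTokens("\u001B[01;34mroot", tokens)); // prints <b>nobody
    }
}
```

Because the keys are plain strings, the ESC character can be written directly in the key with the Java escape "\u001B", and no regex escape for it is needed at all.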

ANSI escape sequences have the following form: \033[34;01m

where \033 is the ASCII escape character (ESC): 033 in octal, 1b in hex, or 27 in decimal. You need to use the following regexp:

Pattern p = Pattern.compile("\033\\[34;01m");

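In a Java string literal you can write a non-printable character with an octal escape (\033) or a Unicode escape (\u001B). The hex form \x1b is not valid in a Java string literal; it is regex syntax, so it has to reach the regex engine with its backslash intact, which means writing "\\x1b" in Java source.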
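To see which spellings of the escape character Java accepts, here is a small sketch that compiles several variants and checks each against the same sample sequence. The first two put a real ESC character into the string; the rest pass an escape for the regex engine to interpret. The class name EscapeForms and its members are hypothetical.

```java
import java.util.regex.Pattern;

class EscapeForms {
    // A blue-bold colour sequence as it appears in terminal output.
    static final String SAMPLE = "\u001B[34;01m";

    // Each entry is a Java string literal that compiles to an equivalent regex.
    static final String[] REGEXES = {
        "\033\\[34;01m",    // octal string escape: the string holds a real ESC
        "\u001B\\[34;01m",  // Unicode string escape: the string holds a real ESC
        "\\x1b\\[34;01m",   // hex escape interpreted by the regex engine
        "\\u001B\\[34;01m", // Unicode escape interpreted by the regex engine
        "\\033\\[34;01m",   // octal escape interpreted by the regex engine
        "\\e\\[34;01m"      // the regex engine's own shorthand for ESC
    };

    // Returns true only if every variant matches the whole sample.
    static boolean allMatch() {
        for (String regex : REGEXES) {
            if (!Pattern.matches(regex, SAMPLE)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allMatch()); // prints true
    }
}
```

Writing "\x1b" or "\e" with a single backslash in Java source does not compile, because neither is a string escape Java recognises.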

FWIW, I've been working on stripping ANSI color codes from colorized log4j files and this little pattern seems to do the trick for all of the cases I've come across:

Pattern.compile("(\\u001B\\[\\d+;\\d+m)+")
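The pattern above requires exactly two numeric parameters, so a reset sequence such as ESC[0m is left in place. As a sketch only, and assuming the goal is to strip every colour (SGR) sequence regardless of how many parameters it carries, a slightly looser pattern could be used. The class name AnsiStripper and the sample text are hypothetical.

```java
import java.util.regex.Pattern;

class AnsiStripper {
    // ESC, '[', then any run of digits and semicolons, then a final 'm'.
    private static final Pattern SGR = Pattern.compile("\\u001B\\[[0-9;]*m");

    // Removes all colour sequences and returns the plain text.
    static String strip(String text) {
        return SGR.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String colored = "\u001B[01;34mbin\u001B[0m  \u001B[01;32mrun.sh\u001B[0m";
        System.out.println(strip(colored)); // prints: bin  run.sh
    }
}
```

This only targets sequences ending in 'm'; other ANSI control sequences (cursor movement, screen clearing) would need a broader pattern.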
Licensed under: CC-BY-SA with attribution