Why does this use of regular expressions in Java throw an "Unclosed character class" exception at runtime?

StackOverflow https://stackoverflow.com/questions/23485608

  •  16-07-2023

Question

I have a list of keywords:

String[] keywords = {"xxxx", "yyyy", "zzzz"};
String[] another = {"aaa", "bbb", "ccc"};

I am trying to identify text that has one of the keywords followed by a space and then followed by one of the "another" words.

If I use:

Pattern pattern = Pattern.compile(keywords+"\\s"+another);

This throws an exception at runtime:

Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 57
[Ljava.lang.String;@3dd4ab05\s[Ljava.lang.String;@5527f4f9
                                                         ^

How can I fix this?

Solution

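That error is correctly telling you that the pattern you're trying to create is invalid. The gibberish-looking string starting with [Ljava is the string you passed to Pattern.compile(). Each [ in it opens a character class that is never closed with a matching ], hence "Unclosed character class".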

Java arrays unfortunately do not have a very informative toString() output (just a type name and a hash code), and what you're doing here is essentially concatenating two arrays as Strings, which Pattern cannot hope to parse correctly.
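To see this for yourself, here is a minimal sketch that builds the same string and checks whether Pattern accepts it. The class and method names (ArrayToStringDemo, concatenated, failsToCompile) are made up for illustration, and the hash codes in the printed string will differ from run to run.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class ArrayToStringDemo {
    // Reproduces the concatenation from the question: each array contributes
    // its default toString() text, e.g. "[Ljava.lang.String;@3dd4ab05".
    static String concatenated(String[] keywords, String[] another) {
        return keywords + "\\s" + another;
    }

    // Returns true when Pattern.compile rejects the given text.
    static boolean failsToCompile(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String[] keywords = {"xxxx", "yyyy", "zzzz"};
        String[] another = {"aaa", "bbb", "ccc"};
        String regex = concatenated(keywords, another);
        System.out.println(regex);                  // hash codes vary per run
        System.out.println(failsToCompile(regex));  // true
    }
}
```

The contents of the arrays never reach the regex at all; only the opening [ of each array's type name does, and nothing closes it.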

But even if you called Arrays.toString(), you'd still not get what you're looking for:

Pattern pattern = Pattern.compile(Arrays.toString(keywords) + "\\s" +
                                  Arrays.toString(another));
System.out.println(pattern.pattern());
// prints: [xxxx, yyyy, zzzz]\s[aaa, bbb, ccc]

This is a technically valid but essentially meaningless regular expression: it will only match three-character Strings consisting of one character from the set x, y, z, comma, or space, followed by one whitespace character, followed by one character from the set a, b, c, comma, or space.
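A quick way to convince yourself is to run a few inputs through that pattern. This is only a sketch; the class name ArraysToStringPatternDemo and its build method are invented for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class ArraysToStringPatternDemo {
    // Builds the same pattern as the snippet above: two character classes
    // with \s between them.
    static Pattern build(String[] keywords, String[] another) {
        return Pattern.compile(Arrays.toString(keywords) + "\\s" + Arrays.toString(another));
    }

    public static void main(String[] args) {
        String[] keywords = {"xxxx", "yyyy", "zzzz"};
        String[] another = {"aaa", "bbb", "ccc"};
        Pattern p = build(keywords, another);

        System.out.println(p.matcher("x a").matches());      // true
        System.out.println(p.matcher(", ,").matches());      // true: comma and space are in both classes
        System.out.println(p.matcher("xxxx aaa").matches()); // false: the whole input is eight characters
    }
}
```

Note that find() would still report a hit inside "xxxx aaa" (the substring "x a"), which can make the pattern look as if it works when it does not.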

I would suggest reading more about how regular expressions work; there are lots of resources online to help, and a good starting point is the Java Regular Expressions lesson and the Pattern documentation. You won't get very far until you understand what regular expressions are trying to do.

As a starting point, however, a regular expression that matches one of several words, followed by a space, followed by one of several other words, might look like this:

(?:xxxx|yyyy|zzzz)\s(?:aaa|bbb|ccc)

This uses "non-capturing groups" and the logical OR operator | to specify multiple potential matches.
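In Java source the backslash has to be doubled inside the string literal. A minimal sketch of using that expression follows; the class name LiteralAlternationDemo and the helper firstMatch are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralAlternationDemo {
    // The expression from above, written as a Java string literal.
    static final Pattern PATTERN =
            Pattern.compile("(?:xxxx|yyyy|zzzz)\\s(?:aaa|bbb|ccc)");

    // Returns the first "keyword, whitespace, other word" occurrence in text,
    // or null if there is none.
    static String firstMatch(String text) {
        Matcher m = PATTERN.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("foo yyyy bbb bar")); // yyyy bbb
        System.out.println(firstMatch("yyyybbb"));          // null
    }
}
```

Keep in mind that \s matches any single whitespace character (a tab, for instance), not only a space; use a literal space in the pattern if that matters.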

Other suggestions

[Ljava.lang.String;@3dd4ab05 is the result of calling toString() on a string array.

You need to build your pattern manually with the items that are in the relevant arrays.
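One possible way to do that, as a sketch rather than the only approach: join each array's items with | inside a non-capturing group, quoting every item with Pattern.quote so that any regex metacharacters in the words are treated literally. The names KeywordPatternBuilder, alternation, and build are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class KeywordPatternBuilder {
    // Turns {"a", "b"} into "(?:\Qa\E|\Qb\E)": a non-capturing group in which
    // every word is quoted so it matches literally.
    static String alternation(String[] words) {
        return Arrays.stream(words)
                .map(Pattern::quote)
                .collect(Collectors.joining("|", "(?:", ")"));
    }

    // One word from 'first', one whitespace character, one word from 'second'.
    static Pattern build(String[] first, String[] second) {
        return Pattern.compile(alternation(first) + "\\s" + alternation(second));
    }

    public static void main(String[] args) {
        String[] keywords = {"xxxx", "yyyy", "zzzz"};
        String[] another = {"aaa", "bbb", "ccc"};
        Pattern pattern = build(keywords, another);

        System.out.println(pattern.matcher("say zzzz ccc now").find()); // true
        System.out.println(pattern.matcher("zzzzccc").find());          // false
    }
}
```

The quoting is optional for plain alphabetic words like the ones in the question, but it avoids surprises if a keyword ever contains characters such as . or [.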

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow