Question

I was wondering if there is any way in Java to tokenize a string by spaces, but if there are some words between apostrophes, treat them as "one word".

For example, if I have:

This "is a great" day

the tokenizer should return:

  • "This"
  • "is a great"
  • "day"

Thanks!

Solution

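Using a regex with Matcher.find(), not a StringTokenizer (String.split() will not do here, because it returns the text between the matches rather than the matches themselves), how about:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String input = "this \"is a great\" day";

    // Match either the text between two double quotes or a single bare word
    Matcher m = Pattern.compile("(?<=\").+(?=\")|\\b\\w+\\b").matcher(input);
    while (m.find())
    {
        System.out.println("[" + m.group() + "]");
    }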

Output:

[this]
[is a great]
[day]

From your example, I assume you mean double-quotes (") not apostrophes (').

NB: I initially posted something much simpler, which worked for your example, but not for input like:

    " yes this \"is a great\" day all right"
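
One caveat with the lookaround pattern above: the greedy .+ runs from the first double quote to the last one, so it only suits input with a single quoted phrase. If several quoted phrases can appear, a pattern that consumes the quotes themselves may be easier to reason about. The following is only a minimal sketch, not part of the original answer: the class name QuotedTokenizer and the method tokenize are made up for illustration, and it assumes quoted phrases are separated from neighbouring words by whitespace and contain no escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedTokenizer {
    // Alternative 1: a double-quoted phrase, with the inner text captured in group 1.
    // Alternative 2: any run of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|\\S+");

    // Hypothetical helper: splits on whitespace but keeps each quoted phrase as one token.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // group(1) is non-null only when the quoted alternative matched
            tokens.add(m.group(1) != null ? m.group(1) : m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("this \"is a great\" day"));
        System.out.println(tokenize("say \"hello there\" and \"good bye\" now"));
    }
}
```

Because the quotes are matched and then dropped via the capture group, each quoted phrase ends at its own closing quote rather than at the last quote in the input. An unbalanced quote simply falls through to the non-whitespace alternative and stays attached to its word.
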
Licensed under: CC-BY-SA with attribution