Question

Is there a default/easy way in Java to split strings while taking quotation marks or other symbols into account?

For example, given this text:

There's "a man" that live next door 'in my neighborhood', "and he gets me down..."

I'd like to obtain:

There's
a man
that
live
next
door
in my neighborhood
and he gets me down

Solution

Something like this works for your input:

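    import java.util.Scanner;
    import java.util.regex.Pattern;

    public class QuoteAwareSplit {
        public static void main(String[] args) {
            String text = "There's \"a man\" that live next door "
                + "'in my neighborhood', \"and he gets me down...\"";

            Scanner sc = new Scanner(text);
            Pattern pattern = Pattern.compile(
                "\"[^\"]*\"" +   // double-quoted token
                "|'[^']*'" +     // single-quoted token
                "|[A-Za-z']+"    // bare word (apostrophes allowed)
            );
            String token;
            // findInLine skips over input that matches none of the alternatives
            while ((token = sc.findInLine(pattern)) != null) {
                System.out.println("[" + token + "]");
            }
            sc.close();
        }
    }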

The above prints (as seen on ideone.com):

    [There's]
    ["a man"]
    [that]
    [live]
    [next]
    [door]
    ['in my neighborhood']
    ["and he gets me down..."]

It uses Scanner.findInLine with a regex pattern that matches any one of:

    "[^"]*"      # double-quoted token
    '[^']*'      # single-quoted token
    [A-Za-z']+   # everything else
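If you want the tokens without their surrounding quotes, as in the desired output in the question, one option is to put a capturing group inside each alternative and use a plain Matcher.find loop instead of Scanner. This is a sketch assuming that is the goal; the class name UnquotedTokens is just for illustration. The trailing ... is kept because it sits inside the quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnquotedTokens {
    // Same three alternatives as above, but each quoted form captures only
    // the text between its quotes (groups 1 and 2); group 3 is a bare word.
    private static final Pattern TOKEN = Pattern.compile(
        "\"([^\"]*)\"" +
        "|'([^']*)'" +
        "|([A-Za-z']+)"
    );

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Exactly one group participates in each match; the others are null
            for (int g = 1; g <= 3; g++) {
                if (m.group(g) != null) {
                    tokens.add(m.group(g));
                    break;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "There's \"a man\" that live next door "
            + "'in my neighborhood', \"and he gets me down...\"";
        for (String token : tokenize(text)) {
            System.out.println(token);
        }
    }
}
```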

This certainly won't work in every case; inputs where quotes can be nested, for example, will be tricky.
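If your input may escape quotes with a backslash (an assumption; the question doesn't say so), the quoted alternatives can be extended to step over \" and \\ inside a token. This is only a sketch and does nothing for nesting, which a plain regex like this can't track; EscapedQuotes is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class EscapedQuotes {
    // Assumes backslash escapes inside quotes; not specified by the question.
    static final Pattern TOKEN = Pattern.compile(
        "\"(?:[^\"\\\\]|\\\\.)*\"" +   // "..." where \" and \\ may appear inside
        "|'(?:[^'\\\\]|\\\\.)*'" +     // '...' where \' and \\ may appear inside
        "|[A-Za-z']+"                  // bare word, as before
    );

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Scanner sc = new Scanner(text);
        String token;
        while ((token = sc.findInLine(TOKEN)) != null) {
            tokens.add(token);
        }
        sc.close();
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("say \"he said \\\"hi\\\"\" now")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Tokens keep their quotes and backslashes, as in the Scanner version above.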


OTHER TIPS

Doubtful. Based on your logic, you need to differentiate between an apostrophe and a single quote, e.g. There's versus 'in my neighborhood'.

You'd have to develop some kind of pairing logic to get the output you want. I'm thinking regular expressions, or some kind of two-part parse.
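As a rough sketch of such pairing logic, here is a small hand-written scanner instead of a regex. The rules are assumptions of this sketch, not something the question specifies: a double quote always pairs with the next double quote; a single quote opens a span only at the start of a token and closes at a single quote that is not followed by a letter, so an apostrophe like the one in neighbor's stays inside the span; anything else is a word of letters and apostrophes. PairingTokenizer is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

public class PairingTokenizer {
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0, n = s.length();
        while (i < n) {
            char c = s.charAt(i);
            if (c == '"') {
                // Pair with the next double quote; if unterminated, take the rest
                int end = s.indexOf('"', i + 1);
                if (end < 0) end = n;
                out.add(s.substring(i + 1, end));
                i = end + 1;
            } else if (c == '\'') {
                // Close only at a single quote that is not followed by a letter
                int end = i + 1;
                while (end < n && !(s.charAt(end) == '\''
                        && (end + 1 == n || !Character.isLetter(s.charAt(end + 1))))) {
                    end++;
                }
                out.add(s.substring(i + 1, Math.min(end, n)));
                i = end + 1;
            } else if (Character.isLetter(c)) {
                // Bare word: letters plus embedded apostrophes (There's)
                int start = i;
                while (i < n && (Character.isLetter(s.charAt(i)) || s.charAt(i) == '\'')) {
                    i++;
                }
                out.add(s.substring(start, i));
            } else {
                i++; // skip spaces, commas and other separators
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "There's \"a man\" that live next door "
            + "'in my neighborhood', \"and he gets me down...\"";
        for (String token : tokenize(text)) {
            System.out.println(token);
        }
    }
}
```

For example, tokenize("'my neighbor's house'") yields the single token my neighbor's house, whereas the regex-based Scanner version above would split it into 'my neighbor', s and house'.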

Licensed under: CC-BY-SA with attribution