String Tokenizing

Question 1

Try using a positive look-behind, a zero-width regex construct that matches a position without consuming any input:

String s = "A#B^C&D!ased&acdf@Mhj%";
String temp[] = s.split("(?<=[#^&!@%])");

The (?<=expr) construct matches at the point right after expr without consuming expr itself, so split cuts the text at each position that follows a separator and the separator stays attached to the token before it.

Here is a demo on ideone.
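For a version you can run locally, here is a minimal sketch of the same idea. The class and method names are made up for illustration; the pattern is the single-character look-behind, which is enough because split only needs the position right after one separator.

```java
import java.util.Arrays;

class LookBehindSplitDemo {
    // Hypothetical helper: splits after every separator character and
    // keeps the separator attached to the token that precedes it.
    static String[] splitKeepingSeparators(String input) {
        // Zero-width match at each position preceded by one separator.
        return input.split("(?<=[#^&!@%])");
    }

    public static void main(String[] args) {
        String s = "A#B^C&D!ased&acdf@Mhj%";
        // prints [A#, B^, C&, D!, ased&, acdf@, Mhj%]
        System.out.println(Arrays.toString(splitKeepingSeparators(s)));
    }
}
```

The match at the very end of the string produces a trailing empty string, which split drops by default.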

Question 2

If you have to deal with very large Strings, you may be better off rolling your own code. The Java pattern matching engine is a good, general-purpose tool, but custom code written for one specific job can often outperform it.

One way to get started is a helper library such as Apache Commons Lang. Its StringUtils class is very easy to use and has a large number of functions that are missing from the standard Java library.

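The function:

int i = StringUtils.indexOfAny("A#B^C&D!ased&acdf@Mhj%", "#^&!@%");

will get you the index of the first separator character. Note that the second argument is a plain set of characters to search for, not a regex, so brackets or a + would be treated as characters to look for. It is up to you to carve off the front and repeat the search on the rest of the string.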
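The carve-off loop could look something like the sketch below. It uses only the standard library so it runs without Commons Lang; ManualTokenizer and tokenize are hypothetical names, and whether it actually beats the regex engine on your data is something you would have to measure.

```java
import java.util.ArrayList;
import java.util.List;

class ManualTokenizer {
    // Hypothetical helper: one pass over the input, cutting right after
    // every character that appears in the separator set.
    static List<String> tokenize(String input, String separators) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < input.length(); i++) {
            // indexOf on the separator set stands in for the regex character class.
            if (separators.indexOf(input.charAt(i)) >= 0) {
                tokens.add(input.substring(start, i + 1)); // keep the separator
                start = i + 1;
            }
        }
        if (start < input.length()) {
            tokens.add(input.substring(start)); // trailing text without a separator
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [A#, B^, C&, D!, ased&, acdf@, Mhj%]
        System.out.println(tokenize("A#B^C&D!ased&acdf@Mhj%", "#^&!@%"));
    }
}
```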

Question 3

String#split() uses a regular expression to find the split positions and removes the matched text from the result (these are the delimiters that you normally don't want). If you want to keep the delimiters as well, you need a zero-length match, which you get by using a look-behind.

String s = "A#B^C&D!ased&acdf@Mhj%";
String temp[] = s.split("(?<=[#^&!@%])");

The expression is changed to match the position right after each delimiter, which is a zero-length match. Nothing is consumed, so the result contains the delimiters as well.
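To see the difference side by side, here is a small sketch that runs both kinds of pattern on the same input. The class and method names are illustrative only.

```java
import java.util.Arrays;

class SplitContrastDemo {
    // Consuming match: each delimiter character is matched and dropped.
    static String[] consuming(String s) {
        return s.split("[#^&!@%]");
    }

    // Zero-length look-behind: only the position is matched, nothing is dropped.
    static String[] zeroWidth(String s) {
        return s.split("(?<=[#^&!@%])");
    }

    public static void main(String[] args) {
        String s = "A#B^C&D!ased&acdf@Mhj%";
        // prints [A, B, C, D, ased, acdf, Mhj]
        System.out.println(Arrays.toString(consuming(s)));
        // prints [A#, B^, C&, D!, ased&, acdf@, Mhj%]
        System.out.println(Arrays.toString(zeroWidth(s)));
    }
}
```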

Question 4

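The split method splits around matches of the regexp, so a plain character class such as [#^&!@%] is all you need. Don't put | between the characters: inside square brackets | is not alternation but a literal character, so [#|^|&|!|@|%] would also split on |. Keep in mind that splitting this way removes the separators from the result.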
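A quick way to check how | behaves inside square brackets is to split a string that contains one. This is just a sketch with made-up names and a made-up input:

```java
import java.util.Arrays;

class CharClassPipeDemo {
    // Character class written with | between the characters.
    static String[] splitWithPipes(String s) {
        return s.split("[#|^|&|!|@|%]");
    }

    // Same separators, listed without |.
    static String[] splitPlain(String s) {
        return s.split("[#^&!@%]");
    }

    public static void main(String[] args) {
        String s = "a|b#c";
        // prints [a, b, c] because | is treated as one more separator
        System.out.println(Arrays.toString(splitWithPipes(s)));
        // prints [a|b, c]
        System.out.println(Arrays.toString(splitPlain(s)));
    }
}
```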