Question

I want to split a string in Java while keeping its tokens. For example:

String s = "A#B^C&D!ased&acdf@Mhj%";
String temp[] = s.split("[#^&!@%]+");

Current output:
temp[0] = A
temp[1] = B
temp[2] = C
temp[3] = D
temp[4] = ased

Desired output:
temp[0] = A#
temp[1] = B^
temp[2] = C&
temp[3] = D!
temp[4] = ased&

My current approach is:
  pos = index of the token in the string
  pos = pos + length of the token
  charAtPos = character of the string at index pos
  token = token + charAtPos

If you have a better way to do it, please suggest it. I don't think this approach is very efficient on very large strings.


Solution

Try using a positive look-behind, a zero-width regex construct that does not consume any input:

String s = "A#B^C&D!ased&acdf@Mhj%";
String[] temp = s.split("(?<=[#^&!@%])");

The (?<=expr) construct matches the empty position immediately after expr without consuming expr itself. split() therefore cuts the text right after each separator, and the separator stays attached to the piece before it.

Here is a demo on ideone.
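A minimal, self-contained version of the look-behind split, for running locally (the class and method names are just for illustration). A one-character class is enough inside the look-behind: each candidate position only needs the single character before it to be a separator, so adding `+` would not change which positions match.

```java
import java.util.Arrays;

public class LookBehindSplit {
    // Splits right after every separator character. The look-behind is
    // zero-width, so split() removes nothing and each separator stays
    // attached to the piece that precedes it.
    static String[] splitKeepingSeparators(String s) {
        return s.split("(?<=[#^&!@%])");
    }

    public static void main(String[] args) {
        String s = "A#B^C&D!ased&acdf@Mhj%";
        // Prints [A#, B^, C&, D!, ased&, acdf@, Mhj%]
        System.out.println(Arrays.toString(splitKeepingSeparators(s)));
    }
}
```

The match at the very end of the string (after the final `%`) would produce a trailing empty string, but `split(String)` discards trailing empty strings, so the last element is `Mhj%`. Consecutive separators such as `"A##B"` give `[A#, #, B]`.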

Other answers

If you have to deal with very large strings, you will be better off rolling your own code. The Java pattern-matching engine is a good general-purpose tool, but custom code can often outperform it.

The key is to use something like the StringUtils class from Apache Commons Lang. It is very easy to use and provides many functions that are missing from the standard Java library.

The function:

int i = StringUtils.indexOfAny("A#B^C&D!ased&acdf@Mhj%", "#^&!@%"); // second argument is a set of characters, not a regex

will return the index of the first separator character (or -1 if there is none). It is then up to you to cut off the leading piece and repeat the search on the rest of the string.
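A minimal sketch of that hand-rolled approach, using only the JDK: a plain loop with `String.indexOf` on the separator set stands in for `StringUtils.indexOfAny`, so no extra dependency is needed. The class and method names are hypothetical, and the separator set is the one from the question. It makes a single pass over the input and never compiles a regex; whether it is actually faster than `split()` on your data is something to measure.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualSplit {
    // Separator characters taken from the question (assumption: single chars only).
    private static final String SEPARATORS = "#^&!@%";

    // Single pass over the input: whenever a separator is seen, the text since
    // the last cut, including that separator, becomes one piece.
    static List<String> splitKeepingSeparators(String s) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            if (SEPARATORS.indexOf(s.charAt(i)) >= 0) {
                pieces.add(s.substring(start, i + 1));
                start = i + 1;
            }
        }
        // Keep any trailing text that is not followed by a separator.
        if (start < s.length()) {
            pieces.add(s.substring(start));
        }
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingSeparators("A#B^C&D!ased&acdf@Mhj%"));
    }
}
```

For the question's input this prints `[A#, B^, C&, D!, ased&, acdf@, Mhj%]`, the same pieces as the look-behind split. One difference: for an empty input it returns an empty list, whereas `"".split(...)` returns a one-element array containing `""`.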

String#split() uses a regular expression to find the split positions and removes the matched text from the result (these are the tokens that you normally don't want). If you want to keep the tokens as well, you need a zero-length match, using look-aheads or look-behinds.
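To make the difference concrete, here is a small sketch (hypothetical class and method names) comparing a plain split with a look-ahead split on a shortened input. A look-ahead `(?=...)` matches just before each separator, so the separator moves to the start of the following piece instead of staying with the preceding one as it does with a look-behind.

```java
import java.util.Arrays;

public class SplitComparison {
    // Plain split: the separators are the matched text, so they are removed.
    static String[] plain(String s) {
        return s.split("[#^&!@%]");
    }

    // Look-ahead: zero-width match just before each separator, so each
    // separator starts the next piece.
    static String[] lookAhead(String s) {
        return s.split("(?=[#^&!@%])");
    }

    public static void main(String[] args) {
        String s = "A#B^C&D";
        System.out.println(Arrays.toString(plain(s)));     // [A, B, C, D]
        System.out.println(Arrays.toString(lookAhead(s))); // [A, #B, ^C, &D]
    }
}
```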

String s = "A#B^C&D!ased&acdf@Mhj%";
String[] temp = s.split("(?<=[#^&!@%])");

The expression now matches each position right after a token, which is a zero-length match. Because nothing is removed, the result contains the tokens as well.

The split method splits around matches of the regexp, so maybe it should be [#|^|&|!|@|%]
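A quick check of what that character class does (hypothetical class name). Inside square brackets `|` is an ordinary character rather than alternation, so `[#|^|&|!|@|%]` matches the same six separators plus `|` itself, and split() still drops whatever it matches.

```java
import java.util.Arrays;

public class CharClassCheck {
    // Inside [...], '|' is a literal character, not alternation.
    static String[] splitWithPipes(String s) {
        return s.split("[#|^|&|!|@|%]");
    }

    public static void main(String[] args) {
        // Prints [A, B, C, D]: the separators are still removed.
        System.out.println(Arrays.toString(splitWithPipes("A#B^C&D")));
        // Prints [x, y]: '|' is treated as a separator too.
        System.out.println(Arrays.toString(splitWithPipes("x|y")));
    }
}
```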

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow