How to separate items in a file padded with spaces when the items themselves contain spaces?

StackOverflow https://stackoverflow.com/questions/7389636

Question

I have a very long price file from my wholesaler that I am having some difficulty reading into my program, because the columns are separated by a varying number of white spaces. Like this:

99995116273       34 mm asasa                                         00472,50100                                                                                               
99998375442       11 lalaaasdsddfgdfgdf                                00503,00206                                                                                             
99998375443       1 1/4 Microkupling                             00867,00206 

How can I use the Scanner class in Java to separate each line into Part no, Description and Price?


Solution

Use the split method. This method takes a regular expression as a parameter, so something like this should work for you:

String line =....;
String[] columns = line.split("\\s{2,}");

This starts a new string each time it finds two or more consecutive whitespace characters (the whitespace itself is discarded). The result is an array containing the fields you need, and single spaces inside a description are left alone.

The {2,} means that the string is only broken where there are two or more whitespace characters in a row. Note that \s matches any whitespace, including tabs, not just the space character.
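
To make that concrete, here is a minimal sketch that applies the split call above to one of the sample lines from the question. The class and method names are just for illustration, and the trim() call is my own addition so that leading or trailing padding cannot produce an empty element. It assumes no description ever contains two spaces in a row.

```java
import java.util.Arrays;

// Illustrative class name; not part of the original answer.
class SplitColumnsSketch {

    // Splits one line wherever two or more whitespace characters occur in a row.
    static String[] splitColumns(String line) {
        // trim() first so padding at either end does not create empty fields
        return line.trim().split("\\s{2,}");
    }

    public static void main(String[] args) {
        String line = "99998375443       1 1/4 Microkupling                             00867,00206 ";
        // prints [99998375443, 1 1/4 Microkupling, 00867,00206]
        System.out.println(Arrays.toString(splitColumns(line)));
    }
}
```

The single spaces inside "1 1/4 Microkupling" survive because the pattern needs at least two whitespace characters to match.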

Other tips

Good morning, I am not a Java developer by trade, but instead of thinking about the value delimiter as a space, have you tried thinking about it as a tab? I have dealt with tab-delimited files in the past and this could be the case here.

Assuming that there's 1 item per line you can use the following:

Scanner s = new Scanner(input).useDelimiter("\\n");

Each call to s.next() will then return a string containing one item (one line), which you can scan individually or simply split.
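
A rough sketch of that idea, assuming the input really has one item per line. I have swapped useDelimiter("\\n") for hasNextLine()/nextLine(), which does the same job and also copes with Windows line endings. The class and method names are made up for the example, and the per-line split on two or more whitespace characters is only one possible way to break a line up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Illustrative class name; not from the original answer.
class ScanLinesSketch {

    // Reads the input line by line and splits each non-empty line into fields.
    static List<String[]> readItems(String input) {
        List<String[]> items = new ArrayList<>();
        try (Scanner s = new Scanner(input)) {
            while (s.hasNextLine()) {
                String line = s.nextLine().trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // Assumption: columns are separated by at least two whitespace characters
                items.add(line.split("\\s{2,}"));
            }
        }
        return items;
    }

    public static void main(String[] args) {
        String input =
                "99995116273       34 mm asasa                                         00472,50100   \n"
              + "99998375442       11 lalaaasdsddfgdfgdf                                00503,00206  \n";
        for (String[] item : readItems(input)) {
            System.out.println(String.join(" | ", item));
        }
    }
}
```

For a real file you would construct the Scanner from a File or an InputStream instead of a String.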

Looking at the pasted text, it seems the original file uses tab characters to align the columns. If the text you are processing still has the tabs, and the fields (items) themselves do not contain tabs, then you can use one or more tab characters as the delimiter.
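
If the tabs are still there, the split could look something like the sketch below. The sample line with tabs is invented for the example (the question only shows spaces), and the class and method names are illustrative.

```java
import java.util.Arrays;

// Illustrative class name; the tab-separated sample line is hypothetical.
class TabSplitSketch {

    // Splits one line on runs of one or more tab characters.
    static String[] splitOnTabs(String line) {
        return line.split("\\t+");
    }

    public static void main(String[] args) {
        String line = "99998375443\t1 1/4 Microkupling\t\t00867,00206";
        System.out.println(Arrays.toString(splitOnTabs(line)));
    }
}
```

Spaces inside the description are untouched here, because only tabs count as separators.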

If the tab characters have already been converted to spaces and the result is the above output then this becomes a much more difficult problem and can be solved only heuristically.

Again looking at the text, the format seems to be

  • Line begins with a part number, which is a sequence of digits followed by whitespace (which is not part of the field)
  • Line ends with the price, which starts after whitespace (that is not part of the field) and is a sequence of digits followed by one or more groups of (comma followed by a sequence of digits)
  • Everything in between is the description, after trimming whitespace on both sides

If you can confirm this is the format, then the solution is not very complex to implement.
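
Under that assumed format, one way to do it is a single regular expression with three capture groups. This is only a sketch: the pattern is my reading of the three rules above, and the class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class name; the pattern encodes the format guessed above.
class PriceLineRegexSketch {

    // group 1: part number (digits), group 2: description, group 3: price (digits, then ",digits" groups)
    private static final Pattern LINE =
            Pattern.compile("\\s*(\\d+)\\s+(.*?)\\s+(\\d+(?:,\\d+)+)\\s*");

    // Returns {partNo, description, price}, or throws if the line does not fit the format.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected line format: " + line);
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] f = parse("99995116273       34 mm asasa                                         00472,50100   ");
        System.out.println(f[0] + " | " + f[1] + " | " + f[2]);
    }
}
```

Unlike splitting on runs of whitespace, this does not care how many spaces sit between the columns, so a description with a double space in it would still parse.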

Instead of splitting the string, why not read the part number from the beginning of the string and the price from the end? Whatever is left in the middle is the description.
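
A minimal sketch of that approach without any regular expressions, assuming the part number and the price never contain whitespace themselves. The class and method names are invented for the example.

```java
// Illustrative class name; works inward from both ends of the line.
class EndsInwardSketch {

    // Returns {partNo, description, price}.
    static String[] parse(String rawLine) {
        String line = rawLine.trim();

        // First whitespace character marks the end of the part number
        int first = 0;
        while (first < line.length() && !Character.isWhitespace(line.charAt(first))) {
            first++;
        }
        // Last whitespace character marks the start of the price
        int last = line.length() - 1;
        while (last >= 0 && !Character.isWhitespace(line.charAt(last))) {
            last--;
        }
        if (last < first) {
            // No whitespace at all, so there is nothing to separate
            throw new IllegalArgumentException("Unexpected line format: " + rawLine);
        }

        String partNo = line.substring(0, first);
        String price = line.substring(last + 1);
        String description = line.substring(first, last + 1).trim();
        return new String[] { partNo, description, price };
    }

    public static void main(String[] args) {
        String[] f = parse("99998375442       11 lalaaasdsddfgdfgdf                                00503,00206   ");
        System.out.println(f[0] + " | " + f[1] + " | " + f[2]);
    }
}
```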

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow