Question

I'm using a BufferedReader to pass individual lines of a file to Java's StringTokenizer. The file is structured as follows:

"2,0";"foo";"foo.doc";"12345"
"2,4";"foo";"foo.doc";"34567";"foo7";"foo7.doc";"45678";"foo6";"foo6.doc";"56789";"foo5";"foo5.doc";"67890";"foo4";"foo4.doc";"23456"   
"3,0";"foo7";"foo7.doc";"34567"
"3,0";"foo6";"foo6.doc";"45678"
"3,0";"foo5";"foo5.doc";"56789"
"3,0";"foo4";"foo4.doc";"67890"

Here's the code I'm using.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.StringTokenizer;

public class parse {
  public static void main(String args[]) throws IOException {
    FileInputStream inputStream = new FileInputStream("whidata0.txt");
    BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
    String scrubbedInput;
    String tok01 = null;
    while((scrubbedInput=br.readLine())!=null) {
      StringTokenizer strTok = new StringTokenizer(scrubbedInput, ";", false);
      int tokens = strTok.countTokens();
      while (strTok.hasMoreTokens()) {
        tok01 = strTok.nextToken();
      }
      System.out.println("  scrubbed: " + scrubbedInput);
      System.out.println("    tokens: " + tokens);
      System.out.println("     tok01: " + tok01);
    }
    br.close();
  }
}

which yields this result.

scrubbed: "2,0";"foo";"foo.doc";"12345" 
  tokens: 4
   tok01: 12345
scrubbed: "2,4";"foo";"foo.doc";"34567";"foo7";"foo7.doc";"45678";"foo6";"foo6.doc";"56789";"foo5";"foo5.doc";"67890";"foo4";"foo4.doc";"23456"
  tokens: 16
   tok01: 23456
scrubbed: "3,0";"foo7";"foo7.doc";"34567"
  tokens: 4
   tok01: 34567
scrubbed: "3,0";"foo6";"foo6.doc";"45678"
  tokens: 4
   tok01: 45678
scrubbed: "3,0";"foo5";"foo5.doc";"56789"
  tokens: 4
   tok01: 56789
scrubbed: "3,0";"foo4";"foo4.doc";"67890"               
  tokens: 4
   tok01: 67890

When using nextToken(), what is the starting token supposed to be? It appears as though StringTokenizer starts with token 0, so that nextToken() actually returns token 1, the second physical token. I did not see a firstToken() method in the Java documentation, nor did I see a way to assign specific tokens to specific variables (e.g., String myToken = strTok.tokenNumber(0)). What do I need to do to access the first physical token in my String?

Solution

Your code does not match the output you posted, but in any case you may want to use String.split() instead of a tokenizer when you need to access an arbitrary token, e.g.:

    String st = "a;b;c";        
    String[] tokens = st.split(";");
    System.out.println(tokens[0]);

will print out "a", the first token.
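
Applied to a line in the question's format, the same idea might look like the sketch below. The class name SplitFields and the helper parseLine are made up for illustration, and the sketch assumes that no field contains a ';' inside its quotes.

```java
class SplitFields {
    // Hypothetical helper: splits one line on ';' and strips the
    // surrounding double quotes from each field.
    // Assumption: a field never contains ';' itself.
    static String[] parseLine(String line) {
        String[] fields = line.trim().split(";");
        for (int i = 0; i < fields.length; i++) {
            String f = fields[i].trim();
            if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
                f = f.substring(1, f.length() - 1);
            }
            fields[i] = f;
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"2,0\";\"foo\";\"foo.doc\";\"12345\"";
        String[] fields = parseLine(line);
        System.out.println(fields.length); // 4
        System.out.println(fields[0]);     // 2,0
        System.out.println(fields[3]);     // 12345
    }
}
```

Because split() returns an array, any field can be read by its index, including fields[0] for the first physical token.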

The StringTokenizer class only lets you read tokens sequentially, one after another; it offers no random access. You can still use it to read the first token:

    String st = "a;b;c";        
    StringTokenizer tokenizer = new StringTokenizer(st,";");
    System.out.println(tokenizer.nextToken());

This will also print out "a", the first token.
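
If you want to keep StringTokenizer but still pick tokens by position, one option is to copy the tokens into a List first. This is only a sketch; the class TokenList and the helper toList are hypothetical names, not part of the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenList {
    // Hypothetical helper: drains a StringTokenizer into a List so that
    // the tokens can afterwards be read by index.
    static List<String> toList(String input, String delimiters) {
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        List<String> tokens = new ArrayList<>(tokenizer.countTokens());
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = toList("a;b;c", ";");
        System.out.println(tokens.get(0)); // a
        System.out.println(tokens.get(2)); // c
    }
}
```

The first call to nextToken() returns the first physical token, so tokens.get(0) is "a" here.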

OTHER TIPS

You overwrite the value of tok01 on every iteration of your loop, so only the last token is left when the loop ends.

Try this and have a look at the output.

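import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.StringTokenizer;

public class parse {
  public static void main(String args[]) throws IOException {
    FileInputStream inputStream = new FileInputStream("whidata0.txt");
    BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
    String scrubbedInput;
    String tok01 = null;
    while((scrubbedInput=br.readLine())!=null) {
      StringTokenizer strTok = new StringTokenizer(scrubbedInput, ";", false);
      int tokens = strTok.countTokens();
      while (strTok.hasMoreTokens()) {
        tok01 = strTok.nextToken();
        // Print every token as it is read, not just the last one.
        System.out.println("     tok01: " + tok01);
      }
      System.out.println("  scrubbed: " + scrubbedInput);
      System.out.println("    tokens: " + tokens);
      // After the loop, tok01 holds only the last token of the line.
      System.out.println("last tok01: " + tok01);
    }
    br.close();
  }
}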
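
If the goal is to keep the first token while still walking through the rest, one possible approach is to store the result of the first nextToken() call in its own variable before the loop. The class FirstAndLast and its helper are hypothetical names used only for this sketch.

```java
import java.util.StringTokenizer;

class FirstAndLast {
    // Hypothetical helper: returns {first token, last token} of the input,
    // or null when the input contains no tokens at all.
    static String[] firstAndLast(String input) {
        StringTokenizer strTok = new StringTokenizer(input, ";", false);
        if (!strTok.hasMoreTokens()) {
            return null;
        }
        // The first call to nextToken() yields the first physical token.
        String first = strTok.nextToken();
        String last = first;
        while (strTok.hasMoreTokens()) {
            // Each further call overwrites 'last', as tok01 is overwritten above.
            last = strTok.nextToken();
        }
        return new String[] { first, last };
    }

    public static void main(String[] args) {
        String[] result = firstAndLast("\"3,0\";\"foo7\";\"foo7.doc\";\"34567\"");
        System.out.println("first: " + result[0]); // first: "3,0"
        System.out.println(" last: " + result[1]); //  last: "34567"
    }
}
```

Note that the tokens still carry their double quotes, because only ';' is used as a delimiter.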

The problem is that you call System.out.println("     tok01: " + tok01); outside the while loop, so it only ever prints the last token:

  StringTokenizer strTok = new StringTokenizer(scrubbedInput, ";", false);
  int tokens = strTok.countTokens();
  while (strTok.hasMoreTokens()) {
    tok01 = strTok.nextToken();// here is the problem
  }
  System.out.println("  scrubbed: " + scrubbedInput);
  System.out.println("    tokens: " + tokens);
  System.out.println("     tok01: " + tok01);

I think it should look like this:

   StringTokenizer strTok = new StringTokenizer(scrubbedInput, ";", false);
   int tokens = strTok.countTokens();
   System.out.println("  scrubbed: " + scrubbedInput);
   System.out.println("    tokens: " + tokens);
   while (strTok.hasMoreTokens()) {
       tok01 = strTok.nextToken();           
       System.out.println("     tok01: " + tok01);
   }

Your while loop iterates over all the tokens before anything is printed. I think the closing } is misplaced; the println belongs inside the loop:

    while (strTok.hasMoreTokens()) {
      tok01 = strTok.nextToken();                  
      System.out.println("     tok01: " + tok01);
    }
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow