Question

I'm having trouble extracting terms from each line of a file. Specifically, the file is named .csv but is not really in CSV format: each row is read as a single string, so everything ends up in line[0].

Here are a few example lines out of thousands:


test.csv

line1 : "31451    CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1CS@S[C@@H]1CCCCC(=O)O "

line2 : "12232 COD05374044 23439353  C924O3S2    saponin   CCCC(=O)O "

line3 : "9048   CTD042032 23241  C3HO4O3S2 Berberine  [C@@H]1CCCCC(=O)O "


I want to extract only "beta-lipoic acid", "saponin" and "Berberine", which are in the 5th position. There are wide gaps between the terms, which is why I call it the 5th position.

In this case, how can I extract the term in the 5th position of each line?

One more thing:

The amount of whitespace between the six terms is not always the same. It can be one, two, three, four, five or more characters.


Solution

Another try:

import java.io.File;
import java.util.Scanner;

public class HelloWorld {
    // The number of whitespace-separated tokens per row. Columns are separated by an
    // arbitrary number of spaces or tabs; 7 assumes six columns with a two-word name.
    final static int COLS = 7;

    public static void main(String[] args) {
        System.out.println("Tokens:");
        try (Scanner scanner = new Scanner(new File("input.txt")).useDelimiter("\\s+")) {
            // Counts the tokens read so far
            int n = 0;
            String tmp = "";
            StringBuilder item = new StringBuilder();
            // Operate on the stream of tokens
            while (scanner.hasNext()) {
                tmp = scanner.next();
                n += 1;
                // If we have reached the fifth token of a row, take its content and append
                // the sixth token too, as the name we want consists of two space-separated
                // words. Feel free to customize this if your name layout varies.
                if (n % COLS == 5) {
                    item.setLength(0);
                    item.append(tmp);
                    // Guard against the input ending right after the fifth token
                    if (scanner.hasNext()) {
                        item.append(" ");
                        item.append(scanner.next());
                        n += 1;
                    }

                    // Do something with the name we extracted
                    System.out.println(item.toString());
                }
            }
        }
        catch(java.io.IOException e){
            System.out.println(e.getMessage());
        }
    }
}
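
The code above relies on every row having exactly seven tokens, so a one-word name such as "saponin" would shift the counting. A minimal sketch of a per-line alternative follows, under the assumption that only the name column can contain spaces: the first four tokens and the last token are treated as fixed columns, and whatever lies between them is joined back into the name. The names NameExtractor, extractName, LEADING_COLS and TRAILING_COLS are made up for this illustration.

```java
import java.util.Arrays;

class NameExtractor {
    // Assumed layout: four single-token columns, then the name, then one
    // single-token column. Only the name is assumed to contain spaces.
    static final int LEADING_COLS = 4;
    static final int TRAILING_COLS = 1;

    // Returns the name column, or null if the line has too few tokens.
    static String extractName(String line) {
        // Split on any run of whitespace, so the gap width does not matter
        String[] tokens = line.trim().split("\\s+");
        int end = tokens.length - TRAILING_COLS;
        if (end <= LEADING_COLS) {
            return null;
        }
        // Rejoin the tokens between the fixed columns with single spaces
        return String.join(" ", Arrays.copyOfRange(tokens, LEADING_COLS, end));
    }

    public static void main(String[] args) {
        System.out.println(extractName(
            "31451    CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1CS@S[C@@H]1CCCCC(=O)O "));
        System.out.println(extractName(
            "12232 COD05374044 23439353  C924O3S2    saponin   CCCC(=O)O "));
    }
}
```

If another column can also contain spaces, this assumption no longer holds and the column boundaries cannot be recovered from whitespace alone.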

OTHER TIPS

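If line is a String[] and the whole row is in line[0]:

String s = line[0];
String[] split = s.split("   "); // splits on exactly three spaces
return split[4].trim(); // the fifth item; trim() drops a leftover space from wider gaps

A literal delimiter only works when the gaps have a predictable width. For a more precise delimiter, you can use a regular expression such as "\\s{2,}", which matches any run of two or more whitespace characters.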

How are the columns separated? For example, if they are separated by tab characters, you can use the split method:

String[] parts = str.split("\\t");

The value you want will be in parts[4].

Just use String.split() with a regex that matches at least 2 whitespace characters:

String foo = "31451    CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1CS@S[C@@H]1CCCCC(=O)O";
String[] bar = foo.split("\\s{2,}");
String name = bar[4]; // beta-lipoic acid
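
To apply a split on runs of two or more whitespace characters to every line rather than a single string, something along these lines might work. It is only a sketch: the class and method names (FifthColumn, fifthColumn) are invented here, and it assumes that columns are always at least two whitespace characters apart while the name contains only single spaces. Rows with fewer than five fields are skipped instead of throwing an ArrayIndexOutOfBoundsException.

```java
import java.util.ArrayList;
import java.util.List;

class FifthColumn {
    // Collects the fifth field of every line that has at least five fields.
    static List<String> fifthColumn(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            // Two or more whitespace characters mark a column boundary,
            // so the single space inside a name is left alone
            String[] fields = line.trim().split("\\s{2,}");
            if (fields.length >= 5) {
                names.add(fields[4]);
            }
        }
        return names;
    }
}
```

The input list could come from Files.readAllLines(Paths.get("test.csv")). Note that sample lines 2 and 3 in the question have some columns separated by a single space, so this approach would skip them as written; it only fits data where the column gaps are reliably wider than the spaces inside names.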
Licensed under: CC-BY-SA with attribution