Question

I have a .tsv file which has 39 column the last but one column has data as string whose length in more than 100,000 characters Now what is happening is when i am trying to read the file line 1 has headers and then the data follows

What is happening is its after reading line 1 its goes to line 3 then line 5 then line 7 Though all the rows have same data Following the log i am getting

lineNo=3, rowNo=2, customer=503837-100 , last but one cell length=111275
lineNo=5, rowNo=3, customer=503837-100 , last but one cell length=111275
lineNo=7, rowNo=4, customer=503837-100 , last but one cell length=111275
lineNo=9, rowNo=5, customer=503837-100 , last but one cell length=111275
lineNo=11, rowNo=6, customer=503837-100 , last but one cell length=111275
lineNo=13, rowNo=7, customer=503837-100 , last but one cell length=111275
lineNo=15, rowNo=8, customer=503837-100 , last but one cell length=111275
lineNo=17, rowNo=9, customer=503837-100 , last but one cell length=111275
lineNo=19, rowNo=10, customer=503837-100 , last but one cell length=111275

Following is my code:

import java.io.FileReader;
import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

public class readWithCsvBeanReader {
    public static void main(String[] args) throws Exception{
        readWithCsvBeanReader();
    }


private static void readWithCsvBeanReader() throws Exception {

    ICsvBeanReader beanReader = null;

    try {

        beanReader = new CsvBeanReader(new FileReader("C:\MAP TSV\abc.tsv"), CsvPreference.TAB_PREFERENCE);
        // the header elements are used to map the values to the bean (names must match)
        final String[] header = beanReader.getHeader(true);
        final CellProcessor[] processors = getProcessors();
        TSVReaderBrandDTO tsvReaderBrandDTO = new TSVReaderBrandDTO();

        int i = 0;
        int last = 0;

        while( (tsvReaderBrandDTO = beanReader.read(TSVReaderBrandDTO.class, header, processors)) != null ) {
            if(null == tsvReaderBrandDTO.getPage_cache()){
                last = 0;
            }
            else{
                last = tsvReaderBrandDTO.getPage_cache().length();
            }
            System.out.println(String.format("lineNo=%s, rowNo=%s, customer=%s , last but one cell length=%s", beanReader.getLineNumber(),
                beanReader.getRowNumber(), tsvReaderBrandDTO.getUnique_ID(), last));
            i++;
        }

        System.out.println("Number of rows : "+i);

    }
    finally {
        if( beanReader != null ) {
            beanReader.close();
        }
    }
}

private static CellProcessor[] getProcessors() {

    final CellProcessor[] processors = new CellProcessor[] { 
         new Optional(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new Optional()};

        return processors;
    }
}

Please let me know where i am going wrong

Was it helpful?

Solution 2

I checked http://supercsv.sourceforge.net/examples_reading.html. Have a close look at Example CSV file and Output. Couldn't it be the case that your lines contain a non-escaped " (double apostrophe) character so the parser thinks that the data record spans over two physical lines?

If you do not use the double-apostrophe character as a quote character, you can change the CsvPreference - see http://supercsv.sourceforge.net/apidocs/org/supercsv/prefs/CsvPreference.html - so that double-quote is not considered as a quote character:

CsvPreference MY_PREFERENCES = new CsvPreference.Builder(
    SOME_NEVER_USED_CHARACTER, ',', "\r\n").build();

Of course for tab-delimited CSV use something like this:

CsvPreference MY_PREFERENCES = new CsvPreference.Builder(
    SOME_NEVER_USED_CHARACTER, '\t', "\r\n").build();

Refer to the CsvPreference javadoc for the signature of the Builder and amend the actual values accordingly.

OTHER TIPS

If you use a CSV parser to parse a TSV input, you're gonna have a bad time. Use a proper TSV parser. uniVocity-parsers comes with a TSV parser/writer. You can use annotated java beans as well to parse your file directly into instances of a class.

Examples:

This code parses a TSV as rows.

TsvParserSettings settings = new TsvParserSettings();

// creates a TSV parser
TsvParser parser = new TsvParser(settings);

// parses all rows in one go.
List<String[]> allRows = parser.parseAll(new FileReader(yourFile));

Use a BeanListProcessor parse into java beans:

BeanListProcessor<TestBean> rowProcessor = new BeanListProcessor<TestBean>(TestBean.class);

TsvParserSettings parserSettings = new TsvParserSettings();
parserSettings.setRowProcessor(rowProcessor);

TsvParser parser = new TsvParser(parserSettings);
parser.parse(new FileReader(yourFile));

// The BeanListProcessor provides a list of objects extracted from the input.
List<TestBean> beans = rowProcessor.getBeans();

This is how the TestBean class looks like: class TestBean {

// if the value parsed in the quantity column is "?" or "-", it will be replaced by null.
@NullString(nulls = { "?", "-" })
// if a value resolves to null, it will be converted to the String "0".
@Parsed(defaultNullRead = "0")
private Integer quantity;


@Trim
@LowerCase
@Parsed(index = 4)
private String comments;

// you can also explicitly give the name of a column in the file.
@Parsed(field = "amount")
private BigDecimal amount;

@Trim
@LowerCase
// values "no", "n" and "null" will be converted to false; values "yes" and "y" will be converted to true
@BooleanString(falseStrings = { "no", "n", "null" }, trueStrings = { "yes", "y" })
@Parsed
private Boolean pending;

Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top