Question

Essentially, I want to read in a text file line by line and format each line like this: Last name, Title, First name, Middle, followed by the birth and death dates as MM/DD/YYYY.

I read in dates like this:

Month, day, year
Mon.  day, year
Mon  day,  year
MMDDYY
M/D/year
M-D-year

and names like this:

Last,   Title   First   Middle  (comma after name needed)

OR

Title   First   Middle   Last

I've been working on this for a really long time and just cannot figure it out. Below is my pretty messy code, which has been through a lot of changes in a desperate attempt to get this working. Thank you for your time, anybody who wants to help me out (I'm a student). Here's an example of the lines that are read in:

Roger  Veium  MAY     12,  1908        JUNE 2, 1984
McDermott, James   D.     Jan.    4,  1914      Jul  1, 1970
Amy  Chamberlain   Sep.     28, 1975   09-06-95
Gross,  Adam M. 01-03-77
Joseph Lisota  April    9,  1964
Joseph   W. Eisel Sep   3, 1990

Code:

public String[] readLines(String filename) throws IOException {
    FileReader fileReader = new FileReader(filename);
    BufferedReader bufferedReader = new BufferedReader(fileReader);
    List<String> lines = new ArrayList<String>();
    List<String> names = new ArrayList<String>();
    String line = null;
    String name = "";
    int i;
    int ind;
    int indTemp;
    int indTemp2;
    boolean flag = false;
    String[] monthsLong = {"JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY", "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"};
    String[] monthsLongR = {" 01", "02", " 03", "04", "05", "06", "07", "08", " 09", "10", "11", "12"};
    String[] monthsLow = {"JAN\\.", "FEB\\.", "MAR\\.","APR\\.", "MAY", "JUN\\.", "JUL\\.", "AUG\\.", "SEP\\.", "OCT\\.", "NOV\\.", "DEC\\."};
    String[] monthsCaps = {"   JAN", "FEB", " MAR", "APR", "MAY", "JUN", "JUL", "AUG", " SEP", "OCT", "NOV", "DEC"};

    while ((line = bufferedReader.readLine()) != null) {
        line = line.replaceAll("null", "");
        line = line.replaceAll("-","/");
        line = line.toUpperCase() ;

        for(i = 0; i<12; i++)
        {
            line = line.replaceAll(monthsLong[i], monthsLongR[i]);
        }

        for(i = 0; i<12; i++)
        {
            line = line.replaceAll(monthsLow[i], monthsLongR[i]);
        }

        for(i = 0; i<12; i++)
        {
            line = line.replaceAll(monthsCaps[i], monthsLongR[i]);
        }

        line = line.replaceAll("\\s+", " ");
        if (Character.toString(line.charAt(0)).equals(" "))
            line = line.replaceFirst(" ", "");

 /*     name = line;

        ind = name.indexOf(".");
        indTemp = name.indexOf("0");
        indTemp2 = name.indexOf("1");

        if (ind > -1) {
            System.out.println(" period");
            ind = ind + 1;
            flag = true;
        }
        if(flag == false) {
            if(indTemp2 > indTemp){
                ind = indTemp2 -1;
                System.out.println(" 1");
            }
            if (indTemp > indTemp2){ 
                ind = indTemp - 1;
                System.out.println(" 2");
            }
        }
        flag = false;
    */
        // name = name.substring(0,ind);

        lines.add(line);
    }
    bufferedReader.close();
    return lines.toArray(new String[lines.size()]);
}

Solution

OK, so then the only other way is to go through the file line by line and build a list of rules, one for each distinct line format. Some lines share a format, but many are very different from the others. Then you loop through the lines as you are already doing and look for markers that tell you which rule applies, so that you can apply that rule to the line.
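To make the rule-list idea concrete, here is a minimal sketch for the date formats listed in the question. The class name `DateRules`, the method `normalize`, and the three patterns are my own illustration, not something from your code. It also assumes that two-digit years belong to the 1900s (which fits the sample data) and it does not check that the month and day are in range.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: one pattern per date format, tried in order.
final class DateRules {
    private static final List<String> MONTHS = Arrays.asList(
        "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
        "JUL", "AUG", "SEP", "OCT", "NOV", "DEC");

    // Rule 1: "Month day, year", "Mon. day, year", "Mon day, year"
    private static final Pattern NAMED =
        Pattern.compile("([A-Za-z]{3,9})\\.?\\s+(\\d{1,2})\\s*,\\s*(\\d{4})");
    // Rule 2: "M/D/year" and "M-D-year"
    private static final Pattern SEPARATED =
        Pattern.compile("(\\d{1,2})[/-](\\d{1,2})[/-](\\d{2}|\\d{4})");
    // Rule 3: "MMDDYY"
    private static final Pattern COMPACT =
        Pattern.compile("(\\d{2})(\\d{2})(\\d{2})");

    /** Returns the date as MM/DD/YYYY, or null when no rule applies. */
    static String normalize(String raw) {
        String text = raw.trim();

        Matcher m = NAMED.matcher(text);
        if (m.matches()) {
            // Only the first three letters decide the month, so
            // "September", "Sep." and "SEP" all map to 9.
            String key = m.group(1).substring(0, 3).toUpperCase(Locale.ROOT);
            int month = MONTHS.indexOf(key) + 1;
            if (month == 0) {
                return null; // the word was not a month name
            }
            return format(month, m.group(2), m.group(3));
        }

        m = SEPARATED.matcher(text);
        if (m.matches()) {
            return format(Integer.parseInt(m.group(1)), m.group(2), m.group(3));
        }

        m = COMPACT.matcher(text);
        if (m.matches()) {
            return format(Integer.parseInt(m.group(1)), m.group(2), m.group(3));
        }
        return null;
    }

    private static String format(int month, String day, String year) {
        // Assumption: a two-digit year means 19xx.
        String fullYear = year.length() == 2 ? "19" + year : year;
        return String.format(Locale.ROOT, "%02d/%02d/%s",
                month, Integer.parseInt(day), fullYear);
    }
}
```

With this sketch, `DateRules.normalize("Jan.    4,  1914")` gives `01/04/1914` and `DateRules.normalize("09-06-95")` gives `09/06/1995`. Adding a new format means adding one more pattern and one more branch, which is the main reason to keep the rules separate instead of chaining `replaceAll` calls on the whole line.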

As far as I can see, that is the best way to do this. I have worked with files like these before, and they can be a nightmare if not handled properly. While working through the rules you may find a common pattern that covers several formats at once, which is often the case.
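As an example of such a pattern: in the sample lines from the question, the name always ends where the first date begins, and a date begins either with a digit or with a word followed by a day number and a comma. The sketch below uses that observation to split a line and then applies the two name rules. The names `NameRules`, `split`, and `formatName` are hypothetical, and the code assumes names contain no digits and carry no suffix such as "Jr." after the last name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: separates the name from the dates, then
// rewrites the name as "Last, Title First Middle".
final class NameRules {
    // A date starts at a word followed by "day," (month name form)
    // or at the first digit (numeric forms).
    private static final Pattern DATE_START = Pattern.compile(
        "\\b(?:[A-Za-z]{3,9}\\.?\\s+\\d{1,2}\\s*,|\\d)");

    /** Returns { name part, date part }; the date part may be empty. */
    static String[] split(String line) {
        Matcher m = DATE_START.matcher(line);
        if (!m.find()) {
            return new String[] { line.trim(), "" };
        }
        return new String[] {
            line.substring(0, m.start()).trim(),
            line.substring(m.start()).trim()
        };
    }

    /** Normalizes either name layout to "Last, Title First Middle". */
    static String formatName(String rawName) {
        String name = rawName.trim().replaceAll("\\s+", " ");
        if (name.contains(",")) {
            // Rule A: already "Last, ..." so only tidy the comma spacing.
            return name.replaceAll("\\s*,\\s*", ", ");
        }
        // Rule B: "Title First Middle Last" so move the last word forward.
        int lastSpace = name.lastIndexOf(' ');
        if (lastSpace < 0) {
            return name; // a single word, nothing to reorder
        }
        return name.substring(lastSpace + 1) + ", " + name.substring(0, lastSpace);
    }
}
```

For the line `Joseph   W. Eisel Sep   3, 1990`, `split` returns the name part `Joseph   W. Eisel` and the date part `Sep   3, 1990`, and `formatName` turns the name into `Eisel, Joseph W.`. The comma is the marker that decides which name rule applies, so the check for it has to happen before any reordering.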

I hope this helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow