Question

I have a program which reads data from several input files, where each file is processed line-by-line. The code for reading each line of the file is the same, but the format and requirements of the extracted data vary per file. To be more concrete, I have a configuration file which stores key/value pairs and a file of records, where each line stores one record.

I noticed that the code required for reading the file is virtually identical but the per-line "processing" code is different. So, like any good student of OOP, I want to extract the duplicate code and encapsulate it in an abstract class. Here is the interface for the abstract class:

abstract class FileProcessor {
  public processFile() {};
  protected abstract processLine(aLine); // implemented by the subclass
  public abstract getData();
}

So, processFile contains the "boilerplate" file processing code. The processFile method calls the abstract processLine method, which is implemented in the subclass and allows for the subclass-specific file processing operations. Right now I have the following two subclasses.

public ConfigFileProcessor extends FileProcessor {
  processLine(String line) {
    // do some file-specific processing
  }
}

public RecordFileProcessor extends FileProcessor {
  processLine(String line) {
    // do some file-specific processing
  }
}

The problem is the getData method. As I said before, the data in each file differs: in this case one contains key/value pairs and the other contains records. How do I define the return type of getData in a general (non-implementation-specific) way? I was thinking of returning a Collection and using a wildcard to specify the type, but I am not experienced enough with generics to understand the implications/tradeoffs of this choice.

Can the design be improved? Is it even worth going through the trouble of encapsulation in this case? I guess I could just create two different processFile methods that return different data types, but I'd like to put my OOP education to practice if possible.

Thank you in advance for your time and effort.


Solution

Generics are the right tool for this. Give the abstract class a type parameter and have each subclass bind it to its own concrete data type.

abstract class FileProcessor<T> {
  public void processFile() {}
  protected abstract T processLine(String aLine); // implemented by the subclass
  public abstract T getData();
}

public class ConfigFileProcessor extends FileProcessor<Configuration> { ... }
public class RecordFileProcessor extends FileProcessor<Record> { ... }
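
To make the idea concrete, here is a minimal runnable sketch. It rests on several assumptions that are not in the question: the file formats (key=value lines and comma-separated records), the choice of Map<String, String> and List<String[]> as the data types instead of dedicated Configuration and Record classes, passing a BufferedReader into processFile, and the FileProcessorDemo class. It also differs from the snippet above in that processLine returns void and each subclass accumulates its own result, which getData then returns.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// T is the type of data a concrete processor accumulates.
abstract class FileProcessor<T> {
    // Template method: the shared line-by-line loop lives here.
    public final void processFile(BufferedReader reader) {
        reader.lines().forEach(this::processLine);
    }

    // Subclass-specific handling of a single line.
    protected abstract void processLine(String line);

    // Each subclass decides what T is.
    public abstract T getData();
}

class ConfigFileProcessor extends FileProcessor<Map<String, String>> {
    private final Map<String, String> settings = new LinkedHashMap<>();

    @Override
    protected void processLine(String line) {
        // Assumed format: key=value; lines without a key are skipped.
        int eq = line.indexOf('=');
        if (eq > 0) {
            settings.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
    }

    @Override
    public Map<String, String> getData() {
        return settings;
    }
}

class RecordFileProcessor extends FileProcessor<List<String[]>> {
    private final List<String[]> records = new ArrayList<>();

    @Override
    protected void processLine(String line) {
        // Assumed format: comma-separated fields, one record per line.
        if (!line.isEmpty()) {
            records.add(line.split(","));
        }
    }

    @Override
    public List<String[]> getData() {
        return records;
    }
}

// Hypothetical driver, reading from in-memory strings instead of real files.
class FileProcessorDemo {
    public static void main(String[] args) {
        ConfigFileProcessor config = new ConfigFileProcessor();
        config.processFile(new BufferedReader(new StringReader("host=localhost\nport=8080\n")));
        Map<String, String> settings = config.getData(); // no cast needed
        System.out.println(settings.get("port")); // prints 8080

        RecordFileProcessor recordFile = new RecordFileProcessor();
        recordFile.processFile(new BufferedReader(new StringReader("1,alice\n2,bob\n")));
        List<String[]> records = recordFile.getData(); // no cast needed
        System.out.println(records.size()); // prints 2
    }
}
```

Because each subclass fixes T, callers get the right type back from getData without casting, and the reading loop is written only once.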

An aside: your abstract class doesn't compile. No return types are specified.
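
On the wildcard idea from the question: returning Collection<?> would compile, but it hides the element type from the caller. The sketch below illustrates that tradeoff; WildcardDemo and both method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class WildcardDemo {
    // Wildcard return type: callers only know the elements are Objects.
    static Collection<?> loadWithWildcard() {
        List<String> lines = new ArrayList<>();
        lines.add("port=8080");
        return lines;
    }

    // Concrete type argument: callers know the elements are Strings.
    static Collection<String> loadTyped() {
        List<String> lines = new ArrayList<>();
        lines.add("port=8080");
        return lines;
    }

    public static void main(String[] args) {
        Collection<?> unknown = loadWithWildcard();
        for (Object o : unknown) {
            String s = (String) o; // cast required; a wrong guess fails only at runtime
            System.out.println(s);
        }
        // unknown.add("x"); // does not compile: nothing but null can be added to Collection<?>

        Collection<String> typed = loadTyped();
        for (String s : typed) { // no cast, checked by the compiler
            System.out.println(s);
        }
    }
}
```

With a type parameter on the abstract class, the compiler tracks the element type for you, which is the main advantage over a wildcard return type here.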

Licensed under: CC-BY-SA with attribution