Question

I have an HTML help system that I need to convert to SharePoint. The two most time-consuming tasks are changing the document links and gathering metadata. Luckily, this data is easily accessible. Each file is an HTML document, oversimplified below:

 <body>
   <!--- Metadata follows
   Procedure Name: my document
   Procedure Number: xxxxx
   Use: freeform text explaining when procedure is used
   Revision Date: xx/xx/xx
   By: responsible party for revision
   <!--- end metadata

   <h1>Procedure Name</h1>
   <p>procedure background and narrative, with links, as needed, to other documents at \\documentation-server\path\document-name.html
 </body>

I can successfully extract and manipulate the right strings, and I'm trying to incorporate that process into an automated solution. Since this is my first venture into file I/O, however, I'm a little fuzzy on what to do next.

In a perfect world, given a path, I would like to step through each *.html file in that path. I cannot seem to find a class or method to do that. newInputStream and newOutputStream give me file access, but I need to provide a path and file parameter. The FileVisitor interface appears only to interact with file attributes and perform delete/copy/rename-type functions.

Is there something that would combine these into a single function that would step through each file in a path, open it and allow my line-by-line parse, then close the file and move on to the next one to repeat?

My other thought was to create an array of filenames, then feed that array into the filename parameter of newInputStream.

Suggestions?

Solution

If you use Java 7, the FileVisitor interface enables you to walk a file tree very easily. See for example the Java Tutorial.

You can override the visitFile method to do what you want with the file, for example (not tested):

@Override
public FileVisitResult visitFile(Path file, BasicFileAttributes attr) {
    if (attr.isRegularFile() && file.getFileName().toString().endsWith(".html")) {
        // Use the charset your files are actually saved in
        Charset charset = Charset.forName("UTF-16");
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // do what you need to do here
            }
        } catch (IOException x) {
            // Print / log the error
        }
    }
    return FileVisitResult.CONTINUE;
}
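
For context, here is a minimal self-contained sketch of how such a visitor might be plugged into Files.walkFileTree. The class name HtmlTreeWalker and the helper readAllHtmlLines are hypothetical names chosen for illustration, and UTF-8 is only an assumed encoding; substitute whatever your files actually use. It collects the lines into a list so the caller can parse them afterwards, which may not suit very large trees.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class HtmlTreeWalker {

    // Hypothetical helper: walks everything below root and returns the lines
    // of every regular file whose name ends in ".html" (case-insensitive).
    static List<String> readAllHtmlLines(Path root, final Charset charset) throws IOException {
        final List<String> lines = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                String name = file.getFileName().toString().toLowerCase();
                if (attrs.isRegularFile() && name.endsWith(".html")) {
                    // try-with-resources closes the reader before the next file is visited
                    try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
                        String line;
                        while ((line = reader.readLine()) != null) {
                            lines.add(line); // replace with your own line-by-line parsing
                        }
                    }
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the root directory is passed as the first argument
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (String line : readAllHtmlLines(root, StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
    }
}
```

Extending SimpleFileVisitor rather than implementing FileVisitor directly means only visitFile has to be overridden; the other callbacks keep their default behaviour. Unlike the snippet above, this sketch lets IOException propagate, which aborts the walk on the first unreadable file.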

Other tips

You need an HTML parser, for example http://htmlparser.sourceforge.net/. Then feed it each document and it will do what you want.

It may seem a bit counterintuitive, but the File object in Java also represents directories.

You can check if it is a directory by doing:

file.isDirectory()

If it is, you can list all files and handle them accordingly:

for(File f : file.listFiles()){
   handle(f);
}

File file = new File("yourPath");
if (file.isDirectory()) {
    for (File f : file.listFiles(new YourFileFilter())) {
        doYourReading(new FileInputStream(f));
    }
}

And:

// FileFilter is an interface, so it is implemented rather than extended
class YourFileFilter implements java.io.FileFilter {
    @Override
    public boolean accept(File pathname) {
        return pathname.getName().toLowerCase().endsWith(".html");
    }
}

That's the basic idea, at least. Exception handling is on you (;
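
Put together, the idea might look roughly like the sketch below. The names HtmlDirLister, HtmlFileFilter and readHtmlLines are made up for illustration, UTF-8 is an assumed encoding, and the listing is deliberately not recursive: only files directly inside the given directory are read.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileFilter;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class HtmlDirLister {

    // Accepts regular files whose name ends in ".html" (case-insensitive)
    static class HtmlFileFilter implements FileFilter {
        @Override
        public boolean accept(File pathname) {
            return pathname.isFile() && pathname.getName().toLowerCase().endsWith(".html");
        }
    }

    // Hypothetical helper: reads the lines of each matching file directly inside dir
    static List<String> readHtmlLines(File dir) throws IOException {
        List<String> lines = new ArrayList<>();
        File[] files = dir.listFiles(new HtmlFileFilter());
        if (files == null) {
            // listFiles returns null if dir is not a directory or an I/O error occurs
            return lines;
        }
        for (File f : files) {
            // try-with-resources closes each file before the next one is opened
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line); // replace with your own parsing
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the directory is passed as the first argument
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (String line : readHtmlLines(dir)) {
            System.out.println(line);
        }
    }
}
```

To cover subdirectories as well, the loop could call itself for every entry where isDirectory() is true; that requires listing all entries rather than filtering for .html up front.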

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow