Question
How do I extract a tar (or tar.gz, or tar.bz2) file in Java?
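Solution
Note: This functionality was later published as a separate project, Apache Commons Compress, as described in another answer. This answer is out of date.
I haven't used a tar API directly, but tar and bzip2 are implemented in Ant; you could borrow their implementation, or possibly use Ant to do what you need.
Gzip is part of Java SE (and I'm guessing the Ant implementation follows the same model).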
GZIPInputStream is just an InputStream decorator. You can wrap, for example, a FileInputStream in a GZIPInputStream and use it the same way you would use any InputStream:
InputStream is = new GZIPInputStream(new FileInputStream(file));
(Note that GZIPInputStream has its own internal buffer, so wrapping the FileInputStream in a BufferedInputStream would probably decrease performance.)
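A minimal, self-contained sketch of the decorator idea using only the JDK (Java 7+ for java.nio.file). The GunzipExample class and its gunzip method are hypothetical names for illustration: the method decompresses a single .gz file into a directory, naming the output after the input minus ".gz". Following the buffering note above, it wraps the raw file stream directly without a BufferedInputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, for illustration only.
class GunzipExample {

    /**
     * Decompresses one .gz file into outputDir; the result is named after the
     * input file minus its ".gz" suffix.
     */
    static Path gunzip(Path inputFile, Path outputDir) throws IOException {
        String name = inputFile.getFileName().toString();
        if (!name.endsWith(".gz")) {
            throw new IllegalArgumentException("Expected a .gz file: " + name);
        }
        Path outputFile = outputDir.resolve(name.substring(0, name.length() - 3));
        // GZIPInputStream decorates the raw file stream; it buffers internally,
        // so no BufferedInputStream is added here.
        try (InputStream in = new GZIPInputStream(Files.newInputStream(inputFile))) {
            // Files.copy reads the decompressed bytes until end of stream.
            Files.copy(in, outputFile, StandardCopyOption.REPLACE_EXISTING);
        }
        return outputFile;
    }
}
```

Because GZIPInputStream is an ordinary InputStream, the same call works with any source stream (network, classpath resource), not only files.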
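Other answers
You can use the Apache Commons Compress library to do this. You can download version 1.2 from http://mvnrepository.com/artifact/org.apache.commons/commons-compress/1.2.
Below are two methods: one ungzips a file and the other untars it. So for a file <fileName>.tar.gz, you first need to ungzip it and then untar the result. Note that the tar archive can also contain folders, in which case they need to be created on the local file system.
Enjoy.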
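// Imports needed at the top of the enclosing class file (IOUtils is Apache Commons IO's org.apache.commons.io.IOUtils).
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.LinkedList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import org.apache.commons.compress.archivers.ArchiveException;
import org.apache.commons.compress.archivers.ArchiveStreamFactory;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.io.IOUtils;

// LOG is assumed to be a logger field of the class, e.g. a java.util.logging.Logger.

/** Untar an input file into an output directory.
 * Every entry of the archive (files and folders) is created in the output
 * folder under its relative path inside the archive.
 *
 * @param inputFile the input .tar file
 * @param outputDir the output directory file.
 * @throws IOException
 * @throws FileNotFoundException
 *
 * @return The {@link List} of {@link File}s with the untarred content.
 * @throws ArchiveException
 */
private static List<File> unTar(final File inputFile, final File outputDir) throws FileNotFoundException, IOException, ArchiveException {
    LOG.info(String.format("Untaring %s to dir %s.", inputFile.getAbsolutePath(), outputDir.getAbsolutePath()));
    final List<File> untaredFiles = new LinkedList<File>();
    final String outputRoot = outputDir.getCanonicalPath();
    // try-with-resources closes both streams even if extraction fails midway.
    try (final InputStream is = new FileInputStream(inputFile);
         final TarArchiveInputStream debInputStream = (TarArchiveInputStream) new ArchiveStreamFactory().createArchiveInputStream("tar", is)) {
        TarArchiveEntry entry = null;
        while ((entry = (TarArchiveEntry) debInputStream.getNextEntry()) != null) {
            final File outputFile = new File(outputDir, entry.getName());
            // Reject entries such as "../x" that would be written outside outputDir.
            final String outputPath = outputFile.getCanonicalPath();
            if (!outputPath.equals(outputRoot) && !outputPath.startsWith(outputRoot + File.separator)) {
                throw new IOException(String.format("Entry %s is outside the output directory.", entry.getName()));
            }
            if (entry.isDirectory()) {
                LOG.info(String.format("Attempting to write output directory %s.", outputFile.getAbsolutePath()));
                if (!outputFile.exists()) {
                    LOG.info(String.format("Attempting to create output directory %s.", outputFile.getAbsolutePath()));
                    if (!outputFile.mkdirs()) {
                        throw new IllegalStateException(String.format("Couldn't create directory %s.", outputFile.getAbsolutePath()));
                    }
                }
            } else {
                // A file's parent folders don't always have their own entries in the archive.
                final File parent = outputFile.getParentFile();
                if (!parent.exists() && !parent.mkdirs()) {
                    throw new IllegalStateException(String.format("Couldn't create directory %s.", parent.getAbsolutePath()));
                }
                LOG.info(String.format("Creating output file %s.", outputFile.getAbsolutePath()));
                try (final OutputStream outputFileStream = new FileOutputStream(outputFile)) {
                    IOUtils.copy(debInputStream, outputFileStream);
                }
            }
            untaredFiles.add(outputFile);
        }
    }
    return untaredFiles;
}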
/**
 * Ungzip an input file into an output file.
 * <p>
 * The output file is created in the output folder, having the same name
 * as the input file, minus the '.gz' extension.
 *
 * @param inputFile the input .gz file
 * @param outputDir the output directory file.
 * @throws IOException
 * @throws FileNotFoundException
 *
 * @return The {@link File} with the ungzipped content.
 */
private static File unGzip(final File inputFile, final File outputDir) throws FileNotFoundException, IOException {
    LOG.info(String.format("Ungzipping %s to dir %s.", inputFile.getAbsolutePath(), outputDir.getAbsolutePath()));
    // Strip the trailing ".gz" (3 characters) to name the output file.
    final File outputFile = new File(outputDir, inputFile.getName().substring(0, inputFile.getName().length() - 3));
    // try-with-resources closes both streams even if the copy fails.
    try (final GZIPInputStream in = new GZIPInputStream(new FileInputStream(inputFile));
         final FileOutputStream out = new FileOutputStream(outputFile)) {
        IOUtils.copy(in, out);
    }
    return outputFile;
}
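The two methods above leave the tar parsing to Commons Compress. If adding a dependency is not an option, the tar layout is simple enough to sketch with the JDK alone: each entry is a 512-byte header (name at offset 0, octal size at 124, type flag at 156, where '5' means directory) followed by the entry data zero-padded to a multiple of 512 bytes, and the archive ends with zero-filled blocks. The MiniTarGz class below is a hypothetical, minimal sketch (Java 11+ for OutputStream.nullOutputStream()) that ungzips and untars in a single pass. It only understands plain regular files and directories; symlinks, GNU long names ('L' entries), pax headers and base-256 sizes are not supported, so treat it as an illustration and prefer Commons Compress for real-world archives.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: a JDK-only sketch, not a replacement for Commons Compress.
class MiniTarGz {
    private static final int BLOCK = 512;

    /** Extracts a .tar.gz file into outDir in one pass, with no intermediate .tar file. */
    static void extract(Path tarGz, Path outDir) throws IOException {
        Path root = outDir.toAbsolutePath().normalize();
        byte[] header = new byte[BLOCK];
        try (InputStream in = new GZIPInputStream(Files.newInputStream(tarGz))) {
            while (readBlock(in, header)) {
                String name = field(header, 0, 100);
                if (name.isEmpty()) {
                    break; // a zero-filled block marks the end of the archive
                }
                // POSIX ustar (magic "ustar" + NUL) may split long paths into prefix + name.
                String prefix = "ustar".equals(field(header, 257, 6)) ? field(header, 345, 155) : "";
                if (!prefix.isEmpty()) {
                    name = prefix + "/" + name;
                }
                long size = Long.parseLong(field(header, 124, 12).trim(), 8); // octal ASCII
                char type = (char) header[156];
                Path target = root.resolve(name).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Entry outside output dir: " + name);
                }
                if (type == '5') { // directory entry
                    Files.createDirectories(target);
                } else if (type == '0' || type == '\0') { // regular file
                    Files.createDirectories(target.getParent());
                    try (OutputStream out = Files.newOutputStream(target)) {
                        copy(in, out, size);
                    }
                } else { // unsupported entry type: discard its data
                    copy(in, OutputStream.nullOutputStream(), size);
                }
                // Entry data is zero-padded up to the next 512-byte boundary.
                copy(in, OutputStream.nullOutputStream(), (BLOCK - size % BLOCK) % BLOCK);
            }
        }
    }

    /** Fills buf completely; returns false on a clean end of stream. */
    private static boolean readBlock(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                if (off == 0) {
                    return false;
                }
                throw new IOException("Truncated tar header");
            }
            off += n;
        }
        return true;
    }

    /** Reads a NUL-terminated ASCII field of at most len bytes. */
    private static String field(byte[] h, int off, int len) {
        int end = off;
        while (end < off + len && h[end] != 0) {
            end++;
        }
        return new String(h, off, end - off, StandardCharsets.US_ASCII);
    }

    /** Copies exactly count bytes from in to out. */
    private static void copy(InputStream in, OutputStream out, long count) throws IOException {
        byte[] buf = new byte[8192];
        while (count > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, count));
            if (n < 0) {
                throw new IOException("Truncated tar entry");
            }
            out.write(buf, 0, n);
            count -= n;
        }
    }
}
```

Chaining GZIPInputStream directly into the tar reader is what avoids the intermediate .tar file that the two-step unGzip/unTar approach creates.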
Apache Commons VFS supports tar as a virtual file system, with URLs like this one: tar:gz:http://anyhost/dir/mytar.tar.gz!/mytar.tar!/path/in/tar/README.txt
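With jarchivelib (Archiver and ArchiverFactory live in the org.rauschig.jarchivelib package), extracting a .tar.gz takes two lines:
Archiver archiver = ArchiverFactory.createArchiver("tar", "gz");
archiver.extract(archiveFile, destDir);
Dependency: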
<dependency>
<groupId>org.rauschig</groupId>
<artifactId>jarchivelib</artifactId>
<version>0.5.0</version>
</dependency>
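I just tried a bunch of the suggested libraries (TrueZip, Apache Commons Compress), but had no luck.
Here is an example using Apache Commons VFS:
// Commons VFS 1.0 uses the org.apache.commons.vfs package (VFS 2.x uses org.apache.commons.vfs2).
import java.io.InputStream;
import org.apache.commons.vfs.FileContent;
import org.apache.commons.vfs.FileObject;
import org.apache.commons.vfs.FileSystemManager;
import org.apache.commons.vfs.FileType;
import org.apache.commons.vfs.VFS;

FileSystemManager fsManager = VFS.getManager();
// fileName should be the absolute path of the .tar.gz file.
FileObject archive = fsManager.resolveFile("tgz:file://" + fileName);
// List the children of the archive file
FileObject[] children = archive.getChildren();
System.out.println("Children of " + archive.getName().getURI() + " are ");
for (int i = 0; i < children.length; i++) {
    FileObject fo = children[i];
    System.out.println(fo.getName().getBaseName());
    if (fo.isReadable() && fo.getType() == FileType.FILE
            && fo.getName().getExtension().equals("nxml")) {
        FileContent fc = fo.getContent();
        // try-with-resources closes the entry's stream once it has been read.
        try (InputStream is = fc.getInputStream()) {
            // ... read the .nxml entry from is ...
        }
    }
}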
And the Maven dependency:
<dependency>
<groupId>commons-vfs</groupId>
<artifactId>commons-vfs</artifactId>
<version>1.0</version>
</dependency>
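In addition to gzip and bzip2, the Apache Commons Compress API (http://commons.apache.org/compress/) also supports tar. It was originally based on the ICE Engineering Java tar package, which is both an API and a standalone tool.
Here is a version based on Dan Borza's earlier answer that uses Apache Commons Compress and Java NIO (i.e. Path instead of File). It also does the ungzipping and untarring in a single stream, so no intermediate file is created.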
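import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

public static void unTarGz( Path pathInput, Path pathOutput ) throws IOException {
    Path pathRoot = pathOutput.toAbsolutePath().normalize();
    // try-with-resources closes the whole stream chain, even on error
    try( TarArchiveInputStream tararchiveinputstream =
            new TarArchiveInputStream(
                new GzipCompressorInputStream(
                    new BufferedInputStream( Files.newInputStream( pathInput ) ) ) ) ) {
        ArchiveEntry archiveentry = null;
        while( (archiveentry = tararchiveinputstream.getNextEntry()) != null ) {
            Path pathEntryOutput = pathRoot.resolve( archiveentry.getName() ).normalize();
            // Reject entries such as "../x" that would escape pathOutput
            if( !pathEntryOutput.startsWith( pathRoot ) )
                throw new IOException( "Entry outside output dir: " + archiveentry.getName() );
            if( archiveentry.isDirectory() )
                Files.createDirectories( pathEntryOutput );
            else {
                // Parent folders may not have their own entries in the archive
                Files.createDirectories( pathEntryOutput.getParent() );
                // Files.copy fails if the target file already exists
                Files.copy( tararchiveinputstream, pathEntryOutput );
            }
        }
    }
}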