Question

Haskell noob here. I have a question specifically regarding how to use an existing library that may lead to some more fundamental aspects of the proper use of Haskell.

I'm learning Haskell and have a small project in mind to work on while I learn. The script will need to find all the tarballs in a given directory and unpack them in parallel. At this point, I'm working on the basic functionality of unpacking. So, using the Codec.Archive.Tar package, how can I override its behavior regarding tarballs with fully qualified paths?

Here's some example code:

module Main where

import qualified Codec.Archive.Tar as Tar
import qualified Codec.Compression.GZip as GZip
import Control.Monad (liftM, unless)
import qualified Data.ByteString.Lazy as BS
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.Exit (exitWith, ExitCode(..))
import System.FilePath.Posix (takeExtension)

searchPath = "/home/someuser/tarball/dir"

exit = exitWith ExitSuccess
die = exitWith (ExitFailure 1)

processFile :: String -> IO ()
processFile file = do
    putStrLn $ "Unpacking " ++ file ++ " to " ++ searchPath
    Tar.unpack searchPath . Tar.read . GZip.decompress =<< BS.readFile filePath
    where filePath = searchPath ++ "/" ++ file

main = do
    dirExists <- doesDirectoryExist searchPath
    unless dirExists $ (putStrLn $ "Error: Search path not found: " ++ searchPath) >> die
    files <- targetFiles `liftM` getDirectoryContents searchPath
    mapM_ processFile files
    exit
    where targetFiles = filter (\f -> f /= "." && f /= ".." && takeExtension f == ".tgz")

When I run this in a directory with tarballs that were packed with:

tar czvPf myfile.tgz /tarball_testing/myfile

I get the following output:

Unpacking myfile.tgz to /tarball_testing
unpacker.hs: Absolute file name in tar archive: "/tarball_testing/myfile"

The second line is the issue. Reading the docs for Codec.Archive.Tar I don't see a way to disable this functionality (not interested in discussions of why I want to use full paths in tarballs, or the relative security implications of doing so).

The first thing that comes to mind is that I somehow need to override the function but that doesn't "feel" like the way a pro Haskeller would do it. Can I get a pointer in the right direction?

Solution

You cannot monkey patch or otherwise override a function from a Haskell module, so there is no way to switch off the check inside Tar.unpack itself. What you can do, however, is use the functionality in Codec.Archive.Tar to modify the tar entry paths before unpacking, so that they are no longer absolute. Specifically, there is a mapEntriesNoFail function with type

mapEntriesNoFail :: (Entry -> Entry) -> Entries e -> Entries e

Entries is the type of the argument to Tar.unpack, while Entry is the type of an individual entry. Thanks to mapEntriesNoFail, our problem becomes writing an Entry -> Entry function to adjust the paths. For that, first we will need some extra imports:

import qualified Codec.Archive.Tar.Entry as Tar
import System.FilePath.Posix (takeExtension, dropDrive, hasTrailingPathSeparator)
import Data.Either (either) -- optional: either is also exported by the Prelude

The function can look like this:

dropDriveFromEntry :: Tar.Entry -> Tar.Entry
dropDriveFromEntry entry =
    either (error "Resulting tar path is somehow too long")
        (\tp -> entry { Tar.entryTarPath = tp })
        drivelessTarPath
    where
    tarPath = Tar.entryTarPath entry
    path = Tar.fromTarPath tarPath
    toTarPath' p = Tar.toTarPath (hasTrailingPathSeparator p) p
    drivelessTarPath = toTarPath' $ dropDrive path

This may seem a little long-winded; however, the hoops we jump through are there to ensure the resulting tar paths are sane. You can read about the gory details of tar path handling in the Codec.Archive.Tar.Entry documentation. The key function in this definition is dropDrive, which makes an absolute path relative (on Linux, it strips the leading slash of an absolute path).
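
To see the path step in isolation, the following is a minimal sketch that uses only System.FilePath.Posix. The names relativise and samples are made up for illustration and are not part of any library.

```haskell
import System.FilePath.Posix (dropDrive, hasTrailingPathSeparator)

-- | Hypothetical helper: the pure path step performed inside
-- dropDriveFromEntry, without any tar types involved.
-- Returns the "is a directory" flag that would be passed to toTarPath,
-- together with the driveless path.
relativise :: FilePath -> (Bool, FilePath)
relativise p = (hasTrailingPathSeparator p', p')
  where
    p' = dropDrive p

-- | Sample inputs; the first one is the path from the question.
samples :: [(Bool, FilePath)]
samples = map relativise
    [ "/tarball_testing/myfile"   -- absolute file path
    , "/tarball_testing/"         -- absolute directory path
    , "already/relative"          -- relative path, left untouched
    ]
```

Loading this in GHCi and evaluating samples should show the leading slash removed from the first two paths, the directory flag set only for the second, and the third path unchanged.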

It is worth spending a few words on the use of either. toTarPath produces a value of type Either String TarPath to account for the possibility of failure. Specifically, the conversion to a tar path fails if the provided path is too long. In our case, however, the path cannot be too long, as it is a path that was already in a tar file, perhaps with its leading slash removed. That being so, it is good enough to eliminate the Either wrapping with either, passing a call to error in place of the function that would handle the (impossible) Left case.
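
The either pattern can be tried out without the tar library. In this sketch, checkedPath is a hypothetical stand-in for toTarPath, and its limit of 255 characters is an arbitrary assumption, not the real tar limit. The other two names are also made up for illustration.

```haskell
-- | Hypothetical stand-in for Tar.toTarPath: it succeeds with Right
-- when the path fits within a limit and fails with Left otherwise.
-- The limit of 255 is arbitrary and is NOT the real tar limit.
checkedPath :: FilePath -> Either String FilePath
checkedPath p
    | length p <= 255 = Right p
    | otherwise       = Left ("path too long: " ++ take 20 p ++ "...")

-- | Same shape as in dropDriveFromEntry: the Left handler is a call to
-- error, so a Left result aborts the program once it is forced.
unsafeChecked :: FilePath -> FilePath
unsafeChecked = either (error "path is somehow too long") id . checkedPath

-- | Alternative that keeps the failure visible: the Left handler turns
-- the message into an ordinary value instead of throwing.
describeChecked :: FilePath -> String
describeChecked = either ("rejected: " ++) ("accepted: " ++) . checkedPath
```

The first style is reasonable only when the Left case is believed to be unreachable. When a failure is actually possible, the second style, or simply returning the Either to the caller, keeps the problem from turning into a runtime crash.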

With dropDriveFromEntry in hand, we just have to map it over the entries before unpacking. The relevant line of your program would become:

    Tar.unpack searchPath . Tar.mapEntriesNoFail dropDriveFromEntry
        . Tar.read . GZip.decompress =<< BS.readFile filePath

Note that if there were relevant errors to be accounted for in dropDriveFromEntry, we would make it return Either String Entry, and then use mapEntries instead of mapEntriesNoFail.

With these changes, the entry in your tar file will be extracted to /home/someuser/tarball/dir/tarball_testing/myfile. If that is not what you intended, you can modify dropDriveFromEntry so that it performs whatever extra path processing you need.
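
As one example of such extra path processing, the sketch below also drops leading directory components, similar in spirit to the --strip-components option of GNU tar. The function stripComponents is hypothetical and uses only System.FilePath.Posix; it is not part of the tar library.

```haskell
import System.FilePath.Posix (dropDrive, joinPath, splitPath)

-- | Hypothetical helper: make the path relative, then drop the first n
-- leading path components. If n is at least the number of components,
-- the result is the empty string, which the caller would have to
-- handle before building a tar path from it.
stripComponents :: Int -> FilePath -> FilePath
stripComponents n = joinPath . drop n . splitPath . dropDrive
```

Using stripComponents 1 path in place of dropDrive path inside dropDriveFromEntry would turn /tarball_testing/myfile into myfile, so that entry would be unpacked directly into the search path rather than into a tarball_testing subdirectory.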

P.S.: Regarding the alternate title of your question, and considering the sensible little program you have shown us, I do not think you should be worried :)

Licensed under: CC-BY-SA with attribution