Question

Is there a way to safely sanitize path input, without using realpath()?

The aim is to prevent malicious inputs like ../../../../../path/to/file in code such as:

 $handle = fopen($path . '/' . $filename, 'r');

Solution

Not sure why you wouldn't want to use realpath but path name sanitisation is a very simple concept, along the following lines:

  • If the path is relative (does not start with /), prefix it with the current working directory and /, making it an absolute path.
  • Replace all sequences of more than one / with a single one (a).
  • Replace all occurrences of /./ with /.
  • Remove /. if at the end.
  • Replace /anything/../ with /.
  • Remove /anything/.. if at the end.

The text anything in this case means the longest sequence of characters that aren't /.

Note that those rules should be applied continuously until such time as none of them result in a change. In other words, do all six (one pass). If the string changed, then go back and do all six again (another pass). Keep doing that until the string is the same as before the pass just executed.
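
The rules above can be sketched roughly as follows. This is only an illustration, not a vetted implementation: the function name normalize_path_sketch, the explicit $cwd parameter, the /srv/files base directory and the mapping of an empty result back to / are all assumptions made for the example.

```php
// Hypothetical helper: applies the six rewrite rules until a pass changes nothing.
// $cwd is passed in explicitly (an assumption) instead of calling getcwd().
function normalize_path_sketch(string $path, string $cwd): string
{
    // Rule 1: make a relative path absolute.
    if ($path === '' || $path[0] !== '/') {
        $path = rtrim($cwd, '/') . '/' . $path;
    }

    do {
        $before = $path;
        $path = preg_replace('~/{2,}~', '/', $path);          // rule 2: collapse runs of /
        $path = str_replace('/./', '/', $path);               // rule 3: /./ becomes /
        $path = preg_replace('~/\.\z~', '', $path);           // rule 4: drop a trailing /.
        $path = preg_replace('~/[^/]+/\.\./~', '/', $path);   // rule 5: /anything/../ becomes /
        $path = preg_replace('~/[^/]+/\.\.\z~', '', $path);   // rule 6: drop a trailing /anything/..
    } while ($path !== $before);                              // repeat until a pass changes nothing

    // Assumption: an empty result means everything cancelled out, i.e. the root.
    return $path === '' ? '/' : $path;
}

// Usage: normalise first, then check the result against an allowed base directory.
$base    = '/srv/files';                                      // hypothetical base directory
$clean   = normalize_path_sketch('../../etc/passwd', $base);  // '/etc/passwd'
$allowed = strncmp($clean, $base . '/', strlen($base) + 1) === 0; // false: outside $base
```

Because rule 1 makes every path absolute, the final check in this sketch is a prefix comparison against the base directory rather than a test for a leading ../.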

Once those steps are done, you have a canonical path name that can be checked for a valid pattern. Most likely that will be anything that doesn't start with ../ (in other words, it doesn't try to move above the starting point). There may be other rules you want to apply, but that's outside the scope of this question.


(a) If you're working on a system that treats // at the start of a path as special, make sure you preserve a leading pair of slashes when the path starts with exactly two of them. This is the only place where POSIX allows (but does not mandate) special handling for multiples; three or more leading / characters, and multiple / characters anywhere else, are equivalent to a single one.
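
As a small sketch of that footnote, assuming the POSIX reading that exactly two leading slashes may be special while three or more count as one, the slash-collapsing step could keep a leading pair like this (the function name collapse_slashes_sketch is made up for the example):

```php
// Hypothetical helper: collapse runs of "/" but keep a leading "//"
// when the path starts with exactly two slashes.
function collapse_slashes_sketch(string $path): string
{
    // True only for exactly two leading slashes ("//x"), not three or more ("///x").
    $keepDouble = strncmp($path, '//', 2) === 0 && substr($path, 2, 1) !== '/';

    // Collapse every run of slashes to a single one.
    $collapsed = preg_replace('~/{2,}~', '/', $path);

    // Restore the second leading slash if it was significant.
    return $keepDouble ? '/' . $collapsed : $collapsed;
}
```

Whether the leading pair should be kept at all depends on the platform; on systems that give // no special meaning, the plain collapse is enough.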

OTHER TIPS

There is a Remove Dot Sequence algorithm described in RFC 3986 that is used to interpret and remove the special . and .. complete path segments from a referenced path during the process of relative URI reference resolution.

You could use this algorithm for file system paths as well:

// as per RFC 3986
// @see https://www.rfc-editor.org/rfc/rfc3986#section-5.2.4
function remove_dot_segments($input) {
    // 1.  The input buffer is initialized with the now-appended path
    //     components and the output buffer is initialized to the empty
    //     string.
    $output = '';

    // 2.  While the input buffer is not empty, loop as follows:
    while ($input !== '') {
        // A.  If the input buffer begins with a prefix of "../" or "./",
        //     then remove that prefix from the input buffer; otherwise,
        if (
            ($prefix = substr($input, 0, 3)) == '../' ||
            ($prefix = substr($input, 0, 2)) == './'
           ) {
            $input = substr($input, strlen($prefix));
        } else

        // B.  if the input buffer begins with a prefix of "/./" or "/.",
        //     where "." is a complete path segment, then replace that
        //     prefix with "/" in the input buffer; otherwise,
        if (
            ($prefix = substr($input, 0, 3)) == '/./' ||
            ($prefix = $input) == '/.'
           ) {
            $input = '/' . substr($input, strlen($prefix));
        } else

        // C.  if the input buffer begins with a prefix of "/../" or "/..",
        //     where ".." is a complete path segment, then replace that
        //     prefix with "/" in the input buffer and remove the last
        //     segment and its preceding "/" (if any) from the output
        //     buffer; otherwise,
        if (
            ($prefix = substr($input, 0, 4)) == '/../' ||
            ($prefix = $input) == '/..'
           ) {
            $input = '/' . substr($input, strlen($prefix));
            $output = substr($output, 0, strrpos($output, '/'));
        } else

        // D.  if the input buffer consists only of "." or "..", then remove
        //     that from the input buffer; otherwise,
        if ($input == '.' || $input == '..') {
            $input = '';
        } else

        // E.  move the first path segment in the input buffer to the end of
        //     the output buffer, including the initial "/" character (if
        //     any) and any subsequent characters up to, but not including,
        //     the next "/" character or the end of the input buffer.
        {
            $pos = strpos($input, '/');
            if ($pos === 0) $pos = strpos($input, '/', $pos+1);
            if ($pos === false) $pos = strlen($input);
            $output .= substr($input, 0, $pos);
            $input = (string) substr($input, $pos);
        }
    }

    // 3.  Finally, the output buffer is returned as the result of remove_dot_segments.
    return $output;
}

The following function canonicalizes file system paths and path components of URIs. It is faster than Gumbo's RFC implementation.

function canonicalizePath($path)
{
    $path = explode('/', $path);
    $stack = array();
    foreach ($path as $seg) {
        if ($seg == '..') {
            // Ignore this segment, remove last segment from stack
            array_pop($stack);
            continue;
        }

        if ($seg == '.') {
            // Ignore this segment
            continue;
        }

        $stack[] = $seg;
    }

    return implode('/', $stack);
}

Notes

  • It does not strip sequences of multiple / as this would not comply with RFC 3986.
  • Obviously, this doesn't work with ..\backslash\paths.
  • I am not sure this function is 100% safe, yet I haven't been able to come up with an input that compromises its output.
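
One input worth knowing about: tracing the function by hand, canonicalizePath('/../etc/passwd') pops the empty leading segment and returns the relative path etc/passwd. A minimal sketch of a stricter variant is below; it is an illustration under assumptions, not a replacement for the function above. The name canonicalize_or_fail is hypothetical, it returns null when a .. would climb above the starting point, and unlike the original it drops empty segments (so it does collapse multiple /).

```php
// Hypothetical stricter variant: returns null instead of silently
// discarding a ".." that would climb above the starting point.
function canonicalize_or_fail(string $path): ?string
{
    // Remember whether the input was absolute so the leading "/" can be restored.
    $absolute = $path !== '' && $path[0] === '/';
    $stack = [];

    foreach (explode('/', $path) as $seg) {
        if ($seg === '' || $seg === '.') {
            continue;                 // skip empty and "." segments
        }
        if ($seg === '..') {
            if ($stack === []) {
                return null;          // nothing left to pop: the path escapes its root
            }
            array_pop($stack);
            continue;
        }
        $stack[] = $seg;
    }

    return ($absolute ? '/' : '') . implode('/', $stack);
}
```

A caller would treat a null result as a rejected path rather than trying to repair it.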

Since you only asked for sanitizing, maybe what you need is just a "fail on tricky paths" thing. If normally there wouldn't be any ../../stuff/../like/this in your path input, you only need to check this:

function isTricky($p) {
    if(strpos("/$p/","/../")===false) return false;
    return true;
}

or just

function isTricky($p) {return strpos("-/$p/","/../");}

This quick-and-dirty check blocks any backward moves, which is sufficient in most cases. (The second version returns a nonzero integer instead of true, but hey, why not! The dash is a hack so that a match can never sit at index 0 of the string, where strpos would return 0.)

Side note: also remember slashes vs. backslashes. I'd recommend converting backslashes to forward slashes first, but that's platform dependent.
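
Putting the check and the backslash conversion together might look like the sketch below. The name is_tricky_sketch is made up for the example, and treating every backslash as a separator is an assumption that only makes sense where backslashes cannot appear in legitimate file names.

```php
// Hypothetical combined check: normalise backslashes, then reject any ".." segment.
function is_tricky_sketch(string $p): bool
{
    // Assumption: backslashes are separators, as on Windows.
    $p = str_replace('\\', '/', $p);

    // Wrapping in slashes makes a leading or trailing ".." look like "/../" too.
    return strpos("/$p/", '/../') !== false;
}
```

Only whole .. segments are flagged, so a name like b..c still passes.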

Since the functions above either did not work for me in one way or another or were quite lengthy, I wrote my own code:

function clean_path( $A_path="", $A_echo=false )
{
    // IF YOU WANT LEAN CODE, REMOVE ALL "if" LINES AND $A_echo IN ARGS
    $_p                            = func_get_args();
    // HOW IT WORKS:
    // REMOVING EMPTY ELEMENTS AT THE END ALLOWS FOR "BUFFERS" AND HANDLING START & END SPEC. SEQUENCES
    // BLANK ELEMENTS AT START & END MAKE SURE WE COVER SPECIALS AT BEGIN & END
    // REPLACING ":" AGAINST "://" MAKES AN EMPTY ELEMENT TO ALLOW FOR CORRECT x:/../<path> USE (which, in principle is faulty)

    // 1.) "normalize" TO "slashed" AND MAKE SOME SPECIALS, ALSO DUMMY ELEMENTS AT BEGIN & END 
        $_s                        = array( "\\", ":", ":./", ":../");
        $_r                        = array( "/", "://", ":/", ":/" );
        $_p['sr']                = "/" . str_replace( $_s, $_r, $_p[0] ) . "/";
        $_p['arr']                = explode('/', $_p['sr'] );
                                                                                if ( $A_echo ) $_p['arr1']    = $_p['arr'];
    // 2.) WALK THE ELEMENTS WITH A STACK: EACH ".." REMOVES THE ELEMENT BEFORE IT (THAT MEANS "UP"), SO CONSECUTIVE ".." ELEMENTS EACH GO UP ONE LEVEL
        $_p['stack']            = array();
        foreach($_p['arr'] as $_seg )
        {
            if ( $_seg === '..' ) { array_pop( $_p['stack'] ); } else { $_p['stack'][] = $_seg; }
        }
        $_p['arr']                = $_p['stack'];
                                                                                if ( $A_echo ) $_p['arr2']    = $_p['arr'];
    // 3.) REMOVE ALL "/./" PARTS AS THEY ARE SIMPLY SUPERFLUOUS
        $_p['p']                = array_keys( $_p['arr'], '.' );
        foreach($_p['p'] as $_pos )
        {
            unset( $_p['arr'][ $_pos ] );
        }
                                                                                if ( $A_echo ) $_p['arr3']    = $_p['arr'];
    // 4.) CLEAN OUT EMPTY ONES INCLUDING OUR DUMMIES
        $_p['arr']                = array_filter( $_p['arr'] );
    // 5) MAKE FINAL STRING
        $_p['clean']            = implode( DIRECTORY_SEPARATOR, $_p['arr'] );
                                                                                if ($A_echo){ echo "arr=="; print_R( $_p  ); };
    return $_p['clean'];    
}

The simple form:

$filename = str_replace('..', '', $filename);

if (file_exists($path . '/' . $filename)) {
    $handle = fopen($path . '/' . $filename, 'r');
}

The complex form (from here):

function canonicalize($address)
{
    $address = explode('/', $address);
    $keys = array_keys($address, '..');

    foreach($keys AS $keypos => $key)
    {
        array_splice($address, $key - ($keypos * 2 + 1), 2);
    }

    $address = implode('/', $address);
    $address = str_replace('./', '', $address);
    return $address;
}
echo canonicalize('/dir1/../dir2/'); // returning /dir2/

I prefer an implode / explode solution:

public function sanitize(string $path = null, string $separator = DIRECTORY_SEPARATOR) : string
{
    $pathArray = explode($separator, $path);
    foreach ($pathArray as $key => $value)
    {
        if ($value === '.' || $value === '..')
        {
            $pathArray[$key] = null;
        }
    }
    return implode($separator, array_map('trim', array_filter($pathArray)));
}

A previous version looked like this:

public function sanitize(string $path = null, string $separator = DIRECTORY_SEPARATOR) : string
{
    $output = str_replace(
    [
        ' ',
        '..',
    ], '', $path);
    // preg_quote() keeps a backslash separator from being read as a regex escape
    $output = preg_replace('~' . preg_quote($separator, '~') . '+~', $separator, $output);
    $output = ltrim($output, '.');
    $output = trim($output, $separator);
    return $output;
}

Both have been successfully tested against this data provider. Enjoy!

Licensed under: CC-BY-SA with attribution