Question

I am working on a website with a lot of images and PDF files that are updated regularly, but the old files are not deleted when new ones are uploaded. Therefore I have a lot of files just sitting on the server, unused.

Is there a script or other tool I can run that will search for files that nothing links to?

EDIT:
I am not asking how to upload new files and delete the old ones in the future; I have already taken care of that.
I just want to know how to get rid of the files that are no longer in use.
Does that make sense?


Solution

Try this; just don't forget to change the directory: $dir = "/Your/directory/here";

<?php
$findex = array();
$findex['path'] = array();
$findex['file'] = array();

$extensions = array('.cfm', '.html', '.htm', '.css', '.php', '.gif', '.jpg', '.png', '.jpeg', '.dwt');
$excludes   = array('.svn');

// Recursively index every file under $dir whose name matches one of
// $extensions and none of $excludes, recording its path and filename.
function rec_scandir($dir)
{
    global $findex, $extensions, $excludes;

    if ($handle = opendir($dir)) {
        while (($file = readdir($handle)) !== false) {
            if ($file == '.' || $file == '..') {
                continue;
            }
            if (is_dir($dir . '/' . $file)) {
                rec_scandir($dir . '/' . $file);
                continue;
            }
            $found = false;
            foreach ($extensions as $extension) {
                if (strpos(strtolower($file), strtolower($extension)) !== false) {
                    $found = true;
                }
            }
            foreach ($excludes as $exclude) {
                if (strpos(strtolower($file), strtolower($exclude)) !== false) {
                    $found = false;
                }
            }
            if ($found) {
                $findex['path'][] = $dir . '/' . $file;
                $findex['file'][] = $file;
            }
        }
        closedir($handle);
    }
}

$dir = "/Your/directory/here";

echo "\n";
echo " Searching " . $dir . " for matching files\n";

rec_scandir($dir);

echo " Found " . count($findex['file']) . " matching extensions\n";
echo " Scanning for orphaned files....\n";

$findex['found'] = array();

// A file counts as "linked" if its name appears in the contents of any
// indexed file.
for ($i = 0; $i < count($findex['path']); $i++) {
    echo $i . " ";
    $contents = file_get_contents($findex['path'][$i]);
    for ($j = 0; $j < count($findex['file']); $j++) {
        if (strpos($contents, $findex['file'][$j]) !== false) {
            $findex['found'][$j] = 1;
        }
    }
}

echo "\n";

$counter = 1;
for ($i = 0; $i < count($findex['path']); $i++) {
    if (empty($findex['found'][$i])) {
        echo " " . $counter . ") " . $findex['path'][$i] . " is orphaned\n";
        $counter++;
    }
}
?>

Source: http://sun3.org/archives/297

OTHER TIPS

If there is no chance you will need those files again after updating the link, and no file has multiple links pointing to it, I'd suggest deleting each file at the moment you update its link, i.e.:

  1. Link1 points to File1.
  2. Update Link1 to point to File2.
  3. Delete File1 immediately.

If in your scenario you might have multiple links to the same file, or files that might be relinked within a short period, I'd suggest setting up a cron job that runs, for example, once a week, checks every file in your files/ directory against the links table in your database, and deletes the files that no link references.
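A minimal sketch of that weekly job, assuming the linked filenames can be exported from the database as a plain sorted list (the `links` table, `filename` column, and all file names below are hypothetical; demo data stands in for both the export and the directory listing):

```shell
#!/bin/sh
# Sketch of a weekly cron check: report files that no database link references.
# A real run would export the list first, e.g. (table/column names assumed):
#   mysql -N -e 'SELECT filename FROM links' mydb | sort -u > linked.txt
set -eu
work=$(mktemp -d)

# Stand-in for the database export of linked filenames.
printf 'a.pdf\nb.jpg\n' | sort > "$work/linked.txt"

# Stand-in for `ls files/` -- what is actually on disk.
printf 'a.pdf\nb.jpg\nold.pdf\n' | sort > "$work/ondisk.txt"

# comm -23 prints lines only in the first file: on disk but never linked.
orphans=$(comm -23 "$work/ondisk.txt" "$work/linked.txt")
printf '%s\n' "$orphans"

rm -rf "$work"
```

Printing the orphans first gives you a dry run; the real job would feed that list to rm inside the files/ directory.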

There are many free link checker tools you can use. After running one against your site (filtered to image/PDF files), take the generated list and programmatically check it against your images/PDF directory to find what's not in the list. Keep in mind that this can be difficult to determine with certainty, as dynamically generated src/href values (based on user input or settings, Apache rewrites, files returned via code) may not be included.
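As a rough sketch of that comparison, assuming the checker's results can be exported as a plain list of URLs (all file and directory names below are made up for the demo), you can strip each URL down to its filename and filter the upload directory against that list:

```shell
#!/bin/sh
# Compare a link checker's URL list against an upload directory;
# whatever never appears in the list is a deletion candidate.
set -eu
work=$(mktemp -d)
mkdir "$work/uploads"
touch "$work/uploads/report.pdf" "$work/uploads/logo.png" "$work/uploads/stale.pdf"

# Stand-in for the checker's exported output: one URL per line.
cat > "$work/checker-output.txt" <<'EOF'
http://example.com/files/report.pdf
http://example.com/img/logo.png
EOF

# Keep only the filename portion of each URL.
sed 's|.*/||' "$work/checker-output.txt" | sort -u > "$work/linked.txt"

# Uploads whose names never occur in the linked list (-x: whole-line match).
unlisted=$(ls "$work/uploads" | grep -F -x -v -f "$work/linked.txt" || true)
printf '%s\n' "$unlisted"

rm -rf "$work"
```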

If it is a Unix server, use the find command with something like this:

find /tmp/web_tmp \( \( \( -type f -amin +120 \) -or \( -type f -amin +30 -size +20480k \) \) -exec rm {} \; \) -or \( -depth -type d -empty -exec rmdir {} \; \)

In this case I am looking in the /tmp/web_tmp folder for empty folders, as well as files that either haven't been accessed in 120 minutes, or haven't been accessed in 30 minutes and are over 20 MB. Each matching entry is deleted on the spot.

The find man page may offer a test that lets you delete files that haven't been accessed or modified in a long time.
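For example, `-mtime +90` matches files not modified in over 90 days; listing them before adding `-delete` is a safer workflow. A small sketch (the demo directory and backdated file exist only for illustration):

```shell
#!/bin/sh
# Two-step cleanup by modification time: list first, delete once reviewed.
set -eu
work=$(mktemp -d)
mkdir "$work/files"
touch "$work/files/fresh.pdf"
touch -t 202001010000 "$work/files/ancient.pdf"   # backdate for the demo

# Step 1: list candidates without deleting anything.
old=$(find "$work/files" -type f -mtime +90)
printf '%s\n' "$old"

# Step 2: once the list looks right, let find remove them.
find "$work/files" -type f -mtime +90 -delete
remaining=$(ls "$work/files")

rm -rf "$work"
```

Running the listing step first means a mistaken time threshold only costs you a long list, not your files.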

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow