Web App: Any Way Possible to Monitor HTTP File Downloads

Question 1

My final solution for PHP's connection related problems was to create a web-server using Boost.Asio and a little known threadsafe SAPI released by Facebook. The download link is broken, but it can be found on github here.

The main problem that I experienced while tying to make it work using Apache and other webservers was an inconsistency between existing SAPI's (Fast-CGI, PHP-FPM, mod_apache, etc) and the connected related functions in PHP. They simply were not reliable under any situation that I tried, although many others claim to have gotten it to work with there specific configuration (OS version, Webserver version, SAPI version, PHP version, etc).

The main problem (as you've observed) is that PHP is significantly isolated from Apache and other webservers. By using an embedded PHP sapi you are able to have a greater level of cooperation between PHP and actual socket connections as well as other network related functions. This is the only way that I have been able to get PHP to work hand in hand with a webserver, which is very much what your needing.

However, on a second note, there are many serious pure PHP services surfacing now that PHP has mostly fixed it's garbage collection issues. A simple file server could easily be made using non-blocking sockets or PHP streams, and would likely be fast considering that it would be servicing static content using an async pattern.

I wouldn't mind posting some Boost.Asio tidbits or a simple PHP file service if you feel this is the direction that your solution needs to move. However, it is definitely possible. Many thousands of services have ran into this problem already.

Question 2

Taking your requirement constraints into acount, i'd say it is impossible (at least to cover 100% of the browsers) for various reasons (See a "hacky" solution bellow):

You can display the download process by frequently pulling a second page that returns the %-Value your download script may store in the database. However - as you already noticed - PHP does not offer reliable methods to determine whether a user has aborted or not.

To bypass this problem you could do the following:

Create a download.php file, that is able to return files in chunks. Write a javascript that iteratively is pulling all available chunks, until the download is finished (i.e. download.php?fileId=5&chunk=59). The Javascript can then combine all retrieved chunks and finally render the completed file.

However, whit Javascript you can not directly write to the Harddisk, means:You need to download all chunks to present the user a "finished file". If he stops in between, all the data is lost, which violates your constraint of beeing able to resume downloads.

Since resuming file downloads is a task that has to be implemented on the client side (You somehow need to pick up already downloaded data) you can not do anything about this on the server side. And with JavaScript lacking the functionality of writing (or reading) harddisk directly, it is impossible with only php/Javascript. (In fact there ARE Filesystem Functions in Javascript, but in general no browser allows them for remote sites.)

As a hacky solution, you can abuse the browser cache for resuming file downloads:

Note, that there are various cases, when this does not work:

Users may have disabled browser cache.
Browsers may render files "outdated" on their own.
Browsers may simply ignore your cache advice.

However, with this solution, the worst case will be that the caching / resuming does not work.

Implement a download.phpas mentioned above. The following example is using a fixed ChunkSize of "10", which you may want to addapt to your needs (or even a fixed chunk size -> Fix the calculations as required)

<?

header('Cache-Control: max-age=31556926');
$etag = 'a_unique_version_string';
header('ETag: "'.$etag.'"');

$chunkCount = 10;

$file = $_GET["file"]; //ignored in this example
$file = "someImage.jpg";
$chunk = $_GET["chunk"];

$fileSize = filesize($file);
$chunkSize = ceil($fileSize / $chunkCount); //round to whole numbers.

//get required chunk.
$handle = fopen($file, "r");
$start = ($chunk-1) * $chunkSize + ($chunk-1);
$toRead = min($chunkSize+1, $fileSize - $start); //read next chunk or until EOF.
$end = $start + $toRead;

//echo "reading $toRead from $start to $end";
//die();

if (fseek($handle, $start) == 0){
  $c = fread($handle, $toRead); 
  echo $c;
  @fclose($handle);
}else{
  //error seeking: handle it.
}

?>

Now, any client can download chunks, by calling an url (I setup a demo on my server) like this:

downloading http://dog-net.org/dltest/download.php?file=1&chunk=1
downloading http://dog-net.org/dltest/download.php?file=1&chunk=2
downloading http://dog-net.org/dltest/download.php?file=1&chunk=3
downloading http://dog-net.org/dltest/download.php?file=1&chunk=4
downloading http://dog-net.org/dltest/download.php?file=1&chunk=5

Independent chunks are worthless, so the mentioned JavaScript comes into the game. The following snippet can be generated when a download is invoked. It then will iterate over all required chunks and download them "One by One". If the user aborts, the browser will still have single chunks cached. Meaning: Whenever the user will start the download again, already downloaded chunks will finish within a split second, and not yet requested chunks will be downloaded regulary

<html>
  <head>   
    <script language="javascript">
      var urls = new Array();
      urls[0] = "http://dog-net.org/dltest/download.php?file=1&chunk=1";
      urls[1] = "http://dog-net.org/dltest/download.php?file=1&chunk=2";
      urls[2] = "http://dog-net.org/dltest/download.php?file=1&chunk=3";
      urls[3] = "http://dog-net.org/dltest/download.php?file=1&chunk=4";
      urls[4] = "http://dog-net.org/dltest/download.php?file=1&chunk=5";
      urls[5] = "http://dog-net.org/dltest/download.php?file=1&chunk=6";
      urls[6] = "http://dog-net.org/dltest/download.php?file=1&chunk=7";
      urls[7] = "http://dog-net.org/dltest/download.php?file=1&chunk=8";
      urls[8] = "http://dog-net.org/dltest/download.php?file=1&chunk=9";
      urls[9] = "http://dog-net.org/dltest/download.php?file=1&chunk=10";

      var fileContent = new Array();


      function downloadChunk(chunk){
        var url = urls[chunk-1];
        console.log("downloading " + url);
        var xhr = new XMLHttpRequest();
        xhr.open("GET", url, true);
        xhr.responseType = 'blob';
        xhr.onload = function (e) {
          if (xhr.readyState === 4) {
            if (xhr.status === 200) {
              document.getElementById("log").innerHTML += "downloading " + url + "<br />";
              fileContent.push(xhr.response); 
              document.getElementById("percentage").innerHTML = chunk / urls.length * 100;

              if (chunk < urls.length){  
                downloadChunk(chunk+1);
              }else{
                finishFile();
              }
            } else {
              console.error(xhr.statusText);
            }
          }
        };
        xhr.onerror = function (e) {
          console.error(xhr.statusText);
        };
        xhr.send(null);
      }

      function finishFile(){
         contentType = 'image/jpg'; //TODO: has to be set accordingly!
         console.log("Generating file");
         var a = document.createElement('a');
         var blob = new Blob(fileContent, {'type':contentType, 'endings':'native'});

         console.log("File generated. size: " + blob.size);

         //Firefox
         if (navigator.userAgent.toLowerCase().indexOf('firefox') > -1){
            var url = window.URL.createObjectURL(blob);
            window.location = url;   
         }

         //IE 11 or chrome?
         if (!(window.ActiveXObject) && "ActiveXObject"){   
           //Chrome:
           if (window.chrome){
             a.href = window.URL.createObjectURL(blob);
             a.download = "download";  
             a.click();
           }else{
            //ie 11
            window.navigator.msSaveOrOpenBlob(blob, 'download');
          } 
         }
      } 


      function setProgress(chunk){
        document.getElementById("percentage").innerHTML = chunk / urls.length * 100;
      }
    </script>
  </head>
  <body onload="downloadChunk(1);">

  <div id="percentage"></div>
  <div id="log"></div>

  </body>
</html>

Note, that the Handling of Blobs is Pain in the ... I Managed to get it working in IE 11, Chrome 32 and Firefox 27. No way for Safari so far. Also i did NOT check older Versions.

Demo: http://dog-net.org/dltest/ (its a png image, so open with paint/irfranview/whatevs - file extension not set.)

On First Call, all File Chunks will be downloaded independant. On Second Call you will notice, that they finish pretty quick, because ALL the (already completed) calls have been cached by the browser. (I set cache time to "Forever" - In practice you dont want to do this, but pick like 7 days or so!)

Things you would need to do on your own:

Generate the required JavaScript-Download-Code (Second Snippet)
Add finishFile-Implementations for older Browser Versions.
Check Large files. (Only testet it up to 30 MB)
Pass the correct Mime-Type to the Snippet where required.
Adapt to your UI-Styling.
Ensure, Files are having a proper extension set.

This is just a thought that might give you some ideas, how to implement this.

However, I strongly recommed to use a Client-Side Implementation based o Flash/Java/Silverlight, so you have a failsave implementation that is not depending on Browser Versiosn or any other limitation!

Question 3

You can implement the solution using HTML5 WebSockets.

There are client libraries (built using JavaScript) that abstract out the API in an easy to use way.

There are server side libraries (built using PHP) that implement a WebSocket server.

This way, you can have bi-directional communication and you can capture on the server side, all the possible events that you have mentioned.

Due to shortage of time, I am not providing code but hopefully this gives some direction.

Question 4

In reality, there is no way with PHP (which is a server-side, not a client-side, language) to truly detect when a file download has completed. The best you can do is log the download in your database when it begins. If you absolutely, completely and totally need to know when the download has completed, you'll have to do something like embed a Java applet or use Flash. However, usually that is not the correct answer in terms of usability for your user (why require them to have Java or Flash installed just to download something from you?).

From here.

You can still try to learn a little more about ignore_user_abort and connection_aborted. I might fit somehow what you need. But you will not get really efficiently and precisely enough to monitor if the download was really concluded.