Question

2000 requests are received at a time on an AWS server with 1.7 GB of RAM. The application tries to handle them but fails with memory-exhausted errors. I have optimized the PHP script and the MySQL database as far as my knowledge allows.

So here is what I have decided:

I would like to process 200 requests on the server and reject the other 1800 the first time. The next time, the next 200 requests will be processed and 1600 rejected. In this way I can process all the requests.

Question 1: How can I achieve this?

I planned to achieve this like below

  1. Get the Apache process count; if it goes beyond 120, reject further requests from the server.

  2. Monitor the server's free RAM and reject requests based on that.

Suggestions required: which is the better option to implement?

Any other suggestions are also welcome.

Question 2: How can I get the Apache process count using PHP?

Question 3: How can I get the free RAM size using PHP?

Note: Rejecting requests is not a problem, because the clients can retry. If I reject requests, the server stays normal. Once these 2000 requests are processed there is no further problem; after that the load is always lower.


Solution

First of all, I advise against using system calls, especially when you have that many requests. Running external processes can cause big performance problems, and since in your case the number of processes and memory usage change rapidly (you said 2000 requests at a time), you cannot rely on a cron job to cache those values (even if you run the cron every second, you can't be sure the values are 100% accurate). Instead, you can measure the memory usage of your script, approximate the number of processes you can handle at a time, and that should do it.
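That approximation can be sketched in PHP. This is only a sketch: the free-memory figure, the per-request figure, and the 20% safety margin are assumptions; measure the real per-request peak with `memory_get_peak_usage(true)` on your actual workload.

```php
<?php
// Sketch: estimate how many requests fit in memory at once.
// Both arguments are assumptions to be measured on your server -
// get the real per-request number from memory_get_peak_usage(true).
function estimateCapacity(int $freeBytes, int $perRequestBytes): int
{
    if ($perRequestBytes <= 0) {
        return 0;
    }
    // Keep a 20% safety margin for Apache/PHP interpreter overhead.
    return (int) floor(($freeBytes * 0.8) / $perRequestBytes);
}

// Example: ~1.2 GB usable RAM, ~5 MB peak per request.
echo estimateCapacity(1200 * 1024 * 1024, 5 * 1024 * 1024), "\n"; // 192
```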

Now, as far as I understand, you want to process requests in a certain order: requests 1-200 first, then 201-400, and so on? If that is the case, you need to keep track of the requests that have already been processed.

A simple way to achieve this would be to keep a request queue in a database - if you can use memcached or something similar, even better:

  • every time you get a request, you check the queue and make sure you don't have more than 200 active requests;
  • the next step would be to check that the request should run (this implies you can uniquely identify each request, e.g. by checking some value in GET/POST) - this allows you to make sure that if request #200 was processed, say in the last minute, you will ignore it and allow request #201 to run;
  • if the request checks out, you add it to the queue as active and mark it as completed / delete it from queue once it's done;

However, if request order doesn't matter to you, instead of a request queue you could just keep a request count, and make sure you never go above a certain limit.
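The request-count variant can be sketched with the Memcached extension, whose `increment`/`decrement` operations are atomic so two requests cannot both claim the last slot. The key name and the 200 limit are illustrative, and a running memcached server is assumed.

```php
<?php
// Sketch of a request-count limiter backed by Memcached.
// Assumes the Memcached PHP extension and a reachable server;
// the key name and limit are illustrative.
const MAX_ACTIVE = 200;

// Pure decision helper: reject once the active count exceeds the limit.
function shouldReject(int $active, int $limit = MAX_ACTIVE): bool
{
    return $active > $limit;
}

function handleRequest(Memcached $mc): bool
{
    $mc->add('active_requests', 0);              // initialize once; no-op if set
    $active = $mc->increment('active_requests'); // atomic - no race between checks
    if (shouldReject((int) $active)) {
        $mc->decrement('active_requests');       // give back the slot we claimed
        http_response_code(503);                 // tell the client to retry later
        return false;
    }
    try {
        // ... process the request here ...
        return true;
    } finally {
        $mc->decrement('active_requests');       // always release the slot
    }
}
```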

OTHER TIPS

I've made a prototype of a PHP process limiter using APC (with APCu, use the `apcu_*` equivalents).

<?php

   $limit = 3;                        // max concurrent processes

   apc_add('processes', 0);           // initialize once; no-op if the key exists
   $processes = apc_inc('processes'); // atomic increment - no global mutex needed

   if ($processes > $limit) {
        apc_dec('processes');         // release the slot we just claimed
        header('HTTP/1.1 403 Forbidden');
        echo "Reject: " . $processes;
        exit(1);
   }

   // Long memory-hungry code
   sleep(10);
   // .... your code   .....//

   echo "Pending processes: " . apc_dec('processes');
?>

Depending on your access to the server, you can do what you want by reading the output of two commands. I'm assuming you are on a Linux server; if that is not the case, other commands/options will have to be used.

  • ps H -U apache (to get all the threads of apache)
  • cat /proc/meminfo

I would use, for instance, a cron job to write that information to a file that PHP can read, and then use that information in your script.

For the number of processes, it is as simple as counting the number of lines in the file.

For the available memory, you will have to do some calculations. The output of meminfo is long and detailed, but you only need two values, MemFree and SwapFree. If the system is dedicated and no other kind of process is running, you can also include the cached values, since those will most probably already be used by Apache.

If you can't or don't want to use a cron job on the system but you can execute commands from PHP, you can execute those directly, but I think it is better to keep each part of the job separate.
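Parsing the two files from PHP can be sketched as below. The file paths are assumptions; the idea is that cron runs `ps H -U apache > /tmp/apache_threads.txt` and `cat /proc/meminfo > /tmp/meminfo.txt`, and PHP only reads the results.

```php
<?php
// Sketch: read the files a cron job wrote from
//   ps H -U apache  > /tmp/apache_threads.txt
//   cat /proc/meminfo > /tmp/meminfo.txt
// The paths are assumptions - adjust to your setup.

// Thread count = number of non-empty lines minus the ps header line.
function countApacheThreads(string $psOutput): int
{
    $lines = array_filter(explode("\n", trim($psOutput)));
    return max(count($lines) - 1, 0);
}

// Sum the MemFree and SwapFree values (in kB) from /proc/meminfo text.
function freeMemoryKb(string $meminfo): int
{
    $total = 0;
    foreach (['MemFree', 'SwapFree'] as $field) {
        if (preg_match('/^' . $field . ':\s+(\d+)\s+kB/m', $meminfo, $m)) {
            $total += (int) $m[1];
        }
    }
    return $total;
}
```

Usage would then be `countApacheThreads(file_get_contents('/tmp/apache_threads.txt'))` and `freeMemoryKb(file_get_contents('/tmp/meminfo.txt'))`.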

I feel it would be better to give you the best way of achieving your goal in a scalable manner, rather than rejecting requests and relying on system metrics. I have used this same setup in the past for processing videos.

If it were me I would set it up like so:

  1. Start with an elastic load balancer

  2. Inside the load balancer, create an auto-scaling group of small on-demand EC2 instances (you could even use micro instances if you are on a really tight budget). The size of the group will vary depending on your workload. Make sure to use CloudWatch to scale the group out based on the load balancer's workload.

  3. These instances will be responsible for receiving the processing requests and relaying them to an SQS queue. The instances should not need to work very hard at all, since all they are doing is forwarding requests to the SQS queue.

    NB: You could actually bypass steps 1-3 completely if your clients are able to push requests straight to the SQS queue.

  4. Now let's build your work force. Set up another auto-scaling group with some more small instances; however, this group will be composed of spot instances. Set the minimum size of the group to 0 and the maximum to, say, 10. Also set the spot instance price to something that allows instances to be spun up most of the time but does not cost you a lot of money if the spot price spikes.

  5. Using CloudWatch, monitor the SQS queue and trigger the worker autoscale group to scale out when the SQS queue has some items in it.

  6. Your worker instances should be set up to start polling the SQS queue and processing requests as soon as they are spun up. It's up to you to determine how fast they consume the queue.
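The front-end/worker split above can be sketched with the AWS SDK for PHP (`aws/aws-sdk-php`). This is a sketch only: the region, queue URL, and message shape are illustrative assumptions, and in a real deployment the Composer autoloader and AWS credentials must be configured.

```php
<?php
// Sketch of the front-end/worker split, using the AWS SDK for PHP.
// The region, queue URL and payload shape are illustrative assumptions.
// require 'vendor/autoload.php'; // Composer autoloader, once the SDK is installed

use Aws\Sqs\SqsClient;

// Build the JSON body a worker will consume.
function makeJobMessage(string $id, array $payload): string
{
    return json_encode(['id' => $id, 'payload' => $payload]);
}

// Front-end instance: forward the request to the queue and return quickly.
function forwardToQueue(SqsClient $sqs, string $queueUrl, string $body): void
{
    $sqs->sendMessage([
        'QueueUrl'    => $queueUrl,
        'MessageBody' => $body,
    ]);
}

// Worker instance: long-poll the queue, process, then delete each message.
function drainQueue(SqsClient $sqs, string $queueUrl, callable $process): void
{
    $result = $sqs->receiveMessage([
        'QueueUrl'        => $queueUrl,
        'WaitTimeSeconds' => 20, // long polling keeps idle workers cheap
    ]);
    foreach ($result['Messages'] ?? [] as $msg) {
        $process(json_decode($msg['Body'], true));
        $sqs->deleteMessage([
            'QueueUrl'      => $queueUrl,
            'ReceiptHandle' => $msg['ReceiptHandle'],
        ]);
    }
}
```

Deleting a message only after processing succeeds means SQS will redeliver it if a spot worker is terminated mid-job, which is what makes the cheap spot fleet safe to use here.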

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow