Best way to manage a long-running PHP script?

https://stackoverflow.com/questions/2212635

19-09-2019

Question

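I have a PHP script that takes a long time (5-30 minutes) to complete. In case it matters, the script uses curl to scrape data from another server. That is why it takes so long: it has to wait for each page to load before processing it and moving on to the next.

I want to be able to start the script and leave it alone until it is done, at which point it will set a flag in a database table.

What I need to know is how to end the HTTP request before the script has finished running. Also, is a PHP script the best way to do this?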

Solution

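It certainly can be done with PHP, but you should NOT do this as a background task: the new process has to be dissociated from the process group in which it is started.

Since people keep giving the same wrong answer to this FAQ, I've written a fuller answer here: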

http://symcbean.blogspot.com/2010/02/php-and-long-running-processes.html

From the comments:

The short version is shell_exec('echo /usr/bin/php -q longThing.php | at now'); but the reasons why are a bit long to include here.
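
A minimal sketch of the one-liner above wrapped in a helper, assuming `at` is installed and the web server user is allowed to use it. The function name `build_at_command` and its parameters are illustrative and not part of the original answer; the extra escaping is an addition so that paths containing spaces survive both shell layers.

```php
<?php
/**
 * Build the shell command that hands a PHP CLI script to `at now`.
 * Hypothetical helper: the name and parameters are illustrative only.
 */
function build_at_command(string $script, string $phpBinary = '/usr/bin/php'): string
{
    // The command that `at` will run later, outside the web server's process group.
    $job = escapeshellarg($phpBinary) . ' -q ' . escapeshellarg($script);

    // `at` reads the job from standard input, so echo it into the pipe.
    // The job is quoted a second time because it passes through this shell first.
    return 'echo ' . escapeshellarg($job) . ' | at now';
}

// Usage (commented out so that loading this file schedules nothing):
// shell_exec(build_at_command('/var/www/jobs/longThing.php'));
```

Building the command in one place keeps the quoting rules out of the calling code; whether `at` is acceptable at all depends on the host's configuration.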

Other tips

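The quick and dirty way is to use PHP's ignore_user_abort function. It basically says: don't care what the user does, run this script until it is finished. This is somewhat dangerous on a public-facing site, because if the script is started 20 times you can end up with 20 or more copies of it running at the same time.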
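
One way to limit the risk of many copies running at once is a lock file. This is only a sketch of that idea, not something the answer itself proposes: the helper names and the lock-file path are assumptions, and it relies on `flock()` behaving as an advisory lock between separate PHP processes on the same host.

```php
<?php
/**
 * Try to become the only running instance of the long job.
 * Returns an open lock handle on success, or null if the lock is unavailable.
 * Hypothetical helper: the name and lock-file path are illustrative only.
 */
function acquire_single_instance_lock(string $lockFile)
{
    $handle = fopen($lockFile, 'c');   // 'c' creates the file without truncating it
    if ($handle === false) {
        return null;
    }
    // LOCK_NB makes flock() return immediately instead of waiting for the lock.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    return $handle;
}

function release_single_instance_lock($handle): void
{
    flock($handle, LOCK_UN);
    fclose($handle);
}

// Usage inside the long-running script (commented out in this sketch):
// ignore_user_abort(true);
// $lock = acquire_single_instance_lock(sys_get_temp_dir() . '/scraper.lock');
// if ($lock === null) { exit('Already running.'); }
// ... long-running work ...
// release_single_instance_lock($lock);
```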

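The "clean" way (at least IMHO) is to set a flag (in the DB, for example) when you want to start the process, and to run a cron job every hour (or so) that checks whether the flag is set. If it is set, the long-running script starts; if it is not set, nothing happens.

You could use exec or system to start a background job, and then do the work in that.

Also, there are better approaches to scraping the web than the one you are using. You could use a threaded approach (multiple threads, each doing one page at a time) or an event loop (one thread doing multiple pages at a time). My personal approach, using Perl, would be AnyEvent::HTTP.

ETA: symcbean has explained how to detach the background process properly.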
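
For readers staying in PHP, the "one thread doing multiple pages at a time" idea can be approximated with the curl extension's multi interface. The following is a hedged sketch, not the answerer's method (which is Perl and AnyEvent::HTTP): `fetch_all` and its parameters are made-up names, and it assumes the curl extension is loaded.

```php
<?php
/**
 * Fetch several URLs concurrently from a single PHP process using curl_multi.
 * Returns an array with the same keys as $urls; each value is the response
 * body, or null if that transfer failed. Hypothetical helper name and signature.
 */
function fetch_all(array $urls, int $timeoutSeconds = 30): array
{
    $results = array_fill_keys(array_keys($urls), null);
    if ($urls === []) {
        return $results;
    }

    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // collect the body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $stillRunning);
        if ($stillRunning && curl_multi_select($multi, 1.0) === -1) {
            usleep(10000);   // select can fail on some systems; avoid a busy loop
        }
    } while ($stillRunning && $status === CURLM_OK);

    // Each finished transfer reports its own result code.
    while (($info = curl_multi_info_read($multi)) !== false) {
        $key = array_search($info['handle'], $handles, true);
        if ($key !== false && $info['result'] === CURLE_OK) {
            $results[$key] = curl_multi_getcontent($info['handle']);
        }
    }

    // Detach the handles; they are released when they go out of scope.
    foreach ($handles as $ch) {
        curl_multi_remove_handle($multi, $ch);
    }

    return $results;
}
```

Because all transfers wait on the network at the same time, the total run time tends toward that of the slowest page rather than the sum of all pages.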

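No, PHP is not the best solution.

I'm not sure about Ruby or Perl, but with Python you could rewrite your page scraper to be multi-threaded, and it would probably run at least 20x faster. Writing multi-threaded apps can be somewhat of a challenge, but the very first Python app I wrote was a multi-threaded page scraper. You could simply call the Python script from within your PHP page using one of the shell execution functions.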

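Yes, you can do it in PHP. But in addition to PHP, it would be wise to use a queue manager. Here is the strategy:

Break your large task into smaller tasks. In your case, each task could be loading a single page.
Send each small task to the queue.
Run your queue workers somewhere.

This strategy has the following advantages:

For a long-running task, you can recover if a fatal problem occurs in the middle of the run. There is no need to start again from the beginning.
If your tasks do not have to run sequentially, you can run multiple workers to process tasks simultaneously.

There are various options (these are just a few):

RabbitMQ (https://www.rabbitmq.com/tutorials/tutorial-one-php.html)
ZeroMQ (http://zeromq.org/bindings:php)
If you are using the Laravel framework, queues are built in (https://laravel.com/docs/5.4/queues), with drivers for Amazon SQS, Redis, and Beanstalkd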
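
The strategy above can be sketched without committing to any particular queue product. The example below uses PHP's built-in `SplQueue` as an in-memory stand-in for a real queue, so it shows only the shape of the idea (one small task per page, a worker loop, retry on failure). The function names, the task layout, and the retry limit are all assumptions.

```php
<?php
/** Split the big job into one small task per URL. Illustrative name and task layout. */
function enqueue_pages(SplQueue $queue, array $urls): void
{
    foreach ($urls as $url) {
        $queue->enqueue(['payload' => $url, 'attempts' => 0]);
    }
}

/**
 * Process tasks until the queue is empty.
 * $handler handles one payload and returns true on success.
 * A failed task is retried up to $maxAttempts times, so one bad page does not
 * force the whole job to start over. Returns the payloads that never succeeded.
 */
function run_worker(SplQueue $queue, callable $handler, int $maxAttempts = 3): array
{
    $failed = [];
    while (!$queue->isEmpty()) {
        $task = $queue->dequeue();
        try {
            $ok = (bool) $handler($task['payload']);
        } catch (Throwable $e) {
            $ok = false;                     // treat an exception as a failed attempt
        }
        if ($ok) {
            continue;
        }
        $task['attempts']++;
        if ($task['attempts'] < $maxAttempts) {
            $queue->enqueue($task);          // put it back for another try
        } else {
            $failed[] = $task['payload'];    // give up and report it
        }
    }
    return $failed;
}
```

With a real queue manager the two functions would run in different processes: the web request would only enqueue, and one or more long-lived workers would consume.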

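PHP may or may not be the best tool, but you know how to use it, and the rest of your application is written in it. These two qualities, combined with the fact that PHP is "good enough", make a pretty strong case for using it instead of Perl, Ruby, or Python.

If your goal is to learn another language, then pick one and use it. Any of the languages you mentioned will do the job, no problem. I happen to like Perl, but what you like may be different.

Symcbean has some good advice on how to manage background processes at his link.

In short, write a CLI PHP script to handle the long bits. Make sure that it reports its status in some way. Make a PHP page to handle status updates, using either AJAX or traditional methods. Your kickoff script will then start the process running in its own session and return confirmation that the process is under way.

Good luck.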

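I agree with the answers that say this should be run in a background process. But it is also important to report on the status, so that the user knows the work is being done.

When you receive the PHP request to kick off the process, you could store a representation of the task, with a unique identifier, in a database. Then start the screen-scraping process, passing it the unique identifier. Report back to the iPhone app that the task has been started and that it should check a specified URL, containing the new task ID, to get the latest status. The iPhone application can now poll (or even "long poll") this URL. In the meantime, the background process updates the database representation of the task as it works, with a completion percentage, the current step, or whatever other status indicators you would like. When it has finished, it sets a completed flag.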
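
A minimal sketch of the task record described above, using one JSON file per task as a stand-in for the database row so that the example stays self-contained. The function names, the field names (`percent`, `step`, `completed`), and the storage directory are assumptions made for illustration.

```php
<?php
/** Path of the status file for one task. Illustrative naming scheme. */
function task_path(string $dir, string $taskId): string
{
    return $dir . '/task_' . $taskId . '.json';
}

/** Called by the background process whenever its progress changes. */
function write_task_status(string $dir, string $taskId, array $status): void
{
    // Write to a temporary file and rename it, so a poller never reads a half-written file.
    $tmp = task_path($dir, $taskId) . '.tmp';
    file_put_contents($tmp, json_encode($status));
    rename($tmp, task_path($dir, $taskId));
}

/** Called by the kickoff request; the returned ID is handed back to the client. */
function create_task(string $dir): string
{
    $taskId = bin2hex(random_bytes(8));   // 16 hex characters
    write_task_status($dir, $taskId, ['percent' => 0, 'step' => 'queued', 'completed' => false]);
    return $taskId;
}

/** Called by the status URL that the client polls. Returns null for unknown IDs. */
function read_task_status(string $dir, string $taskId): ?array
{
    // The ID comes from the client, so accept only the format create_task() produces.
    if (!preg_match('/^[0-9a-f]{16}$/', $taskId)) {
        return null;
    }
    $path = task_path($dir, $taskId);
    if (!is_file($path)) {
        return null;
    }
    $status = json_decode(file_get_contents($path), true);
    return is_array($status) ? $status : null;
}
```

The status page would typically just `echo json_encode(read_task_status($dir, $_GET['id']))`, and the client would stop polling once `completed` is true.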

You can send it as an XHR (AJAX) request. Clients do not usually have any timeout for XHRs, unlike normal HTTP requests.

I realize this is a quite old question but would like to give it a shot. This script tries to address both the initial kick off call to finish quickly and chop down the heavy load into smaller chunks. I haven't tested this solution.

<?php
/**
 * crawler.php located at http://mysite.com/crawler.php
 */

// Make sure this script keeps running after we close the connection with
// it.
ignore_user_abort(TRUE);


function get_remote_sources_to_crawl() {
  // Do a database or a log file query here.

  $query_result = array (
    1 => 'http://exemple.com',
    2 => 'http://exemple1.com',
    3 => 'http://exemple2.com',
    4 => 'http://exemple3.com',
    // ... and so on.
  );

  // Return the first pending source as an array($id, $url) pair.
  foreach ($query_result as $id => $url) {
    return array($id, $url);
  }
  return FALSE;
}

function update_remote_sources_to_crawl($id) {
  // Update the database or log file list so the $id record won't show up
  // on the next call to get_remote_sources_to_crawl().
}

$crawling_source = get_remote_sources_to_crawl();

if ($crawling_source) {
  list($id, $url) = $crawling_source;

  // Run your scraping code on $url here, and set this flag once it succeeds.
  $your_scraping_has_finished = TRUE;

  if ($your_scraping_has_finished) {
    // Update your database or log file.
    update_remote_sources_to_crawl($id);

    $ctx = stream_context_create(array(
      'http' => array(
        // I am not quite sure but I reckon the timeout set here actually
        // starts rolling after the connection to the remote server is made
        // limiting only how long the downloading of the remote content should take.
        // So as we are only interested to trigger this script again, 5 seconds 
        // should be plenty of time.
        'timeout' => 5,
      )
    ));

    // Open a new connection to this script and close it after 5 seconds in.
    file_get_contents('http://' . $_SERVER['HTTP_HOST'] . '/crawler.php', FALSE, $ctx);

    print 'The cronjob kick off has been initiated.';
  }
}
else {
  print 'Yay! The whole thing is done.';
}

I would like to propose a solution that is a little different from symcbean's, mainly because I have the additional requirement that the long-running process needs to run as another user, not as the apache / www-data user.

First solution using cron to poll a background task table:

PHP web page inserts into a background task table, state 'SUBMITTED'
cron runs once every 3 minutes as another user, running a PHP CLI script that checks the background task table for 'SUBMITTED' rows
PHP CLI will update the state column in the row into 'PROCESSING' and begin processing, after completion it will be updated to 'COMPLETED'
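
The state transitions in this first solution might look like the sketch below when written against PDO. The table name `background_task`, its `id` and `state` columns, and the function names are assumptions, and the connection is assumed to use `PDO::ERRMODE_EXCEPTION` so that failed queries throw instead of returning false.

```php
<?php
/**
 * Claim the oldest SUBMITTED task and mark it PROCESSING.
 * Returns the task ID, or null when there is nothing to claim.
 * The table name `background_task` and its columns are assumptions.
 */
function claim_next_task(PDO $db): ?int
{
    $id = $db->query(
        "SELECT id FROM background_task WHERE state = 'SUBMITTED' ORDER BY id LIMIT 1"
    )->fetchColumn();
    if ($id === false) {
        return null;
    }

    // Repeating the state check in the WHERE clause keeps the claim safe when
    // two cron runs overlap: only one UPDATE actually changes the row.
    $update = $db->prepare(
        "UPDATE background_task SET state = 'PROCESSING' WHERE id = ? AND state = 'SUBMITTED'"
    );
    $update->execute([$id]);

    return $update->rowCount() === 1 ? (int) $id : null;
}

/** Mark a task as finished once the long-running work is done. */
function complete_task(PDO $db, int $id): void
{
    $db->prepare("UPDATE background_task SET state = 'COMPLETED' WHERE id = ?")
       ->execute([$id]);
}
```

The CLI script started by cron would call `claim_next_task()`, do the work for the returned ID, and then call `complete_task()`.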

Second solution using Linux inotify facility:

PHP web page updates a control file with the parameters set by user, and also giving a task id
shell script (as a non-www user) running inotifywait will wait for the control file to be written
after the control file is written, a close_write event is raised and the shell script continues
shell script executes PHP CLI to do the long running process
PHP CLI writes the output to a log file identified by task id, or alternatively updates progress in a status table
PHP web page could poll the log file (based on task id) to show progress of the long running process, or it could also query status table

Some additional info could be found in my post : http://inventorsparadox.blogspot.co.id/2016/01/long-running-process-in-linux-using-php.html

I have done similar things with Perl, double fork() and detaching from parent process. All http fetching work should be done in forked process.

Use a proxy to delegate the request.

What I ALWAYS use is one of these variants (because different flavors of Linux have different rules about handling output, and some programs output differently):

Variant I: @exec('./myscript.php 1>/dev/null 2>/dev/null &');

Variant II: @exec('php -f myscript.php 1>/dev/null 2>/dev/null &');

Variant III: @exec('nohup myscript.php 1>/dev/null 2>/dev/null &');

You might have to install "nohup". For example, when I was automating FFMPEG video conversions, redirecting output streams 1 and 2 somehow did not fully handle the output, so I used nohup AND redirected the output.

If you have a long script, divide the page's work into tasks by passing an input parameter for each task, so that each page acts like a thread. For example, if a page loops over 1 lakh (100,000) product_keywords, replace the loop with logic for a single keyword and pass that keyword in from magic or cornjobpage.php (in the following example).

For the background worker, I think you should try this technique: it lets you call as many pages as you like, and all of them run at once, independently and asynchronously, without waiting for each page's response.

cornjobpage.php //mainpage

    <?php
    // Fire each request without waiting for its response, so the pages run independently.
    post_async("http://localhost/projectname/testpage.php", "Keywordname=testValue");
    //post_async("http://localhost/projectname/testpage.php", "Keywordname=testValue2");
    //post_async("http://localhost/projectname/otherpage.php", "Keywordname=anyValue");

    /*
     * Requests a PHP page asynchronously so the current page does not have to
     * wait for it to finish running. Because this side closes the connection
     * immediately, the target page may need ignore_user_abort(true) to finish.
     */
    function post_async($url, $params)
    {
        $parts = parse_url($url);

        $fp = fsockopen($parts['host'],
            isset($parts['port']) ? $parts['port'] : 80,
            $errno, $errstr, 30);
        if ($fp === false) {
            return false; // connection failed; $errno and $errstr hold the reason
        }

        // The parameters travel in the query string, so no request body
        // (and no Content-Length header) is sent with this GET.
        $out = "GET " . $parts['path'] . "?" . $params . " HTTP/1.1\r\n";
        $out .= "Host: " . $parts['host'] . "\r\n";
        $out .= "Connection: Close\r\n\r\n";
        fwrite($fp, $out);
        fclose($fp);
        return true;
    }
    ?>

testpage.php

    <?php
    echo $_REQUEST["Keywordname"];//case1 Output > testValue
    ?>

PS: if you want to send URL parameters in a loop, follow this answer: https://stackoverflow.com/a/41225209/6295712

Not the best approach, as many stated here, but this might help:

ignore_user_abort(1); // run script in background even if user closes browser
set_time_limit(1800); // run it for 30 minutes

// Long running script here

If the desired output of your script is some processing, not a webpage, then I believe the desired solution is to run your script from shell, simply as

php my_script.php

License: CC-BY-SA with attribution

Not affiliated with StackOverflow