Amazon CloudSearch throws HTTP 403 on document upload

Question 1

This never worked for me. I used the CloudSearch terminal to upload files and PHP cURL to search them.

Question 2

If you are getting the following error

403 Forbidden, Request forbidden by administrative rules.

and if you are sure you have appropriate access rules in effect, I would check the API URL you are using. Make sure you are using the correct endpoint. If you are doing a batch upload, the API endpoint should look like this:

your-search-doc-endpoint/2013-01-01/documents/batch

Notice the 2013-01-01: it is a required part of the URL and identifies the API version you are using. You cannot do the following, even though it might seem to make sense:

your-search-doc-endpoint/documents/batch <- Won't work

To search, you would need to hit the following API:

your-search-endpoint/2013-01-01/search?your-search-params
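As a rough illustration of the two URL shapes above, here is a minimal PHP sketch that appends the API version to a bare endpoint host. The function names and the example hostnames are hypothetical, chosen only for this example; substitute the document and search endpoints shown for your own domain.

```php
<?php
// Hypothetical helpers (not part of any AWS library): build versioned
// CloudSearch URLs from the bare endpoint host of a domain.
const CLOUDSEARCH_API_VERSION = '2013-01-01';

function cloudsearch_batch_url(string $docEndpoint): string
{
    // Accept the endpoint with or without a scheme or trailing slash.
    $host = rtrim(preg_replace('#^https?://#', '', $docEndpoint), '/');
    return 'https://' . $host . '/' . CLOUDSEARCH_API_VERSION . '/documents/batch';
}

function cloudsearch_search_url(string $searchEndpoint, array $params): string
{
    $host = rtrim(preg_replace('#^https?://#', '', $searchEndpoint), '/');
    // http_build_query() URL-encodes the search parameters.
    return 'https://' . $host . '/' . CLOUDSEARCH_API_VERSION . '/search?' . http_build_query($params);
}

// Example hostnames are made up for illustration.
echo cloudsearch_batch_url('doc-example-abc123.us-east-1.cloudsearch.amazonaws.com'), "\n";
echo cloudsearch_search_url('search-example-abc123.us-east-1.cloudsearch.amazonaws.com', ['q' => 'test']), "\n";
```

Building the URL in one place makes it harder to forget the version segment, which is the mistake that produces the 403 described here.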

Question 3

After many searches and a lot of trial and error, I was able to put together a small code block, assembled from pieces of code found in various places, that uploads a "file" to AWS CloudSearch using cURL and PHP.

The single most important thing is to make sure that your data is prepared correctly to be sent in JSON format.

Note: With CloudSearch you're not uploading a file, you're posting a stream of JSON data. That is why many of us have problems uploading the data.

In my case I wanted to upload data to my search domain on CloudSearch. It seems simple, and it is, but there is very little example code for it. Most people tell you to go to the documentation, which does have examples, but only for the AWS CLI. The PHP SDK has a learning curve of its own: instead of keeping things simple, it takes 20 steps to do one task, and it requires a number of other libraries that are just wrappers for native PHP functions, which sometimes makes things more complicated rather than simpler.

So, back to how I did it. First I pull the data from my database as an array, reshape it into CloudSearch documents, and serialize it to a file.

$rows = $database_data;
$data2 = array();

foreach ($rows as $key => $row) {
  // start from a fresh array so fields never leak between documents
  $data = array();
  $data['type'] = 'add';
  $data['id'] = $row->id;
  $data['fields']['title'] = $row->title;
  $data['fields']['content'] = $row->content;
  $data2[] = $data;
}

// now save your data to a file and make sure
// to serialize() it
$fp = fopen($path_to_file, $mode);
flock($fp, LOCK_EX);
fwrite($fp, serialize($data2));
flock($fp, LOCK_UN);
fclose($fp);

Now that you have your data saved, we can read it back and send it.

// The URL must include the API version and batch path, e.g.
// https://{your-doc-endpoint}/2013-01-01/documents/batch
$aws_doc_endpoint = '{Your AWS CloudSearch Document Endpoint URL}';

// Lets read the data
$data = file_get_contents($path_to_file);
// Now lets unserialize() it and encode it in JSON format
$data = json_encode(unserialize($data));

// finally lets use CURL
$ch   = curl_init($aws_doc_endpoint);

curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
// Set all headers in one call; a second CURLOPT_HTTPHEADER call replaces the first
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Content-Type: application/json',
    'Content-Length: ' . strlen($data),
));
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);

$response = curl_exec($ch);
curl_close($ch);

// curl_exec() returns FALSE on a transport error
if ($response === FALSE)
{
    return FALSE;
}

$response = json_decode($response);

// an error page such as a 403 may not be JSON, so check before reading status
if (isset($response->status) && $response->status == 'success')
{
    return TRUE;
}
return FALSE;

And like I said, there is nothing to it. Most answers I encountered said to use Guzzle because it's really easy. It is, but for a simple task like this you don't need it.

Aside from that, if you still get an error, check the following: that your JSON data is well formed, and that you have access to the endpoint.
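One way to run the first of those checks before posting is sketched below. The function name and the specific rules it applies (a non-empty array, each operation with a type and an id, and fields on every add) are assumptions made for this example, not an official validator. It does not test access to the endpoint, which you can only confirm by sending a request.

```php
<?php
// Hypothetical pre-flight check: returns a list of problems found in a
// batch JSON string, or an empty array if none were detected.
function check_batch_json(string $json): array
{
    $batch = json_decode($json, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        return ['invalid JSON: ' . json_last_error_msg()];
    }
    if (!is_array($batch) || $batch === []) {
        return ['batch must be a non-empty JSON array of operations'];
    }

    $problems = [];
    foreach ($batch as $i => $op) {
        if (!is_array($op) || !isset($op['type'], $op['id'])) {
            $problems[] = "operation $i is missing 'type' or 'id'";
        } elseif ($op['type'] === 'add' && empty($op['fields'])) {
            $problems[] = "operation $i is an 'add' without 'fields'";
        }
    }
    return $problems;
}

// Usage: only send the batch when no problems were found.
$json = json_encode([
    ['type' => 'add', 'id' => '1', 'fields' => ['title' => 'Hello', 'content' => 'World']],
]);
$problems = check_batch_json($json);
echo $problems === [] ? "batch looks OK\n" : implode("\n", $problems) . "\n";
```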

Well I hope someone finds this code helpful.

Question 4

To diagnose whether it's an access policy issue, have you tried a policy that allows all access to the upload? Something like the following opens it up to everything:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "cloudsearch:*"
    }
  ]
}

I noticed that if you just open the document upload endpoint in a browser (mine looks like "doc-YOURDOMAIN-RANDOMID.REGION.cloudsearch.amazonaws.com"), you'll get the 403 "Request forbidden by administrative rules" error even with open access. So, as @dminer said, you'll need to make sure you're posting to the correct full URL.

Have you considered using the PHP SDK? See http://docs.aws.amazon.com/aws-sdk-php/guide/latest/service-cloudsearchdomain.html. It should take care of making correct requests, in which case you could rule out transport errors.

Question 5

Try adding "cloudsearch:document" to the Actions in CloudSearch's access policy.

Question 6

I encountered the same issue with 403 Forbidden, Request forbidden by administrative rules. It was entirely due to using the document endpoint without adding the API version, as shown below.

This part is very important when inserting data into CloudSearch: you need to use the document endpoint that AWS gives you for your domain, and it needs to be followed by the API version, as below:

https://doc-xxxxx.zzzz.cloudsearch.amazonaws.com/2013-01-01/documents/batch

At the same time, be mindful to set the Content-Type header as well, or you could end up getting an HTTP 415 Unsupported Media Type error.

addRequestHeader("Content-Type", "application/xml");
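For readers using PHP cURL rather than the client shown above, a rough equivalent might look like the sketch below. The helper name is hypothetical. It assumes the batch is either JSON or XML and that the Content-Type should match whichever format you actually send.

```php
<?php
// Hypothetical helper: build the header list for a batch upload so the
// Content-Type matches the format of the request body.
function batch_headers(string $body, string $format = 'json'): array
{
    $types = ['json' => 'application/json', 'xml' => 'application/xml'];
    if (!isset($types[$format])) {
        throw new InvalidArgumentException("Unsupported batch format: $format");
    }
    return [
        'Content-Type: ' . $types[$format],
        'Content-Length: ' . strlen($body),
    ];
}

// Usage: pass the whole list to cURL in a single call, because each
// CURLOPT_HTTPHEADER call replaces the previously set list.
$body = '[{"type":"add","id":"1","fields":{"title":"Hello"}}]';
$headers = batch_headers($body, 'json');
// curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
print_r($headers);
```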