Search engine bots DO get confused when they see this:
HTTP/1.1 200 OK
<h1>The page you requested does not exist</h1>
Or this:
HTTP/1.1 302 Object moved
Location: /fancy-404-error-page.html
It is explained here:
Returning a code other than 404 or 410 for a non-existent page (or
redirecting users to another page, such as the homepage, instead of
returning a 404) can be problematic. Firstly, it tells search engines
that there’s a real page at that URL. As a result, that URL may be
crawled and its content indexed. Because of the time Googlebot spends
on non-existent pages, your unique URLs may not be discovered as
quickly or visited as frequently and your site’s crawl coverage may be
impacted (also, you probably don’t want your site to rank well for the
search query File not found).
Your idea of sending the 404 header programmatically is correct: it tells the search engine that the URL it requested does not exist and that it should not attempt to crawl and index it. Ways to set the response status in PHP:
header($_SERVER["SERVER_PROTOCOL"] . " 404 Not Found");
header(":", true, 404); // sets a header AND changes the HTTP response code;
// the ":" is a hack to avoid specifying a real header
http_response_code(404); // PHP >= 5.4
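To avoid both problem responses shown at the top (a 200 with an error message, or a 302 to a fancy error page), the friendly page can be sent in the same response as the 404 status. Here is a minimal sketch, assuming PHP >= 7.0; the function names `not_found_body` and `send_not_found` are made up for illustration and are not part of any library:

```php
<?php
// Hypothetical helper: builds the HTML body of the error page.
function not_found_body(string $requestedPath): string
{
    // Escape the path so a crafted URL cannot inject markup into the page.
    $safePath = htmlspecialchars($requestedPath, ENT_QUOTES, 'UTF-8');
    return "<h1>Page not found</h1>\n<p>Nothing exists at {$safePath}.</p>\n";
}

// Hypothetical helper: sends the 404 status and the friendly page together.
function send_not_found(string $requestedPath): void
{
    // Set the status before any output is sent. No Location header is
    // emitted, so the client is not redirected anywhere.
    http_response_code(404);
    echo not_found_body($requestedPath);
}

// Assumed usage, wherever your code decides the page does not exist:
//   send_not_found($_SERVER['REQUEST_URI']);
//   exit;
```

Visitors still get a readable error page, while crawlers see the 404 status line and know there is nothing to index at that URL.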