Question

We're using the Google CSE (Custom Search Engine) paid service to index content on our website. The site is built of mostly PHP pages that are assembled with include files, but there are some dynamic pages that pull info from a database into a single page template (new releases for example). The issue we have is I can set an expire date on the content in the database so say "id=2" will bring up a "This content is expired" notice. However, if ID 2 had an uploaded PDF attached to it, the PDF file remains in the search index.

I know I could write a cleanup script and have cron run it that looks at the db, finds expired content, checks to see if any uploaded files were attached and either renames or removes them, but there has to be a better solution (I hope).

Please let me know if you have encountered this in the past, and what you suggest.

Thanks, D.

Was it helpful?

Solution 2

What we ended up doing was tying a check script to the upload script that once it completed the current upload, old files were "unlinked" and the DB records were deleted.

For us, this works because it's kind of an "add one/remove one" situation where we want a set number of of items to appear in a rolling order.

OTHER TIPS

There's unfortunately no way to give you a straight answer at this time: we have no knowledge of how your PDFs are "attached" to your pages or how your DB is structured.

The best solution would be to create a robots.txt file that blocks the URLs for the particular PDF files that you want to remove. Google will drop them from the index on its next pass (usually in about an hour).

http://www.robotstxt.org/

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top