Question

I'm trying to set up canonical links for a number of PDF and image files on my website.

Example Folder Structure:

/index.php
/docs/
    file.pdf
    /folder1/
        file.pdf
    /folder2/
        file1.pdf
        file2.pdf
/img/
    sprite.png
    /slideshow/
        slide1.jpg
        slide2.jpg

Example PDF URL to Canonical URL: http://www.example.com/docs/folder1/file.pdf --> http://www.example.com/products/folder1/

I am trying to avoid having to put individual .htaccess files in each of the sub-folders that contain my images and PDFs. I currently have 7 "main" folders, each of which has anywhere from 2 to 10 sub-folders, and most sub-folders have their own sub-folders. I have roughly 80 PDFs, and even more images.

I'm looking for a (semi-)dynamic solution where all files in a certain folder will have their canonical link set to a single URL. I want to keep as much as possible in a single .htaccess file.

I know that <Files> and <FilesMatch> do not understand paths, and that <Directory> and <DirectoryMatch> don't work in .htaccess files.

Is there a fairly simple way to accomplish this?

Solution

I don't know of a way to solve this with Apache rules alone, as it would require some sort of regex matching and reusing the result of the match in a directive, which as far as I know isn't possible.

However, it's pretty simple if you introduce a PHP script into the mix:

RewriteEngine On
RewriteCond %{REQUEST_URI} \.(jpg|png|pdf)$
RewriteRule (.*) /canonical-header.php?path=$1

Note that this would send requests for all jpg, png and pdf files to the script regardless of the folder name. If you want to include only specific folders, you could add another RewriteCond to accomplish that.
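As a rough sketch of that extra condition: consecutive RewriteCond lines are ANDed by default, so a second condition on the path prefix narrows the rule to chosen folders. The folder names docs and img are taken from the question's example layout and are only an assumption about the real site; the [L] flag is an addition that stops further rule processing once this rule matches.

```apache
RewriteEngine On
# Assumption: only files under /docs/ or /img/ should go through the script.
RewriteCond %{REQUEST_URI} ^/(docs|img)/
# Same extension check as in the rule above.
RewriteCond %{REQUEST_URI} \.(jpg|png|pdf)$
RewriteRule (.*) /canonical-header.php?path=$1 [L]
```

Requests for matching files outside those two folders would then be served by Apache directly, without the Link header.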

Now the canonical-header.php script:

<?php

// Checking for the presence of the path variable in the query string allows us to easily 404 any requests that
// come directly to this script, just to be safe.
if (!empty($_GET['path'])) {
    // Be sure to add any new file types you want to handle here so the correct content-type header will be sent.
    $mimeTypes = array(
        'pdf' => 'application/pdf',
        'jpg' => 'image/jpeg',
        'png' => 'image/png',
    );

    $path         = filter_input(INPUT_GET, 'path', FILTER_SANITIZE_URL);
    // Resolve relative to this script's directory (assumed to be the document root).
    // realpath() returns false if the file does not exist.
    $file         = realpath(__DIR__ . '/' . $path);
    $extension    = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    $canonicalUrl = 'http://' . $_SERVER['HTTP_HOST'] . '/' . dirname($path);

    // Verify that the type is supported and that the file exists, lies inside the
    // document root (guards against "../" paths) and is readable, or send 404
    if (isset($mimeTypes[$extension]) && $file !== false
        && strpos($file, __DIR__ . DIRECTORY_SEPARATOR) === 0 && is_readable($file)) {
        header('Content-Type: ' . $mimeTypes[$extension]);
        // The colon after "Link" is required, otherwise the header is malformed.
        header('Link: <' . $canonicalUrl . '>; rel="canonical"');
        readfile($file);
    } else {
        header('HTTP/1.0 404 Not Found');
        echo "File not found";
    }
} else {
    header('HTTP/1.0 404 Not Found');
    echo "File not found";
}

Please consider this code untested and check that it works as expected across browsers before releasing it to production.

OTHER TIPS

You can set the header from the .htaccess file, which is a simpler way to manage headers.

For example, say I have a PDF named "testPDF.pdf" in the root folder of my site. All you have to do is paste the following code into the .htaccess file:

<Files "testPDF.pdf">
    Header add Link '<http://<your_site_name>.com/>; rel="canonical"'
</Files>

Once you've added that to your .htaccess file, check the response headers for the PDF to make sure the Link header is actually being sent.

For an IIS (ASP.NET) solution, try something like this:

Response.AppendHeader("Link", "<" + "https://" + Request.Url.Host + "/" + product.GetSeName() + ">; rel=\"canonical\"");

This was added to a function that generates a PDF version of the web page :)

I was able to add canonical links for files in different directories through a single .htaccess file.

The following code adds a canonical link for each file, pointing to a file of the same name under a single location of your choosing:

<FilesMatch "\.(jpg|png|pdf)$">
    RewriteRule ([^/]+)\.(jpg|png|pdf)$ - [E=FILENAME:%{HTTP_HOST}/<your-desired-location>/$1.$2]
    Header add Link '<https://%{FILENAME}e>; rel="canonical"'
</FilesMatch>

And the code below adds a canonical link to the file's requested URL, which in many cases will be its actual location on the server:

<FilesMatch "\.(jpg|png|pdf)$">
    RewriteRule ([^/]+)\.(jpg|png|pdf)$ - [E=FILENAME:%{HTTP_HOST}%{REQUEST_URI}]
    Header set Link '<https://%{FILENAME}e>; rel="canonical"'
</FilesMatch>
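
Applied to the mapping from the question (/docs/folder1/file.pdf to /products/folder1/), the same environment-variable technique might look like the sketch below. It assumes the .htaccess file sits in the document root, that mod_rewrite and mod_headers are both enabled, and that every PDF anywhere under /docs/<folder>/ should share one canonical URL; the variable name CANONICAL is made up for illustration.

```apache
RewriteEngine On
# Capture the first sub-folder under docs/ and build the canonical target from it.
# The "-" substitution leaves the request itself unchanged; only the variable is set.
RewriteRule ^docs/([^/]+)/.+\.pdf$ - [E=CANONICAL:%{HTTP_HOST}/products/$1/]
# Send the header only when the rule above matched and set CANONICAL.
Header set Link '<http://%{CANONICAL}e>; rel="canonical"' env=CANONICAL
```

PDFs sitting directly in /docs/ do not match this pattern and get no header. If another rule triggers an internal redirect, Apache may rename the variable to REDIRECT_CANONICAL, so this is worth testing against the rest of your rewrite rules.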
Licensed under: CC-BY-SA with attribution