Question

I developed a blog from scratch and things have gone great so far. I finally got around to writing my first post/article, and I've been waiting for Google to index this specific page to make sure there aren't any issues with it. Well, Google is currently indexing the same page 4 times. I have (with the help of users from Stack Overflow) a mod_rewrite rule in my .htaccess that rewrites the spaces in URLs to hyphens for one specific file (article.php).

My article page currently looks like this, for example: www.site.com/article.php?article_id=10&article_title=friendly url goes over here

With mod_rewrite I have changed the URLs to the following:

www.site.com/article/id/friendly-url-goes-over-here

But Google seems to be indexing the same page 4 times, like so:

www.site.com/article/10/friendly-url-goes-over-here
www.site.com/article/10/friendly-url-goes%20over%20here
www.site.com/article/10/friendly-url%20goes%20over%20here
www.site.com/article/10/friendly%20-url%20goes%20over%20here

Why is it indexing 4 copies of the same page? It seems to index the page once for each hyphen that gets inserted, so if there were 10 hyphens, I'm guessing Google would index 10 copies of the same page. Here is my entire .htaccess file:

RewriteEngine on

# add www before hostname
RewriteCond %{HTTP_HOST} ^oddify\.co$ [NC]
RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [R=302,L,NE]

# if on article page, get slugs and make into friendly url
RewriteCond %{THE_REQUEST} \s/article\.php\?article_uid=([^&]+)&article_title=([^&\ ]+)
RewriteRule ^ /article/%1/%2/? [L,R=302,NE]

# if page with .php is requested then remove the extension
RewriteCond %{THE_REQUEST} \s/+(.+?)\.php[\s?] [NC]
RewriteRule ^ /%1/ [R=302,L,NE]

RewriteRule "^(article)/([^ ]*) +(.*)$" /$1/$2-$3 [L,R]

# Force a trailing slash to be added
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{THE_REQUEST} \s/+([^.]+?[^/.])[\s?] [NC]
RewriteRule ^ /%1/ [R=302,L]

# allow page direction to change the slugs into friendly seo URL
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f 
RewriteRule (?:^|/)article/([^/]+)/([^/]+)/?$ /webroot/article.php?article_uid=$1&article_title=$2 [L,QSA,NC]

# silently rewrite to webroot
RewriteCond %{REQUEST_URI} !/webroot/ [NC]
RewriteRule ^ /webroot%{REQUEST_URI} [L]

# .php ext hiding
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+?)/?$ $1.php [L]

I wrote this question a few days ago and made sure to de-index the pages from Google, but Google has now gone ahead and re-indexed them the same way.

Here is the Google results page showing the 4 indexed pages: google search page

Solution 2

Try changing this redirect to a 301:

RewriteRule "^(article)/([^ ]*) +(.*)$" /$1/$2-$3 [L,R=301]

The 301 status tells Google (as well as browsers and other clients) that the redirect is permanent and that the old URL (the one with spaces) should no longer be considered.
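
To see why several variants of the URL are visible to a crawler in the first place, the rule above can be approximated outside Apache. This is only a rough Python sketch, not Apache's actual matching engine: redirect_chain is a hypothetical helper, and the path is written with literal spaces, as mod_rewrite sees it after decoding %20. Each loop iteration stands for one external redirect.

```python
import re

# Python approximation of:
#   RewriteRule "^(article)/([^ ]*) +(.*)$" /$1/$2-$3 [L,R=301]
# In .htaccess context the pattern is matched without the leading slash.
RULE = re.compile(r"^(article)/([^ ]*) +(.*)$")


def redirect_chain(path):
    """Return every URL path a client visits, one entry per redirect hop."""
    visited = [path]
    while True:
        match = RULE.match(path.lstrip("/"))
        if match is None:
            return visited
        # Only the first run of spaces is replaced on each request.
        path = "/{}/{}-{}".format(*match.groups())
        visited.append(path)


for hop in redirect_chain("/article/10/friendly url goes over here"):
    print(hop)
```

For a title with four spaces the sketch lists five paths: the original plus one per redirect. Every intermediate path is a real URL that answers with a redirect, which is why the status code of that redirect matters.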

OTHER TIPS

The problem with your .htaccess file is that it performs one redirect for every space in the title. Using a permanent redirect (301) might or might not fix the indexing, but even then the browser will give an error (redirect loop detected) if too many spaces appear in the title. You can fix both problems by doing all of the replacements on the server:

RewriteRule ^article/([^\ ]*)\ ([^\ ]*\ .*) /article/$1-$2 [N]
RewriteRule ^article/([^\ ]*)\ ([^\ ]*)$ /article/$1-$2 [L,R=301]

The first rule matches if at least 2 spaces appear in the URL. It rewrites one of the spaces and tells Apache to go through the .htaccess file again ([N]). Once only one space is left, the second rule matches and, besides rewriting that last space, also redirects the user. This results in a single redirect, and hopefully the permanent redirect will cause only the new URL to be visible in Google.
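
The interplay of the two rules above can be traced with a small Python sketch. It is an approximation under stated assumptions, not Apache itself: rewrite is a hypothetical helper, the regular expressions are direct translations of the two patterns with literal spaces, and the leading slash is stripped before every pass.

```python
import re

# Python approximations of the two rules; spaces are literal because
# mod_rewrite matches against the decoded path.
LOOP_RULE = re.compile(r"^article/([^ ]*) ([^ ]* .*)")  # 2+ spaces left: [N]
FINAL_RULE = re.compile(r"^article/([^ ]*) ([^ ]*)$")   # 1 space left: [L,R=301]


def rewrite(path):
    """Return (final_path, internal_passes, external_redirects) for one request."""
    internal_passes = 0
    while True:
        relative = path.lstrip("/")
        match = LOOP_RULE.match(relative)
        if match:
            # [N]: rewrite one space and run the rules again; nothing is sent yet.
            path = "/article/{}-{}".format(*match.groups())
            internal_passes += 1
            continue
        match = FINAL_RULE.match(relative)
        if match:
            # [L,R=301]: rewrite the last space and send a single redirect.
            return "/article/{}-{}".format(*match.groups()), internal_passes, 1
        # No spaces at all: the URL is served as-is.
        return path, internal_passes, 0


print(rewrite("/article/10/friendly url goes over here"))
```

In this model a title with four spaces goes through three internal passes and exactly one external redirect, so a client only ever sees the original URL and the fully hyphenated one.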

If there are more spaces in the URL than Apache allows internal recursions, this will result in an Internal Server Error. If you have access to httpd.conf, you can raise LimitInternalRecursion to allow more internal recursions. Warning: set this to a SANE number. If for some reason you have an endless loop in your RewriteRules and this number is very high, you'll lock up your server until it hits this limit. See the documentation.

Licensed under: CC-BY-SA with attribution