Question

I have a PHP website that passes certain PHP variables to JavaScript variables. Google has crawled the site, and these declarations are producing errors and duplicate content. Is there any way to make the Google crawler ignore the declaration of these variables in JavaScript?

    echo '<script language="javascript">var '.$item['Nombre'].'="'.$descripcion.'";</script>';

Sorry for my English.


Solution

Google crawling JavaScript code and treating it as duplicate content? I have never heard of this problem before. Some of my pages inline their JavaScript (when the script is small), which means the same <script>...</script> appears on every page.

There are also cases where I output javascript variables more-or-less the same way you do. Google never marked it as "duplicate content".

Google's description of duplicate content:

Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin. Examples of non-malicious duplicate content could include:

  • Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
  • Store items shown or linked via multiple distinct URLs
  • Printer-only versions of web pages

You can get this kind of error if the same content appears on more than one of your pages, but Google does not treat JavaScript as content (although you can never know for sure what Google does or does not do). In the same way, Google will not mark your <head> tag as duplicate, and there is no penalty for having the same layout (menu, footer, etc.) on every page.

You can put that <script> tag in an <aside> tag just to be sure.

The HTML <aside> element represents a section of a page that consists of content that is tangentially related to the content around it, which could be considered separate from that content. Such sections are often represented as sidebars or as inserts. They often contain side explanations, like a glossary definition; more loosely related stuff, like advertisements; the biography of the author; or in web-applications, profile information or related blog links.

This means that the content will be more or less ignored by Google when indexing the page. It will not be marked as a duplicate, since it could be an advertisement.

Also drop the language="javascript" attribute from your script tags. I doubt that it confuses Google, since that attribute is deprecated (use type instead) and nothing takes it into account nowadays. But if Googlebot does read it, it may not recognise the value javascript (the type attribute would use text/javascript) and could parse the script as an unknown type of text content.

The default type of the script is text/javascript, so it is safe to omit.
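As a rough sketch of what the question's echo line might look like without the language attribute: the helper name jsVariableTag is hypothetical, and the use of json_encode() to quote the value is an assumption added here, not something the original code does.

```php
<?php
// Hypothetical helper: builds the inline <script> tag from the question,
// without the deprecated language attribute.
function jsVariableTag($name, $value)
{
    // Only accept names that look like plain JavaScript identifiers.
    if (!preg_match('/^[A-Za-z_$][A-Za-z0-9_$]*$/', $name)) {
        throw new InvalidArgumentException('Not a valid JavaScript identifier: ' . $name);
    }
    // json_encode() yields a quoted, escaped JavaScript literal; the HEX flags
    // keep a value such as "</script>" from ending the tag early.
    $literal = json_encode($value, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
    return '<script>var ' . $name . ' = ' . $literal . ';</script>';
}

// Example call with made-up data.
echo jsVariableTag('descripcion', 'Texto de ejemplo');
```

Because no type is given, the browser falls back to the default, text/javascript.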


Above all, I suspect that the problem is not the JS variables themselves but something else, such as GET parameters in your URLs. GET parameters can be dealt with by configuring URL Parameters correctly in Webmaster Tools.

OTHER TIPS

Important: This is bad practice in most cases. If Google notices that you serve different content to its bot and considers the difference relevant, then your site can be penalized heavily.

I recommend this PHP solution:

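// the header may be missing, so fall back to an empty string
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
// strpos() returns false when "Googlebot" is not found; compare strictly
if (strpos($userAgent, "Googlebot") === false) {
  // not Googlebot: echo the script
} else {
  // Googlebot: don't echo anything
}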

If this doesn't work, you can try adding this JavaScript code inside your script tag:

    // indexOf() returns -1 when the substring is not present
    if (navigator.userAgent.indexOf('Googlebot') === -1) {
        // not Googlebot: run the script
    } else {
        // Googlebot: do nothing
    }

Ps: Here is a list of User-Agents http://www.useragentstring.com/pages/Crawlerlist/

Another (untested, speculative) approach that requires that you can write your own robots.txt file:

  1. Move all your javascript code generation to another URL and include this as a javascript script in your page: <script type="text/javascript" src="/path/to/my/php/that/generates/js/variables.php"></script>

  2. Add that URL to your robots.txt file (see Google answer)

    User-Agent: Googlebot
    Disallow: /path/to/my/php/that/generates/js/variables.php
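A minimal sketch of what the PHP file behind that URL might contain, in the same untested spirit as the approach itself. The function name renderJsVariables and the sample data are hypothetical, and the variable names are assumed to be trusted, valid JavaScript identifiers.

```php
<?php
// variables.php (hypothetical): serves the variables as an external script
// instead of inlining them in the page.
function renderJsVariables(array $variables)
{
    $lines = array();
    foreach ($variables as $name => $value) {
        // json_encode() turns the PHP value into a JavaScript literal.
        $lines[] = 'var ' . $name . ' = ' . json_encode($value) . ';';
    }
    return implode("\n", $lines) . "\n";
}

// Tell the browser that the response is JavaScript, not HTML.
if (!headers_sent()) {
    header('Content-Type: application/javascript; charset=utf-8');
}
// Example data; a real page would pass its own values here.
echo renderJsVariables(array('descripcion' => 'Texto de ejemplo'));
```

The page then only carries the <script src="..."> reference, and the generated declarations live at the URL that robots.txt disallows.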

You can use the following PHP code:

$crawlers = array(
'Google'=>'Google',
'MSN' => 'msnbot',
'Rambler'=>'Rambler',
'Yahoo'=> 'Yahoo',
'AbachoBOT'=> 'AbachoBOT',
'accoona'=> 'Accoona',
'AcoiRobot'=> 'AcoiRobot',
'ASPSeek'=> 'ASPSeek',
'CrocCrawler'=> 'CrocCrawler',
'Dumbot'=> 'Dumbot',
'FAST-WebCrawler'=> 'FAST-WebCrawler',
'GeonaBot'=> 'GeonaBot',
'Gigabot'=> 'Gigabot',
'Lycos spider'=> 'Lycos',
'MSRBOT'=> 'MSRBOT',
'Altavista robot'=> 'Scooter',
'AltaVista robot'=> 'Altavista',
'ID-Search Bot'=> 'IDBot',
'eStyle Bot'=> 'eStyle',
'Scrubby robot'=> 'Scrubby',
);

function crawlerDetect($USER_AGENT)
{
// use the signature list defined above
global $crawlers;

foreach ($crawlers as $name => $signature) {
   // look for the signature inside the User-Agent string (case-insensitive)
   if (stripos($USER_AGENT, $signature) !== false)
      return $name; // crawler detected: return its name
}
// no known crawler signature found
return false;
}

Using the method above, you can check whether a request is coming from a crawler.
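As an illustration of how such a check might decide whether to emit the script, here is a self-contained sketch. The function isCrawler, the shortened signature list, and the sample script are all hypothetical, and the earlier warning about serving different content to bots still applies.

```php
<?php
// Hypothetical, shortened signature list; extend it as needed.
$signatures = array('Googlebot', 'msnbot', 'Yahoo');

function isCrawler($userAgent, array $signatures)
{
    foreach ($signatures as $signature) {
        // stripos(haystack, needle): search for the signature inside the User-Agent.
        if (stripos($userAgent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

// The header can be absent (for example on the command line), so default to ''.
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (!isCrawler($userAgent, $signatures)) {
    // Only regular visitors receive the variable declaration.
    echo '<script>var descripcion = "Texto de ejemplo";</script>';
}
```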

Licensed under: CC-BY-SA with attribution