Question

Is there any automated way to test whether two webpages are exactly the same (including the images, text, etc.)?

Solution

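You can fetch both pages into variables and compare the results. Note that this compares only the HTML returned for each URL; images and other linked resources are not downloaded. Here is a short script in PHP.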

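<?php
// Fetch the raw response body of each page (file_get_contents returns false on failure).
$page1 = file_get_contents('http://SITE1');
$page2 = file_get_contents('http://SITE2');

// Require a successful fetch, and use === so that numeric-looking
// strings are not loosely compared.
if ($page1 !== false && $page1 === $page2) {
    echo 'Pages are identical';
}
?>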

You can also do this from the command line, if the utilities are available. You may need to install wget.

$: wget -O site1 SITE1
$: wget -O site2 SITE2
$: diff site1 site2
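
For a plain yes/no answer, diff's line-by-line output can be noisy, especially if the downloaded content is binary (an image, for example). The following is a minimal sketch, not a definitive implementation, that uses cmp -s to compare the two saved files byte by byte and report only through its exit status. The helper name pages_identical is hypothetical, and the file names site1 and site2 are assumed to be the ones written by the wget commands above.

```shell
#!/bin/sh
# pages_identical FILE1 FILE2
# Hypothetical helper: succeeds (exit status 0) only if both files exist
# and contain exactly the same bytes. cmp -s prints nothing and reports
# the result through its exit status.
pages_identical() {
    [ -f "$1" ] && [ -f "$2" ] && cmp -s "$1" "$2"
}

# Usage, assuming site1 and site2 were saved by the wget commands above.
if pages_identical site1 site2; then
    echo "Pages are identical"
else
    echo "Pages differ (or a download is missing)"
fi
```

Checking that both files exist first avoids treating a failed download as a meaningful difference without saying so.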

I hope that helps

OTHER TIPS

I have created two test cases: one comparing two different pages, and another comparing the same webpage with itself.

Replace the URLs assigned to $webpage1 through $webpage4 with the pages you want to compare.

<?php
$webpage1 = file_get_contents('http://php.net');
$webpage2 = file_get_contents('http://wikipedia.com');

$webpage3 = file_get_contents('http://stackoverflow.com');
$webpage4 = file_get_contents('http://stackoverflow.com');

// Test case 1 (different pages)
$hash_page1 = md5($webpage1);
$hash_page2 = md5($webpage2);

if ($hash_page1 === $hash_page2) {
    echo "Pages have the same code \n";
    echo "The MD5 hash of both is: " . $hash_page1;
} else {
    echo "-= Pages are different =-";
    echo "<br/>Hash of page 1 is: " . $hash_page1 . "<br/>Hash of page 2 is: " . $hash_page2;
}

// Test case 2 (same webpage)

$hash_page3 = md5($webpage3);
$hash_page4 = md5($webpage4);

if ($hash_page3 === $hash_page4) {
    echo "<br/><br/>Test case: same site, both pages are the same,<br/>Hash is: " . $hash_page3;
}

?>

NOTE:

Pros: Detects any change to a page, or confirms that two pages match exactly.
Cons: Changing even a single letter or symbol changes the hash, so the pages no longer match. Be aware!
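
To illustrate how sensitive the hash is, here is a small sketch that hashes two saved pages differing by a single character. It assumes the GNU md5sum utility is installed (on some systems the equivalent command has a different name); hash_of and the sample file names are hypothetical, introduced only for this example.

```shell
#!/bin/sh
# hash_of FILE: print only the MD5 digest of FILE.
# Assumes GNU md5sum is available; reading from stdin keeps the
# file name out of the output, and cut keeps just the digest.
hash_of() {
    md5sum < "$1" | cut -d ' ' -f 1
}

tmp=$(mktemp -d)
printf '<p>Hello, world.</p>' > "$tmp/page_a"
printf '<p>Hello, world!</p>' > "$tmp/page_b"   # one character changed

if [ "$(hash_of "$tmp/page_a")" = "$(hash_of "$tmp/page_b")" ]; then
    echo "Hashes match"
else
    echo "Hashes differ"
fi
rm -rf "$tmp"
```

Running this prints "Hashes differ": a one-character edit is enough to produce a completely different digest.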

Licensed under: CC-BY-SA with attribution