Question

I want to make something like a proxy page (though not for use as an actual proxy). As far as I know, I need to rewrite all the URLs (src, link and so on): styles and images should be loaded from the right place, and links should go through my page via $_GET["url"], which then fetches the next page for me.

I have tried to preg_replace() each element, but I am not very good with it, and while it works on one website, on another I can't see the CSS, for example.

First question: are there any PHP classes or scripts that make this easy? (I have been googling for hours.)

If not, please help me with the following code:

<?php
$url = $_GET["url"];
$text = file_get_contents($url);
$data = parse_url($url);
$url=$data['scheme'].'://'.$data['host'];
$text = preg_replace('|<iframe [^>]*[^>]*|', '', $text);
$text = preg_replace('/<a(.*?)href="([^"]*)"(.*?)>/','<a $1 href="http://my.site/?url='.$url.'$2" $3>',$text);
$text = preg_replace('/<link(.*?)href="(?!http:\/\/)([^"]+)"(.*?)/', "<link $1 href=\"".$url."/\\2\"$3", $text);
$text = preg_replace('/src="(?!http:\/\/)([^"]+)"/', "src=\"".$url."/\\1\"", $text);
$text = preg_replace('/background:url\(([^"]*)\)/',"background:url(".$url."$1)", $text);
echo $text;
?>

In replacement №4 (src), I need to skip the replacement when the value starts with a double slash, because it can look like 'src="//somethingdomain"' and must not be replaced.

I also need to skip replacement №2 when the href already points to the same domain; otherwise the result looks like need.site/news.need.site/324244.

And is it possible to pass a form's action through my script? For example, a Google search query.

One more small problem: one website opened correctly at first, but after I had opened it hundreds of times with this script, I now get unknown symbols (without any div, body, etc.): ��S�n�@��. I tried converting to UTF-8 and ANSI, but the symbols just change.

Maybe they banned me?


Solution

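function link_replace($url, $myurl) {

    // Fetch the remote page.
    $content = file_get_contents($url);
    // Rewrite every href in one pass so that already-rewritten links are not matched again.
    $content = preg_replace_callback('#href="([^"]*)"#is', function ($m) use ($url, $myurl) {
        // Absolute links are passed on as they are; anything else is prefixed with $url.
        // (A pattern like [^http] is a character class, not a "does not start with http" test.)
        $target = preg_match('#^https?://#i', $m[1]) ? $m[1] : $url.$m[1];
        // urlencode() keeps "&" and "?" in the target from breaking the proxy's query string.
        return 'href="'.$myurl.'?url='.urlencode($target).'"';
    }, $content);

    return $content;
}

// $url is the page to fetch; $myurl is the address of the proxy script itself.
echo link_replace($url, $myurl);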
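
For the src case from the question (values starting with a double slash must be left alone), the same callback approach can be used. What follows is only a minimal sketch: absolutize() and rewrite_src() are hypothetical helper names, $origin is assumed to be the scheme://host string built in the question, and every relative path is treated as if it were relative to the site root, which is a simplification.

```php
<?php
// Hypothetical helper: decide what to do with one reference found in the page.
// $origin is assumed to be "scheme://host" of the fetched page.
function absolutize($origin, $ref) {
    // Absolute (http://, https://) or protocol-relative (//host/...) URLs,
    // and data: URIs, already work in the browser, so leave them untouched.
    if (preg_match('#^((https?:)?//|data:)#i', $ref)) {
        return $ref;
    }
    // Simplification: everything else is treated as a path on the site root.
    return rtrim($origin, '/') . '/' . ltrim($ref, '/');
}

// Hypothetical helper: rewrite every double-quoted src attribute in $html.
function rewrite_src($html, $origin) {
    return preg_replace_callback('/\bsrc="([^"]+)"/i', function ($m) use ($origin) {
        return 'src="' . absolutize($origin, $m[1]) . '"';
    }, $html);
}
```

Calling rewrite_src($text, $url) in place of the fourth preg_replace() in the question would leave src="//somethingdomain" as it is. Attributes written with single quotes or without quotes are not matched by this pattern.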

OTHER TIPS

I'm not absolutely sure, but I guess the result is just compressed (e.g. with gzip). Try removing the Accept-Encoding header while proxying the request.
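
If compression is indeed the cause, something along these lines might help. It is a sketch under assumptions, not a confirmed fix: fetch_uncompressed() and maybe_gunzip() are hypothetical names, the zlib extension is assumed to be available for gzdecode(), and the check only recognises gzip (not deflate or brotli).

```php
<?php
// Hypothetical fallback: if the body starts with the gzip magic bytes
// (0x1f 0x8b), try to decompress it; otherwise return it unchanged.
function maybe_gunzip($body) {
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);   // requires the zlib extension
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body;
}

// Hypothetical wrapper around file_get_contents() that asks the server
// for an uncompressed response and still handles a gzipped one.
function fetch_uncompressed($url) {
    $context = stream_context_create([
        'http' => [
            // "identity" means "no content encoding, please".
            'header' => "Accept-Encoding: identity\r\n",
        ],
    ]);
    $body = file_get_contents($url, false, $context);
    return $body === false ? false : maybe_gunzip($body);
}
```

In the question's script, $text = fetch_uncompressed($url); would replace the plain file_get_contents($url) call. If the output is still unreadable after that, the cause is probably something other than gzip.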

Licensed under: CC-BY-SA with attribution