Question

For instance, using this code:

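 $curl = curl_init();
 curl_setopt_array( $curl, array(
      CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
      CURLOPT_URL => $url ) );
 curl_exec( $curl );
 // Despite its name, $header holds the numeric HTTP status code (0 if no response).
 $header = curl_getinfo( $curl, CURLINFO_HTTP_CODE );
 curl_close( $curl );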

$url = "http://upenn.edu" will not work, while $url = "http://www.upenn.edu" will work.

Without the www. the response code I get is 0, whereas with the www. it is 200.

If I were to use PHP get_headers("http://upenn.edu"), I would get two errors:

Warning: get_headers() [function.get-headers]: php_network_getaddresses: getaddrinfo failed: nodename nor servname provided, or not known

and

Warning: get_headers(http://upenn.edu) [function.get-headers]: failed to open stream: php_network_getaddresses: getaddrinfo failed: nodename nor servname provided, or not known

However, when I use the exact same code, http://google.com will work (as well as the expected http://www.google.com.)

Then, for a website such as http://www.dogpile.com, the www. part included returns a response code of 0 whereas without the www., I get a 302.

Why is this? And is there a better method that gives reliable results (i.e., one that still returns the response code when the www. is not present)?

I am new to using cURL and dealing with headers and response codes, so any help is appreciated. Thank you.


Solution 2

Your question, although it came up while using curl, is actually independent of curl. Other HTTP client libraries behave the same way with these examples, because the cause lies in the domain name system and in the services running on the remote computer.

Curl is an HTTP library. When you make an HTTP request, it tries by default to connect to port 80 on a remote computer.

The remote computer is identified by an IP address. That is a number like 173.194.35.134 - you probably know that already.

Most often domain names are used instead of the numbers, for example google.com for 173.194.35.134.

So telling curl to use the URI http://google.com/ will open a connection to

173.194.35.134:80

The domain name system will resolve the domain google.com to the IP address.
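To see that lookup step in isolation, here is a minimal sketch that asks the system resolver for the IPv4 address of a host name before any HTTP request is made. The helper name resolveHost is made up for this example; gethostbyname() is the standard PHP function, and it returns its input unchanged when the lookup fails.

```php
<?php
// Hypothetical helper: returns the IPv4 address for $host,
// or null if the name does not resolve.
function resolveHost(string $host): ?string
{
    // An IP literal needs no lookup.
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return $host;
    }
    $ip = gethostbyname($host);
    // gethostbyname() hands back the unmodified host name on failure.
    return ($ip === $host) ? null : $ip;
}
```

Calling resolveHost('upenn.edu') and resolveHost('www.upenn.edu') side by side shows whether each name resolves from the machine the script runs on.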

Domain names are organized in levels, separated by dots (.). The so-called top-level domain (TLD) is the rightmost part; for google.com that is com. The second-level domain (SLD) is then google. www.google.com is another domain name, with three levels. The www part is commonly referred to as a subdomain.

The most important part here is that DNS can return a different IP address for every different domain name.

Therefore www.google.com and google.com can be two totally different things. The www subdomain is only a common convention for naming the web server on a network organized under an SLD.TLD name.

Because this convention is so common, you could try both and see which one works. However, I would not try anything beyond the variants with and without www.
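The level structure can be made visible with a few lines of PHP. This is only an illustration: domainLevels is a made-up name, and the split is purely textual, so it knows nothing about multi-part public suffixes such as co.uk.

```php
<?php
// Hypothetical helper: split a host name into its labels, top level first.
function domainLevels(string $host): array
{
    // Drop a trailing root dot ("google.com."), then reverse so that
    // index 0 is the top-level domain.
    return array_reverse(explode('.', rtrim($host, '.')));
}

print_r(domainLevels('www.google.com'));
```

For www.google.com the array holds com, google and www, in that order.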
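If you do want to try both forms, a small helper can produce the two candidates. The following is a sketch under simple assumptions: wwwVariants is a hypothetical name, the URL must include a scheme so that parse_url() can find the host, and URLs with user:password@ parts are not handled.

```php
<?php
// Hypothetical helper: return the URL as given plus the same URL with the
// leading "www." toggled, so a caller can try both.
function wwwVariants(string $url): array
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return array($url);           // no host found, nothing to toggle
    }
    $other = (stripos($host, 'www.') === 0)
        ? substr($host, 4)            // www.example.com -> example.com
        : 'www.' . $host;             // example.com     -> www.example.com
    // Swap the first occurrence of the host inside the URL.
    $pos = strpos($url, $host);
    return array($url, substr_replace($url, $other, $pos, strlen($host)));
}
```

A caller could loop over wwwVariants($url) and stop at the first candidate that yields a non-zero status code.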

OTHER TIPS

Not all domains treat www.domain.com and domain.com the same. Usually they do, but if you wanted to you could have two completely different websites on them.

Personally, I like to have all requests to www.mydomains.com redirected to the www-less version, but that's just my preference.

There is no reliable way of automatically detecting whether or not to use www.

There are many reasons for this.

Status of "0" means you did not get a response. This can be because of:

  • the URL does not resolve to a server (e.g. if you leave off the www but the server expects it; as Kolink says, you don't have to have websites on both)
  • the server does not respond (e.g. the request reaches the machine, but the web server doesn't give you a response)
  • the server responds with nothing (probably what is happening with dogpile: you are not passing the headers a browser would send, so it knows you are a program and not a human, and just bounces you straight back)
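To find out which of these cases applies, you can ask cURL why the request failed. The sketch below wraps the request from the question in a made-up function, probe, and adds curl_errno() and curl_error(); the 10-second connect timeout is an arbitrary choice for the example. A host that does not resolve is normally reported as error number 6 (CURLE_COULDNT_RESOLVE_HOST).

```php
<?php
// Sketch: fetch the status code and, when it is 0, the reason cURL gives.
function probe(string $url): array
{
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 10,    // arbitrary timeout, in seconds
    ));
    curl_exec($curl);
    // The handle is released when $curl goes out of scope.
    return array(
        'status' => curl_getinfo($curl, CURLINFO_HTTP_CODE),
        'errno'  => curl_errno($curl),   // 0 means the transfer itself succeeded
        'error'  => curl_error($curl),   // human-readable reason, '' if none
    );
}
```

A status of 0 together with a non-zero errno means cURL never received an HTTP response, and the error string says why.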

Status of 200 means all is good.

Status of 3XX generally means moved. With 302, if you read the rest of the headers, you'll find the URL that the site has moved to (in the Location header); it's suggested you go there. (Note: cURL can handle redirects automatically.)

The others you commonly get are 100 (Continue), 404 (Not Found) and 500 (server error), but in practice a server can return anything, including 418 "I'm a teapot" (http://tools.ietf.org/html/rfc2324).
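As a rough sketch of that automatic handling: with CURLOPT_FOLLOWLOCATION set, cURL follows the Location headers itself, and the status code you read afterwards belongs to the last response in the chain. The function name finalStatus and the cap of 5 redirects are assumptions made for this example.

```php
<?php
// Sketch: let cURL follow redirects and report where the request ended up.
function finalStatus(string $url, int $maxRedirects = 5): array
{
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,          // follow Location headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // arbitrary cap against loops
    ));
    curl_exec($curl);
    return array(
        'status'    => curl_getinfo($curl, CURLINFO_HTTP_CODE),
        'final_url' => curl_getinfo($curl, CURLINFO_EFFECTIVE_URL),
        'redirects' => curl_getinfo($curl, CURLINFO_REDIRECT_COUNT),
    );
}
```

With this in place, a site that answers 302 on its bare domain would normally be reported with the status of the page it redirects to, rather than the 302 itself.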
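Since a server can return any code, it is often easier to branch on the class of the code (its first digit) than on individual values. A minimal sketch, with statusClass as a made-up helper name and 0 treated as the "no response" value that cURL reports:

```php
<?php
// Hypothetical helper: map a status code to its general class.
function statusClass(int $code): string
{
    if ($code === 0) {
        return 'no response';            // cURL never received a status line
    }
    switch (intdiv($code, 100)) {
        case 1:  return 'informational';
        case 2:  return 'success';
        case 3:  return 'redirection';
        case 4:  return 'client error';
        case 5:  return 'server error';
        default: return 'unknown';
    }
}
```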

More reading:

$ dig upenn.edu

; <<>> DiG 9.8.3-P1 <<>> upenn.edu
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54604
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;upenn.edu.         IN  A

;; Query time: 2 msec
;; SERVER: 10.0.1.1#53(10.0.1.1)
;; WHEN: Tue Dec 18 17:37:18 2012
;; MSG SIZE  rcvd: 27

$ dig www.upenn.edu

; <<>> DiG 9.8.3-P1 <<>> www.upenn.edu
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10583
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.upenn.edu.         IN  A

;; ANSWER SECTION:
www.upenn.edu.      123 IN  CNAME   www.upenn.edu-dscg.edgesuite.net.
www.upenn.edu-dscg.edgesuite.net. 4782 IN CNAME a1165.dscg.akamai.net.
a1165.dscg.akamai.net.  4   IN  A   208.47.254.80
a1165.dscg.akamai.net.  4   IN  A   208.47.254.83

;; Query time: 2 msec
;; SERVER: 10.0.1.1#53(10.0.1.1)
;; WHEN: Tue Dec 18 17:37:23 2012
;; MSG SIZE  rcvd: 141

The University of Pennsylvania has neglected to set up a DNS record for the non-www variant of their domain name. That is odd, and may be related to their CDN setup, which relies on a CNAME, which you can't have at the root level of a domain.

Nothing to do with cURL, just upenn.edu's DNS setup.
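If dig is not at hand, roughly the same check can be done from PHP with checkdnsrr(). This is only a sketch: recordTypes is a made-up helper, the list of record types is an arbitrary selection, and the result depends on the resolver of the machine running it.

```php
<?php
// Hypothetical helper: report which of a few record types exist for a host.
function recordTypes(string $host): array
{
    $found = array();
    foreach (array('A', 'AAAA', 'CNAME') as $type) {
        if (checkdnsrr($host, $type)) {
            $found[] = $type;
        }
    }
    return $found;
}
```

An empty array for a host name corresponds to the ANSWER: 0 case in the dig output above.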

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow