Question

I've been chasing this problem for a few days now. It comes and goes: the problem exists for 8-10 hours, goes away for 12 hours, reappears for 4 hours, goes away for 2 hours, and so on.

Here is my setup: I have a Windows Server 2008 R2 machine running IIS 7.5 that hosts about 60 websites. Each website is bound to its own IP address. All of the sites run on port 80 except for one (my admin panel), which runs on port 443 (SSL).

I noticed that the sites stopped working last week. I hit one of the sites, and Chrome gave me the "Ooops! Google Chrome could not connect to www.mysite.com try reloading..." error. If I refresh the page 5 or 6 times (sometimes it takes 10+ times), the site comes up.

My initial thought was that this was a network-related issue, so I RDP'd to the server, opened IIS, and browsed to some of the sites using the Browse link within IIS. I saw the same thing directly on the server. I then thought that, by hitting the IP, the request might still be going out to the router first, and that the router might be the source of the problem. So I bound one of the sites to the loopback address (127.0.0.1) and saw the same problem.

The one constant in this whole issue is that my site on 443 has never had a hiccup, which made me think that maybe only port 80 has the problem. So I rebound that same test site to a different port (20005), and it worked fine when I browsed to it locally, both at 127.0.0.1:20005 and at its regular IP on port 20005.

From those "tests", I concluded that only port 80 is having the problem. My next thought was that Symantec Endpoint Protection (my antivirus software) might be blocking incoming requests on port 80, so I disabled it...same problem.

I then thought maybe my Windows Firewall rule was puking. I recreated the rule that allows traffic on port 80: same problem. I then disabled the firewall entirely in case something else within it was tripping things up: same problem.

I then tried telnetting to one of the sites on port 80 from my remote machine. It connects just fine 3 or 4 times, and then telnet won't connect. So the times it connects are the times the server would have served the web page, and the times it won't connect are the times it wouldn't have.
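For anyone who wants to repeat the telnet test in a loop rather than by hand, here is a minimal Python sketch. It is not something from the original setup: `probe` and `summarize` are made-up helper names, and the attempt count and timeout are arbitrary defaults. It only checks whether a TCP connection can be opened, which is all the telnet test checks.

```python
import socket
import time


def probe(host, port, attempts=20, timeout=3.0, pause=0.0):
    """Try to open a TCP connection `attempts` times.

    Returns a list with one bool per attempt: True if the connection
    opened, False if it was refused, timed out, or otherwise failed.
    """
    results = []
    for _ in range(attempts):
        try:
            # Open the connection and close it again immediately.
            with socket.create_connection((host, port), timeout=timeout):
                results.append(True)
        except OSError:
            # Covers refused connections, timeouts, and DNS failures.
            results.append(False)
        time.sleep(pause)
    return results


def summarize(results):
    """Turn the list of attempt results into a short success-rate string."""
    return f"{sum(results)}/{len(results)} connections succeeded"
```

Running `summarize(probe("www.mysite.com", 80))` and then the same call with port 443 gives two success rates to compare, which should show whether the failures really are limited to port 80.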

I then enabled "failed request tracing" in IIS for one of the sites to see if I could nail down why it fails when it fails. I tried to hit the site a few times; it failed, but no log was created. When I finally got through, a log was created. That tells me that when a request fails, it isn't even making it to IIS.

I then tried running:

netstat -anp tcp | find ":80"

to see if something else was catching the incoming traffic on port 80. This didn't show me anything out of the ordinary. I also downloaded and ran TcpView and didn't see anything in there that would be catching the traffic on port 80. I then stopped IIS and ran both checks again, and nothing was listening on port 80 while IIS was stopped.
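One caveat with filtering on the text ":80" is that it also matches ports such as 8080 and remote addresses that happen to end in :80. A small Python sketch of a stricter filter follows; `listeners_on_port` is a hypothetical helper, and `SAMPLE` is invented output in the usual Windows `netstat -an` column layout, not output from this server.

```python
# Invented sample rows: Proto, Local Address, Foreign Address, State.
SAMPLE = """
  TCP    188.55.22.11:80        0.0.0.0:0              LISTENING
  TCP    188.55.22.11:8080      0.0.0.0:0              LISTENING
  TCP    188.55.22.11:3389      10.0.0.5:80            ESTABLISHED
"""


def listeners_on_port(netstat_output, port):
    """Return (local_address, state) for TCP rows whose LOCAL port is exactly `port`."""
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and anything that is not a full TCP row.
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        local = fields[1]
        # Compare only the text after the last colon of the local address.
        if local.rsplit(":", 1)[-1] == str(port):
            matches.append((local, fields[3]))
    return matches
```

With the sample above, `listeners_on_port(SAMPLE, 80)` keeps only the first row, while a plain text search for ":80" would have matched all three.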

I then tried some port mapping, forwarding traffic that comes in on port 80 to the high port, to see if that would at least let the server serve pages consistently. I used this command to map the port:

netsh interface portproxy add v4tov4 listenport=80 listenaddress=188.55.22.11 connectport=20005 connectaddress=188.55.22.11

This didn't resolve the issue either, which tells me that the problem is happening before traffic even reaches that level of the network stack (which is I-don't-know-where??).

I've done an iisreset, I've rebooted the server, I've run a virus scan, I've run Windows Update, I've done a disk defrag...nothing has fixed this.

Is it possible that my problem is hardware related? Could a network interface go bad to the point that it would disallow traffic on only one port, but allow all other traffic through just fine? I can FTP to the machine w/o any problems, I can RDP to the machine without any problems and as mentioned above, my site on 443 has never had the issue....it's only port 80.

I'm totally out of ideas of things to look for / ways to resolve this. ANY suggestions at this point would be welcome.

UPDATE: I switched to the other NIC on the server and am seeing the same problem.

TIA


Solution 2

Last night I actually uninstalled Symantec Endpoint Protection and things seem to be back in order for now. I think it was the network threat detection that was killing it. I will let it go for a few days and report back here with an answer if things stay cleared up.

Edit: It's been going for a few days now w/o any hiccups so I'm going to attribute the issue to SEP catching traffic and killing it. Seems really weird that it would only bounce traffic some of the time and then it would work for long periods w/o any faults at all. I have reinstalled SEP minus the network threat detection and all is good.

OTHER TIPS

Are you sure it's not your application pools crashing? 127.0.0.1:20005 may work only because the site stops receiving port 80 traffic once you move it to 20005, so it never gets the traffic that is crashing it.

Check your event log to see if any of your application pools are restarting due to crashes.

Licensed under: CC-BY-SA with attribution