Question

This seems to have something to do with the subnet/availability zone, but I'm new to using a VPC and it's eluding me.

VPC: 10.80.0.0/16
subnet: 10.80.1.0/24 (us-east-1b)
subnet: 10.80.2.0/24 (us-east-1a)

All instances are Windows Server 2012.

I have an internet facing ELB created within my VPC (10.80.0.0/16). There is one instance added from AZ us-east-1a, which is on subnet 10.80.2.0/24. The instance is running IIS 7.5, with an app running on port 80 and /health.aspx set up for use as the ELB health check.

Internal traffic on the VPC is flowing normally (unrestricted). I can request health.aspx from this instance from another instance in us-east-1b (10.80.1.0/24). I can also copy files from one instance to another.

Outbound traffic is unrestricted. I can RDP to the instance (when connected to our VPN) and open a browser and request a web page and get it.

The ELB says the instance is healthy and I can see the requests to health.aspx in the IIS logs. Both the ELB and the instance are configured with a security group that allows 80 and 443.

But if I try to request {elb-url}/health.aspx over the open internet the request just times out. Similarly, with an elastic IP associated to the instance, a request to {elastic-ip}/health.aspx times out.

Était-ce utile?

La solution

@Chris, thanks for the response...as it happens, I've already worked it out with some help from a friend. I'll post my findings here for posterity (in case anybody else was similarly confused about how ELB works).

This would be more clear with a diagram. But the summary is that in each availability zone, you need to create both a public and a private subnet. When you add availability zones to your ELB, you need to select the public subnet for the zone. This had already been done in us-east-1b before I got to this setup, and I had simply missed this nuance of ELB configuration. So for the new availability zone, I had to do this...

us-east-1c private subnet 10.1.3.0/24 (using nat instance as default route) public subnet 10.1.4.0/24 (using internet gateway as default route)

Then my instance goes in the private subnet as expected. And the lynch pin of this whole thing is (drum roll....)

When I add us-east-1c to my ELB, I have to select the public subnet...10.1.4.0. Otherwise the instances will pass the health check (since the ELB can communicate with any instance within my entire VPC) but the responses from the servers cannot make it back out to the public internet.

This is what is so confusing. And I still don't fully understand it. The instance can make a request for, say, www.google.com. I can RDP to it and open a browser and get the web page. But a request from a host (like my laptop at my house) will die. strange.

PS: another note...make sure you are using enough NAT instance for your load. I think we ran into an issue where our NAT instance simply ran out of ports because too many web servers were trying to route outbound connections to 3rd party APIs through it. Quite honestly, I'm not good enough at this level of network/OS troubleshooting to be sure. But my theory is that our 8 instances of IIS were holding too many connections open to the NAT instance. We were also abusing the NIC on that micro instance. I upped us to two large instances, one per AZ and things smoothed back out. Both NAT instances are humming and we're not seeing the hung processes in IIS anymore.

Autres conseils

Debugging this kind of issue is always a challenge. I have a few ideas to suggest based on what you have written (and generally apply to trying to solve this problem) that come from dealing with this a number of times.

  • Have you checked both the security groups and network ACLs? Bear in mind that all network ACLs need to be specified in both directions, as they are stateless. Also bear in mind that ELBs are a bit unique in this regard. While they are associated with your VPC, they sometimes need extra rules to ensure connectivity. In the past I have debugged this by opening all network ACLs on all ports, then removing these rules until it has stopped working in order to identify where the block was.
  • Security groups should be checked too. They are stateful but ensure that your load balancer has permissions to be hit from the web.
  • Have you checked this isn't an application configuration problem? I don't know how IIS comes out of the box but I would check it is setup to respond to all hostnames.
  • Check the ELB isn't an internal one, as that wouldn't be publically addressable.
  • You say the ELB is configured with the health check, but it's worth checking you also have the listener setup for port 80? It's in a separate tab on the dashboard and you will need this in addition to the health check for connectivity through the ELB.

Hope one of these tips is useful to you.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top