Question

We're experiencing a strange problem with our current Varnish configuration.

4x Web Servers (IIS 6.5 on Windows Server 2003, each running on an Intel(R) Xeon(R) CPU E5450 @ 3.00GHz Quad Core, 4GB RAM)

3x Varnish Servers (varnish-3.0.3 revision 9e6a70f on Ubuntu 12.04.2 LTS - 64 bit/precise, Kernel Linux 3.2.0-29-generic, each running on an Intel(R) Xeon(R) CPU E5450 @ 3.00GHz Quad Core, 4GB RAM)

The 3 Varnish servers have a pretty much standard, vanilla config: the only thing we changed was vcl_recv and vcl_fetch in order to handle the session cookies. They are currently configured to use the in-memory cache, but we already tried switching to HDD cache on a high-performance RAID drive with the exact same results.
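For reference, the kind of change we're talking about follows the usual Varnish 3.0 cookie-handling pattern, roughly along these lines (a simplified sketch, not our actual default.vcl; the cookie name "sessionid" is just a placeholder):

    sub vcl_recv {
        # Pass requests that carry a session cookie straight to the backend;
        # strip cookies from everything else so it can be cached.
        if (req.http.Cookie ~ "sessionid=") {
            return (pass);
        }
        unset req.http.Cookie;
        return (lookup);
    }

    sub vcl_fetch {
        # Don't cache responses that set a session cookie.
        if (beresp.http.Set-Cookie) {
            return (hit_for_pass);
        }
        return (deliver);
    }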

We put this setup in place almost two years ago on our old web farm without any problems, and everything worked like a charm. Now, using the machines described above and after a clean reinstall, our customers are experiencing a lot of connection problems (pending requests on clients, 404 errors, missing files, etc.) when our websites are under heavy traffic. From the console log we can clearly see that these issues start happening when each Varnish server reaches roughly 700 requests per second: it just seems like they can't handle anything more. We can easily reproduce the critical scenario at any time by shutting down one or two Varnish servers and watching how the others react: they always start to skip beats whenever the requests-per-second count reaches 700. Considering what we've experienced in the past, and looking at the Varnish specs, this doesn't seem normal at all.

We're trying to improve our Varnish servers' performance and/or understand where the problem actually lies. To do that, we could really use some kind of "benchmark" from other companies using Varnish in a similar fashion, to help us understand how far we are from the expected performance (I assume we are quite far off).

EDIT (added CFG files): This is our default.vcl file. This is the output of the varnishadm param.show console command.

I'll also try to post a small part of our varnishlog file.

Thanks in advance,


Solution

To answer the question in the headline: A single Varnish server with the specifications you describe should easily serve 20k+ requests/sec with no other tuning than increasing the number of threads.
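Thread settings are adjusted through runtime parameters. As a rough illustration (the numbers below are just a starting point against the Varnish 3.0 defaults, not a tested recommendation for your boxes):

    # live, via the management interface
    varnishadm param.set thread_pool_min 200
    varnishadm param.set thread_pool_max 4000
    varnishadm param.set thread_pool_add_delay 2

    # or persistently, as varnishd startup options
    # -p thread_pool_min=200 -p thread_pool_max=4000 -p thread_pool_add_delay=2

If the pools are still at the 3.0 defaults (2 pools, thread_pool_max 500), running out of worker threads under load would produce exactly the kind of hard ceiling you describe.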

You don't give enough information (vcl, varnishlog) to answer your remaining questions.

My guess would be that you somehow end up serialising the backend requests. Check your hit_for_pass objects and make sure they have a valid TTL set (120s is fine).
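In VCL terms, that means the uncacheable branch of vcl_fetch should look something like this (a sketch of the general pattern, not a drop-in replacement for your file):

    sub vcl_fetch {
        if (beresp.ttl <= 0s || beresp.http.Set-Cookie) {
            # Give the hit_for_pass object a real lifetime so that concurrent
            # requests for the same URL are passed to the backend in parallel
            # instead of being queued up behind each other.
            set beresp.ttl = 120s;
            return (hit_for_pass);
        }
        return (deliver);
    }

Without a TTL on the hit_for_pass object, requests for the same uncacheable URL can end up on the waiting list and get served one at a time, which shows up as stalled clients once traffic gets heavy.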
