OK, this turned out to be related to the VMware configuration. The machine is a 12 GB server, but it was configured to have 6 GB permanently reserved, with the other 6 GB taken from a pool. With a lot of memory pressure and swapping at the physical level, random Win32 exceptions started to get thrown in the VM. The solution is to make more memory available.
UPDATE: The above was a coincidence; the issue is most likely not related to VMware.
The issue returned after a month. It seems that something on the server has changed that slows down garbage collection, and my per-call WCF service is not disposing of its EtwRegistration handles explicitly (i.e., I am not explicitly disposing the EventProvider), so the handles are released only when the garbage collector gets around to finalizing them. Experiments show that there is a limit of 1000 EventProviders per process. The change in performance on the server resulted in a handle leak that hit that limit.
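For illustration, here is a minimal sketch of what explicit cleanup could look like in a per-call service. It assumes the service itself creates the EventProvider; the contract `IOrderService`, the class `OrderService`, the `Submit` operation and the GUID are all hypothetical placeholders, not taken from my actual code.

```csharp
using System;
using System.Diagnostics.Eventing;
using System.ServiceModel;

// Hypothetical contract, used only to make the sketch self-contained.
[ServiceContract]
public interface IOrderService
{
    [OperationContract]
    void Submit(string orderId);
}

// Per-call instancing: WCF creates a new service instance for every call.
[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
public sealed class OrderService : IOrderService, IDisposable
{
    // Placeholder GUID (assumption); substitute the real provider ID.
    private static readonly Guid ProviderId =
        new Guid("00000000-0000-0000-0000-000000000001");

    // Each instance registers its own ETW provider, which takes one
    // registration handle in the process.
    private readonly EventProvider _provider = new EventProvider(ProviderId);

    public void Submit(string orderId)
    {
        _provider.WriteMessageEvent("Submit " + orderId);
    }

    // Releases the registration deterministically instead of waiting
    // for the finalizer to run.
    public void Dispose()
    {
        _provider.Dispose();
    }
}
```

Because the service class implements IDisposable, WCF calls Dispose when it releases the per-call instance, so the registration goes away at the end of each call regardless of how slowly the garbage collector is running. Sharing one long-lived provider for the whole process would also avoid the leak, provided concurrent use of it is safe in your scenario.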
Further update: If, for whatever reason, anyone would rather increase the number of providers than force cleanup, I think this might help: http://support.microsoft.com/kb/2583244