Graylog2/Passenger 3.0.21; writev() "/tmp/passenger-standalone.3012/proxy_temp/3/00/0000000003" has written only 4096 of 8192 while reading upstream

https://stackoverflow.com/questions/18873321

29-06-2022

Question

I have a Graylog2 install (0.11.0), served with Passenger running as standalone (3.0.21). It's backed with multiple ElasticSearch servers plus MongoDB.

About a week ago, while it was still running Passenger 3.0.18, this error started to show up in the Graylog server logs whenever I tried to load messages:

2013/09/13 13:47:32 [crit] 27720#0: *1451 writev() "/tmp/passenger-standalone.27619/proxy_temp/6/00/0000000006" failed (28: No space left on device) while reading upstream

I checked /tmp, and it was at only 8% utilization. Meanwhile, on the front end, the Messages page in Graylog would load fine except for the actual messages. I tried upgrading Passenger to 3.0.21; the behavior stayed the same, but the error changed:

2013/09/17 10:16:53 [crit] 3113#0: *10 writev() "/tmp/passenger-standalone.3012/proxy_temp/3/00/0000000003" has written only 4096 of 8192 while reading upstream
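
The /tmp utilization check mentioned above can be sketched in Python as follows. This is only an illustration: the helper name `tmp_usage` is made up here, and the inode figure is an extra check that the original post does not mention and does not identify as the cause.

```python
import os
import shutil


def tmp_usage(path="/tmp"):
    """Return (percent of bytes used, percent of inodes used) for the
    filesystem that holds `path`. Hypothetical helper for illustration."""
    disk = shutil.disk_usage(path)
    bytes_pct = 100.0 * disk.used / disk.total if disk.total else 0.0

    st = os.statvfs(path)  # Unix only
    # f_files is the total inode count and f_ffree the free inode count;
    # some filesystems report 0 for the total, so guard the division.
    if st.f_files:
        inode_pct = 100.0 * (st.f_files - st.f_ffree) / st.f_files
    else:
        inode_pct = 0.0
    return bytes_pct, inode_pct
```

Calling `tmp_usage("/tmp")` returns both percentages, so a low byte figure (such as the 8% seen here) can be compared against inode usage on the same filesystem.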

Next I checked the ES machines. They were running with high CPU load, so I reduced the maximum number of indices that Graylog keeps on them. That brought the load right back down, but the behavior still did not change.

My best guess is that this error is some sort of timeout, but I can't find any other thread where anyone has hit it, and I don't see why a timeout would be happening now that the ES machines are back within normal range. All other Graylog web pages work fine, as do Streams.

Solution

I ended up doing a few things to resolve this issue.

1. Changed Graylog's processor_wait_strategy to 'blocking'. This greatly reduced the amount of CPU the graylog-server app was using.
2. Cut the amount of data ElasticSearch was storing by reducing elasticsearch_max_number_of_indices for Graylog.
3. The thing that helped the most: stopped the Graylog server, deleted the graylog2_recent ElasticSearch index, and restarted the Graylog server, which re-created the index. Once I did this, the CPU load on the ElasticSearch servers dropped drastically, and Messages and searches began to work again. After the index refilled, it continued to work correctly.
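
The index deletion in the last step could be scripted roughly as below. This is a minimal sketch: it assumes ElasticSearch's HTTP API is reachable at a base URL such as http://localhost:9200 (the host is an assumption, not something stated in the post), and the function names are hypothetical. As described above, the Graylog server should be stopped before the index is deleted.

```python
import urllib.request


def build_delete_index_request(es_base_url, index_name):
    """Build (but do not send) an HTTP DELETE request for one index."""
    url = f"{es_base_url.rstrip('/')}/{index_name}"
    return urllib.request.Request(url, method="DELETE")


def delete_index(es_base_url, index_name, timeout=30):
    """Send the DELETE request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    for example when the index does not exist.
    """
    req = build_delete_index_request(es_base_url, index_name)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

With the assumed base URL, `delete_index("http://localhost:9200", "graylog2_recent")` would issue the delete; restarting the Graylog server afterwards lets it re-create the index.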

Hopefully this helps some other poor individual Googling this error.

Licensed under: CC-BY-SA with attribution
