Question

I am running Node.js on EC2 and redirecting its stdout and stderr to a file (with >>). This has the following potential problems:

  1. EBS failure may halt the Node.js process (I suppose logging is synchronous). EBS is known to be less reliable than some other AWS services.
  2. EC2 instance may fail and EBS is lost (unless it's attached).
  3. Log files on EBS are not replicated across Availability Zones.
  4. Getting the logs requires SSH access to the machine.

Ideally I would like all logs to be written directly to Amazon ElastiCache for Redis, and from there to S3. What is the best way to do it?


Solution

In general, it is a bad idea to store application logs in Redis. Redis is an in-memory data store, and logs rarely need to live in memory: they are written constantly, read rarely, and keep growing, so RAM is an expensive place to keep them.

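The usual way is to store the logs on the ephemeral (instance store) disk attached to the EC2 instance. This is different from EBS: it is physically attached to the host rather than network-attached, so a slow or impaired EBS volume cannot stall your writes. Its contents do not survive an instance stop, termination, or hardware failure, though. You can then have a cron job periodically copy the logs to S3. This is the most common approach.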
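A minimal sketch of the planning half of that cron job, assuming a hypothetical rotation scheme in which the live file is app.log and each hour's file is renamed to app.log.YYYYMMDDHH. The function planUploads, the file names, and the logs/<host>/<day>/ key layout are illustrative assumptions; the actual copy would be done with the AWS CLI or an AWS SDK and is not shown.

```typescript
// Hypothetical rotation scheme (an assumption, not something Node.js does by itself):
// the process appends to "app.log", and a rotator renames it to
// "app.log.YYYYMMDDHH" at the top of each hour.
const ROTATED_FILE = /^app\.log\.(\d{10})$/;

interface UploadPlan {
  file: string; // local file name
  key: string;  // S3 object key (hypothetical layout)
}

// Decide which local log files a periodic job should copy to S3, and under which keys.
function planUploads(fileNames: string[], host: string, prefix: string = "logs"): UploadPlan[] {
  const plan: UploadPlan[] = [];
  for (const name of fileNames) {
    const match = ROTATED_FILE.exec(name);
    // Skip the live "app.log" (still being appended to) and any unrelated files.
    if (match === null) {
      continue;
    }
    const day = match[1].slice(0, 8); // YYYYMMDD part of the timestamp
    plan.push({ file: name, key: prefix + "/" + host + "/" + day + "/" + name });
  }
  return plan;
}

// Usage: only the rotated file is scheduled for upload.
const plan = planUploads(["app.log", "app.log.2015031012", "notes.txt"], "web-1");
console.log(plan);
```

Uploading only rotated files means an object in S3 is never a partial copy of a file that is still growing; the trade-off is that the newest hour exists only on local disk until it is rotated.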

With the above approach, you can lose whatever was logged since the last copy to S3 if the instance fails. For most applications, this risk is acceptable.
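The exposure can be put in numbers. A sketch, assuming each successful copy ships everything written so far, so the worst case is a crash just before the next copy runs; the rate and interval in the usage line are made-up figures, not measurements.

```typescript
// Worst-case number of log entries that exist only on the local disk when the
// instance dies, assuming each successful copy to S3 ships everything written so far.
function worstCaseLostEntries(entriesPerSecond: number, copyIntervalSeconds: number): number {
  if (entriesPerSecond < 0 || copyIntervalSeconds < 0) {
    throw new RangeError("rate and interval must be non-negative");
  }
  // A crash just before the next copy loses one full interval of entries.
  return entriesPerSecond * copyIntervalSeconds;
}

// Usage with hypothetical figures: 200 entries/s, copied every 5 minutes.
const lost = worstCaseLostEntries(200, 5 * 60);
console.log(lost); // 60000
```

Shortening the copy interval shrinks the window proportionally.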

If that risk isn't acceptable, I'd recommend storing the logs in a persistent store that is not on the EC2 instance. A relational database is a good start.

Redis does not make sense for logs unless you are doing some real-time analysis. If you can explain your use case, we can tell you whether it is a good fit.

EDIT

1) You are asking me to make a trade-off: choose ephemeral storage for price/performance, or choose attached EBS if I don't want to lose logs. Can't I have both with a Redis cluster backed by background disk-based storage (in this case S3)?

Short answer is no. A Redis instance on ElastiCache has the same storage primitives available: ephemeral disk and EBS. If you care about durability, you have to fsync always (Redis's appendfsync always setting), in which case Redis writes to disk on every single write. You are just moving the disk write from the web server to Redis.

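If you don't fsync on every write, for example if you fsync once per second (appendfsync everysec, which is the default when the append-only file is enabled), you can still lose a second or two of writes on a crash.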
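To make the difference concrete, here is a toy model of what a crash can lose under fsync-on-every-write versus periodic fsync (Redis's appendfsync always and everysec settings). It is a simplification under stated assumptions, not Redis's implementation: times are in seconds, everysec is modelled as an fsync completing at every whole second, and always as an fsync before each write is acknowledged. The function writesLostOnCrash is hypothetical.

```typescript
// Toy model of Redis append-only-file fsync policies (a simplification, not Redis internals).
// Assumptions: times are in seconds; under "everysec" an fsync completes at every whole
// second; under "always" each write is fsynced before it is acknowledged.
type FsyncPolicy = "always" | "everysec";

// Returns the timestamps of acknowledged writes that would be missing from disk after a crash.
function writesLostOnCrash(writeTimes: number[], crashAt: number, policy: FsyncPolicy): number[] {
  if (policy === "always") {
    return []; // every acknowledged write already reached the disk
  }
  // Most recent fsync before the crash under this model.
  const lastFsync = Math.floor(crashAt);
  // Writes at or after the last fsync (and before the crash) are not yet on disk in this model.
  // A write at exactly the fsync instant is counted as lost, to stay conservative.
  return writeTimes.filter((t) => t >= lastFsync && t < crashAt);
}

// Usage: a crash at t = 2.9 s loses the writes made after the fsync at t = 2 s.
console.log(writesLostOnCrash([0.2, 1.5, 2.1, 2.7], 2.9, "everysec")); // [ 2.1, 2.7 ]
console.log(writesLostOnCrash([0.2, 1.5, 2.1, 2.7], 2.9, "always")); // []
```

Under always nothing acknowledged is lost, at the price of an fsync on every write. Real Redis under everysec can fall further behind (up to about two seconds) when the disk is slow, so this model understates the worst case.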

But this is all theory. You should evaluate your use case and make explicit tradeoffs.

2) I was hoping to get an answer from someone who has tried the proposed Redis solution, to learn the gap between theory and real-world practice. For example, what happens when Redis starts swapping (gasp)?

Again, I don't think anybody writes general application or Node.js logs into Redis. You'd have to try this out for your use case and see if it works for you.

3) If I were to choose a fast disk-based (append-only) log store, I would probably consider Kafka or Cassandra first.

Agreed. Redis doesn't seem a good fit for your use case, at least from what I have understood.

Licensed under: CC-BY-SA with attribution