Advice on application log levels

https://stackoverflow.com/questions/13094593

14-07-2021

Question

I'm currently working in a large project with a lot of applications that are communicating with each other.

My team and I manage and adjust the applications in the system with necessary bug fixes and change requests. The system is used heavily, and the applications do a lot of logging.

Typical example:

MessageClient

public void save(final Message message) {
   logger.info("Trying to save message: {}", message);

   boolean result = false;
   try {         
     result = messageService.save(message);
   } catch (final MessageStoreException e) {          
      logger.warn("Unable to save message {}", message, e);
      throw e;
   } catch (final Exception e) {
      logger.error("Unknown error when trying to save message!", e);
   }

   if (!result) {
      logger.warn("Could not save the message!");
   }
}

MessageService

public boolean save(final Message message) throws MessageStoreException {  
   if (message == null) {
      throw new IllegalArgumentException("message!");
   } 

   final boolean result = messageStore.store(message);
   if (result) {
      logger.info("Stored: {}", message.getId());
   } else {
      logger.warn("Unable to store: {}", message.getId());
   }

   return result; 
}

NOTE: I know that the example code does not have the best error handling, but this is what it looks like in many of the applications that we manage.

Of course, this makes the logfiles VERY big.

I would like to turn off log level info and log level warn in the production environment, and only leave the error level on, so that the logfiles only contain unexpected errors that need attention and nothing else.

The other developers do not like this idea, as they don't know how to follow the "application flow" when they are viewing the logfiles searching for bugs and errors.

I understand these arguments, and I feel that I need some input from the community.

So, what is best practice here? Should we use info/warn log levels in the production environment or should we only use error logging? Or maybe both?

Thanks!

UPDATE: The applications run on multiple servers, and we currently log everything to file (usually one log file per application with a RollingFileAppender). It is too much work to start logging to a database, so this is not an option.

CONCLUSION: Logging is not entirely trivial. We will not turn off the info and warning levels (it was a pretty drastic action). Instead, just like @jgauffin says, we will go through and analyze the business rules for the applications that print "unnecessary" log messages.

Case closed! Thank you all for the great input and good advice.

Solution

I would like to turn off log level info and log level warn in the production environment, and only leave the error level on, so that the logfiles only contain unexpected errors that need attention and nothing else.

The other developers do not like this idea, as they don't know how to follow the "application flow" when they are viewing the logfiles searching for bugs and errors.

This is a typical problem. Let's analyze the logging:

   final boolean result = messageStore.store(message);
   if (result) {
      logger.info("Stored: {}", message.getId());
   } else {
      logger.warn("Unable to store: {}", message.getId());
   }

That's indeed a problem, since the team doesn't seem sure whether it's a domain rule that a message may fail to be stored. I would most likely say that not being able to store a message is an exceptional case (and an exception should therefore be thrown). But then again, I don't know anything about the domain / business rules.

Logging like that typically indicates that the business rules are unclear. So a much better solution is probably to have the team analyze why the logging is so heavy. Is the application generating a lot of maintenance? Then it's probably better to remove logging and add more error checks (like validating method arguments) instead of turning off log levels.

The team's remark that they can't follow the flow without the logging indicates the same thing: arguments are not checked, so the bug surfaces deep down instead of being caught early in the application.
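To make the "check arguments early, throw instead of returning false" idea concrete, here is a minimal sketch. The `Message`, `MessageStoreException`, and `MessageService` names mirror the question, but their bodies, the `InMemoryMessageStore` class, and the "duplicate id is a failure" rule are all assumptions made up for illustration, since the real domain rules are unknown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the question's Message type.
final class Message {
    private final String id;
    private final String body;

    Message(String id, String body) {
        // Validate at construction so a broken message never travels deeper.
        this.id = Objects.requireNonNull(id, "id must not be null");
        this.body = Objects.requireNonNull(body, "body must not be null");
        if (id.isEmpty()) {
            throw new IllegalArgumentException("id must not be empty");
        }
    }

    String getId() { return id; }
    String getBody() { return body; }
}

// Checked, because the question's save() declares "throws MessageStoreException".
class MessageStoreException extends Exception {
    MessageStoreException(String text) { super(text); }
}

// Hypothetical store. Assumed rule: storing the same id twice is a failure.
final class InMemoryMessageStore {
    private final Map<String, Message> messages = new HashMap<>();

    // Failure is an exception, not a boolean the caller may forget to check.
    void store(Message message) throws MessageStoreException {
        if (messages.putIfAbsent(message.getId(), message) != null) {
            throw new MessageStoreException("Duplicate message id: " + message.getId());
        }
    }

    int size() { return messages.size(); }
}

final class MessageService {
    private final InMemoryMessageStore store;

    MessageService(InMemoryMessageStore store) {
        this.store = Objects.requireNonNull(store, "store must not be null");
    }

    // No logging needed here: bad input fails fast, store failures propagate.
    void save(Message message) throws MessageStoreException {
        Objects.requireNonNull(message, "message must not be null");
        store.store(message);
    }
}
```

With this shape, the only place that needs to log is the outermost caller that catches the exception, and the exception message plus stack trace already says where and why the save failed.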

OTHER TIPS

Have you considered logging different things into different logs? Put transaction data in one log, where you can follow transactions, and error logging in another. This will allow you to follow the status of messages and also have a log where it is easy to see what goes wrong.

Compare with a web server that has an access-log and an error-log. I would agree with your team that until you have other means of following the flow you can't disable those messages in production.
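The access-log / error-log split can be sketched with two handlers on one logger. This example uses `java.util.logging` only because it ships with the JDK and keeps the sketch self-contained; the question's applications use a different framework (a `RollingFileAppender`), where the same idea would be expressed as two appenders. The `SplitLogging` and `ListHandler` names, and the choice to send WARNING and above to the error log, are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

final class SplitLogging {
    // Routes records below WARNING to flowHandler and WARNING+ to errorHandler.
    static Logger configure(String name, Handler flowHandler, Handler errorHandler) {
        Logger logger = Logger.getLogger(name);
        logger.setUseParentHandlers(false); // do not also print via the root logger
        logger.setLevel(Level.ALL);

        // "Access log": everything that is not a warning or an error.
        flowHandler.setLevel(Level.ALL);
        flowHandler.setFilter(
            record -> record.getLevel().intValue() < Level.WARNING.intValue());

        // "Error log": warnings and errors only.
        errorHandler.setLevel(Level.WARNING);

        logger.addHandler(flowHandler);
        logger.addHandler(errorHandler);
        return logger;
    }
}

// In-memory handler so the sketch runs without touching the file system.
final class ListHandler extends Handler {
    final List<String> messages = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
        // isLoggable applies both the handler's level and its filter.
        if (isLoggable(record)) {
            messages.add(record.getLevel() + " " + record.getMessage());
        }
    }

    @Override
    public void flush() { }

    @Override
    public void close() { }
}
```

In a real setup the two `ListHandler` instances would be replaced by file-backed handlers (for example `java.util.logging.FileHandler`), one per log file.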

You could log into a database. (Should not be that hard to set up with a decent logging framework.)

From there you can delete entries based on level and age. Update: First you log everything (including DEBUG if you like). After, say, one week you delete DEBUG messages. After one month you delete INFO messages. At this point you have everything that's now stored in your files.

Bonus: When a bug is suspected, you suspend deleting for the time being.

After, maybe, one year you delete the rest.

This way you should be able to address both needs: required space and kept information. This can be adjusted as needed.
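The retention scheme above can be sketched as a small purge rule. This is an in-memory illustration only: the `LogRetention` and `Entry` names are hypothetical, the 7 / 30 / 365 day periods are taken loosely from "one week", "one month", and "one year", and in a real database the same rule would be a scheduled delete against whatever table the logging framework writes to.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

final class LogRetention {
    enum Level { DEBUG, INFO, WARN, ERROR }

    // Hypothetical shape of one stored log row.
    static final class Entry {
        final Level level;
        final Instant loggedAt;
        final String text;

        Entry(Level level, Instant loggedAt, String text) {
            this.level = level;
            this.loggedAt = loggedAt;
            this.text = text;
        }
    }

    // Assumed retention periods: DEBUG one week, INFO one month, the rest one year.
    static Duration retentionFor(Level level) {
        switch (level) {
            case DEBUG: return Duration.ofDays(7);
            case INFO:  return Duration.ofDays(30);
            default:    return Duration.ofDays(365);
        }
    }

    // Returns the entries that survive a purge run at time "now".
    // When "suspended" is true (a bug is being investigated), nothing is deleted.
    static List<Entry> purge(List<Entry> entries, Instant now, boolean suspended) {
        if (suspended) {
            return new ArrayList<>(entries);
        }
        List<Entry> kept = new ArrayList<>();
        for (Entry entry : entries) {
            Instant expiresAt = entry.loggedAt.plus(retentionFor(entry.level));
            if (expiresAt.isAfter(now)) {
                kept.add(entry);
            }
        }
        return kept;
    }
}
```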

Most installations I've worked with had info, warn, and error logging enabled in production. We would expect to see a bunch of info-level logging when the system starts up, and fairly little after that. We would expect to see no error or warning logging during normal operation - if there is any, that's because there are problems which need looking into.

It seems you are doing quite a bit more info logging than this, though. You could consider changing some of that to debug logging, and then either disabling it, or having it written to a separate log file from the errors and warnings.

However, is there a problem with having large log files? Are you running out of disk? Are you having trouble finding useful information in them? If not, then leave things as they are. If your problem is finding useful information, then I would focus effort on finding ways to deal with large log files, rather than trying to make them smaller. The information in a detailed log can be very useful in all sorts of ways, and there is no fundamental reason that the size should be a problem.

Where I work at the moment, we are moving towards putting more and more things in our logs. Things that are currently being handled through monitoring systems (counts of messages processed, timings of database queries, etc) are moving to logs. We are then simply sending all our logs to a central logstash instance, which lets us easily search and analyse them. We can even generate metrics and alerts from the log stream, rather than having to handle this in the apps.
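As a rough sketch of "demote per-message info to debug, then disable it": the example below again uses `java.util.logging` for self-containment, where `FINE` plays the role of debug. The `LevelThresholdDemo` and `CapturingHandler` names and the log texts are made up for illustration; the question's own framework would use its debug level and configuration file instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// In-memory handler so the effect of the threshold is easy to inspect.
final class CapturingHandler extends Handler {
    final List<String> lines = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
        if (isLoggable(record)) {
            lines.add(record.getLevel() + " " + record.getMessage());
        }
    }

    @Override
    public void flush() { }

    @Override
    public void close() { }
}

final class LevelThresholdDemo {
    static Logger newLogger(String name, Handler handler, Level threshold) {
        Logger logger = Logger.getLogger(name);
        logger.setUseParentHandlers(false);
        handler.setLevel(Level.ALL);  // let the logger's threshold decide
        logger.addHandler(handler);
        logger.setLevel(threshold);
        logger.info("MessageService started"); // startup info, logged once
        return logger;
    }

    // Per-message flow logging demoted from info to debug (FINE).
    static void save(Logger logger, String messageId) {
        logger.fine("Trying to save message: " + messageId);
        logger.fine("Stored: " + messageId);
    }
}
```

With the threshold at `Level.INFO` the per-message lines are dropped. Lowering it to `Level.FINE` while chasing a bug brings the "application flow" back without a code change.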

For a production environment, it's best practice to keep separate log files for the TRACE and ERROR logger levels.

In the TRACE log file you are able to identify unwanted messages and then remove those messages from the code.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow