Question

My team has inherited support for 100+ applications. The applications don't have any kind of common architecture, so the ones that do logging usually do it with custom code to local files or a local database, and it's all unmanaged. We want to change that.

We're slowly migrating the applications over to using log4net and standardising the types of things that are logged. The next question becomes: where should we send the logs?

I was thinking that it would be good to use a central SQL Server dedicated to receiving all the logs, which would provide easy maintenance (one place for backups/archiving) and provide the future possibility of some data mining and trend analysis.

Is that the best practice for this kind of thing, or is there some dedicated application logging server we should be looking at instead?

Update: I should have been clearer than just casually mentioning log4net and SQL Server: we're a Microsoft house, with most things written in .NET. UNIX solutions are no good for us.

Solution

One word of caution: at 100+ apps in a big shop, with hundreds or perhaps thousands of hosts running those apps, steer clear of anything that induces tight coupling. That pretty much rules out connecting directly to SQL Server, or to any database, because your application logging would then depend on the availability of the log repository.

Availability of the central repository is a little more complicated than just 'if you can't connect, don't log it', because the most interesting events usually occur when there are problems, not when things go smoothly. If your logging drops entries exactly when things turn interesting, it will never be trusted to solve incidents, and as such it will fail to gain traction and support with the other stakeholders (i.e. the application owners).
If you decide you can implement retention and retry of failed log delivery on your own, you are facing an uphill battle: it is not a trivial task and is much more complex than it sounds, starting from efficient and reliable storage of the retained information and ending with good retry and intelligent fallback logic.
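
To make the fallback idea concrete, here is a minimal sketch of a log4net-style failover appender that spills to a local appender when remote delivery fails. All names are illustrative, with one caveat: stock log4net appenders usually report failures through their ErrorHandler rather than throwing, so a real primary would have to be written (or wrapped) to propagate errors.

```csharp
using log4net.Appender;
using log4net.Core;

// Hypothetical failover appender: try the primary (remote) appender and
// retain the event locally when delivery fails. Illustrative only; stock
// log4net appenders swallow errors via their ErrorHandler, so the primary
// here is assumed to be one that actually throws on failure.
public class FailoverAppender : AppenderSkeleton
{
    public IAppender Primary { get; set; }   // e.g. a custom remote appender
    public IAppender Fallback { get; set; }  // e.g. a RollingFileAppender

    protected override void Append(LoggingEvent loggingEvent)
    {
        try
        {
            Primary.DoAppend(loggingEvent);
        }
        catch
        {
            // The repository is unreachable: keep the event locally, because
            // the entries you lose are exactly the interesting ones.
            Fallback.DoAppend(loggingEvent);
        }
    }
}
```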

You must also have an answer to the problems of authentication and security. Large orgs have multiple domains with various trust relationships, employees come in via VPN or DirectAccess from home, some applications run unattended, some services are configured to run as local users, some machines are not joined to the domain, etc. etc. You had better have an answer to the question of how the logging module of each application, everywhere it is deployed, is going to authenticate with the central repository (and which situations are going to be unsupported).

Ideally you would use an out-of-the-box delivery mechanism for your logging module. MSMQ is probably the most appropriate fit: robust, asynchronous, reliable delivery (at least to the extent of most use cases), and available on every Windows host, when it is installed (it is an optional component). Which is the major pain point: your applications will take a dependency on a non-default OS component.
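
A minimal sketch of what the application side could look like, assuming the classic System.Messaging API and an illustrative local queue path:

```csharp
using System.Messaging; // classic .NET Framework; needs the MSMQ Windows feature
using log4net.Appender;
using log4net.Core;

// Sketch of a log4net appender that hands events to a local MSMQ queue;
// MSMQ then does the store-and-forward to the central host. The queue
// path and class name are illustrative.
public class MsmqAppender : AppenderSkeleton
{
    public string QueuePath { get; set; } = @".\private$\central-logs";
    private MessageQueue _queue;

    public override void ActivateOptions()
    {
        base.ActivateOptions();
        if (!MessageQueue.Exists(QueuePath))
            MessageQueue.Create(QueuePath);
        _queue = new MessageQueue(QueuePath);
    }

    protected override void Append(LoggingEvent loggingEvent)
    {
        _queue.Send(new Message
        {
            Label = loggingEvent.LoggerName,
            Body = RenderLoggingEvent(loggingEvent), // formatted by the configured layout
            Recoverable = true                       // persist across sender reboots
        });
    }
}
```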

The central repository storage has to be able to deliver the information requested by, for example:

  • the application developers investigating incidents
  • customer support team investigating a lost transaction reported by a customer complaint
  • the security org doing forensics
  • the business managers demanding statistics, trends and aggregated info (BI).

The only storage capable of delivering this for any serious org (in size and lifetime) is a relational engine, so probably SQL Server in your case. Doing analysis over text files is really not going to go the distance.

So I would recommend a messaging-based log transport/delivery (MSMQ) and a relational central repository (SQL Server), perhaps with an analytical component on top of it (Analysis Services data mining). As you can see, this is clearly no small feat, and it covers rather more than just configuring log4net.
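
For illustration, the repository side of that pipeline could start out as simple as a service that drains the queue into SQL Server. The queue path, connection string, and table schema below are all assumptions:

```csharp
using System.Data.SqlClient;
using System.Messaging;

// Minimal sketch of the repository side: drain the MSMQ queue and insert
// each entry into SQL Server. All names (queue, server, table) are made up.
class LogDrain
{
    static void Main()
    {
        using (var queue = new MessageQueue(@".\private$\central-logs"))
        using (var conn = new SqlConnection(
            "Server=loghost;Database=CentralLogs;Integrated Security=true"))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            conn.Open();
            while (true)
            {
                Message msg = queue.Receive(); // blocks until a message arrives
                using (var cmd = new SqlCommand(
                    "INSERT INTO LogEntries (Logger, Message, ReceivedUtc) " +
                    "VALUES (@logger, @message, SYSUTCDATETIME())", conn))
                {
                    cmd.Parameters.AddWithValue("@logger", msg.Label);
                    cmd.Parameters.AddWithValue("@message", (string)msg.Body);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}
```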

As for what to log: you say you have already given it some thought, but I'd like to chime in with my extra 2c. Often, especially during incident investigation, you will want the ability to request extra information: the content of certain files on the incident machine, some registry keys, some performance counter values, or a full process dump. It is very useful to be able to request this information from the central repository interface, but it is impractical to always collect it, just in case it is needed.

This implies some sort of bidirectional communication between the application and the central repository: when the application reports an incident, it can be asked to add extra information (e.g. a dump of the process at fault). A lot of infrastructure has to be in place for something like this to work, from the protocol between the application logging and the central repository, to the ability of the central repository to recognise a repeat of an incident, to the capacity of the logging library to collect the extra information required, and, not least, the ability of an operator to mark incidents as needing extra information on their next occurrence. A sketch of what that handshake could look like follows.
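
Purely as an illustration of the handshake, not a prescription: the client reports an incident, then checks a per-machine control queue for a 'collect more' instruction written by the repository. Every queue name, the message format, and the helper method here are hypothetical:

```csharp
using System;
using System.Messaging;

// Illustrative 'ask for more' handshake. All queue paths, the message
// format ("dump:<signature>"), and CollectAndUploadDump are hypothetical.
class IncidentClient
{
    public void ReportIncident(string incidentSignature, string details)
    {
        using (var outbound = new MessageQueue(
            @"FormatName:DIRECT=OS:loghost\private$\incidents"))
            outbound.Send(details, incidentSignature);

        // Control queue written by the central repository when an operator
        // has flagged this incident as needing extra information.
        using (var control = new MessageQueue(@".\private$\log-control"))
        {
            control.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            Message request;
            try { request = control.Receive(TimeSpan.FromSeconds(1)); }
            catch (MessageQueueException) { return; } // nothing was requested
            if ((string)request.Body == "dump:" + incidentSignature)
                CollectAndUploadDump(); // e.g. trigger a minidump of the faulting process
        }
    }

    private void CollectAndUploadDump() { /* out of scope for this sketch */ }
}
```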

I understand this answer probably seems like overkill at the moment, but I was involved with this problem space for quite a while; I looked at many online crash reports from Dr. Watson back in the day when I was with MS, and I can tell you that these requirements exist, they are valid concerns, and, when implemented, the solution helps tremendously. Ultimately, you can't fix what you cannot measure. A large organisation depends on good management and monitoring of its application stock, including logging and auditing.

There are some third-party vendors that offer solutions, some even integrated with log4net, like bugcollect.com (full disclosure: that's my own company), Error Traffic Controller or Exceptioneer, and others.

OTHER TIPS

Logstash + Elasticsearch + Kibana + Redis or RabbitMQ + NLog or Log4net

  • Storage + search & analytics: Elasticsearch
  • Collecting & parsing: Logstash
  • Visualising: Kibana
  • Queue & buffer: Redis
  • In application: NLog (see the wiring sketch below)
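
A minimal wiring sketch for the application end of that pipeline, assuming NLog 4.x and a Logstash TCP input (the host, port, and layout are placeholders; in the full pipeline above, the Redis buffer would typically sit between the shipper and Logstash):

```csharp
using NLog;
using NLog.Config;
using NLog.Targets;

// Sketch: point NLog at a Logstash TCP input. Address and layout are
// assumptions; adapt them to your Logstash input/codec configuration.
class LoggingSetup
{
    public static void Configure()
    {
        var config = new LoggingConfiguration();
        var logstash = new NetworkTarget("logstash")
        {
            Address = "tcp://logstash.example.local:5000",
            Layout = "${longdate}|${level}|${machinename}|${logger}|${message}"
        };
        config.AddRule(LogLevel.Info, LogLevel.Fatal, logstash);
        LogManager.Configuration = config;

        LogManager.GetCurrentClassLogger().Info("pipeline wired up"); // smoke test
    }
}
```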

The 1024-byte syslog message length limit mentioned so far is misleading and unfairly biases readers against syslog-based solutions to the problem.

The limit for the obsolete "BSD Syslog Protocol" is indeed 1024 bytes.

The BSD syslog Protocol - 4.1 syslog Message Parts

The limit for the modern "Syslog Protocol" is implementation-dependent but MUST be at least 480 bytes, SHOULD be at least 2048 bytes, and MAY be even higher.

The Syslog Protocol - 6.1. Message Length

As an example, Rsyslog's configuration setting is called MaxMessageSize, which the documentation suggests can be set at least as high as 64kb.

rsyslog - Configuration Directives

That the asker's organisation is "a Microsoft house" where "UNIX solutions are no good" should not prevent less discriminatory readers from getting accurate information.
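
For the curious, sending a syslog message from .NET needs no library at all; here is a minimal sketch of an RFC 5424-style send over UDP (the collector host and message content are placeholders):

```csharp
using System;
using System.Net.Sockets;
using System.Text;

// Minimal RFC 5424-style syslog message over UDP. <134> = facility 16
// (local0) * 8 + severity 6 (informational). The collector hostname is
// a placeholder.
class SyslogSender
{
    static void Main()
    {
        string message = string.Format(
            "<134>1 {0:yyyy-MM-ddTHH:mm:ss.fffZ} {1} MyApp - - - Order 42 failed validation",
            DateTime.UtcNow, Environment.MachineName);

        byte[] payload = Encoding.UTF8.GetBytes(message);
        using (var udp = new UdpClient())
            udp.Send(payload, payload.Length, "syslog.example.local", 514);
    }
}
```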

SQL would work, but I've used Splunk to aggregate logs. I was able to find some surprising information based on the way Splunk allows you to set up indexes on your data, and then use their query tools to make some nice graphs. You can download a basic version of it for free too.

As the other responses have pointed out, the closest thing to an industry standard is syslog. But don't despair because you're living in a Windows world: Kiwi have a syslog daemon which runs on Windows, and it is free. Find out more.

update
As @MichaelFreidgeim points out, Kiwi now charge for their syslog daemon. However there are other free alternatives available. This other SO answer links to a couple of them.

If you have log4net log to the local Windows Event Log, you can mine these logs on a Windows 2008 box; see this centralized auditing article.

On that box, you can then easily import these events and provide some management and mining tools on top of them. The application side is sketched below.
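
As a sketch of that application side, log4net's stock EventLogAppender can be wired up programmatically (the application name is illustrative, and registering a new event source requires admin rights the first time):

```csharp
using log4net;
using log4net.Appender;
using log4net.Config;
using log4net.Layout;

// Sketch: route log4net output to the Windows Event Log so the event
// forwarding/subscription machinery can collect it centrally.
class EventLogSetup
{
    public static void Configure()
    {
        var appender = new EventLogAppender
        {
            ApplicationName = "MyLegacyApp", // illustrative source name
            Layout = new PatternLayout("%logger - %message")
        };
        appender.ActivateOptions();
        BasicConfigurator.Configure(appender);

        LogManager.GetLogger(typeof(EventLogSetup))
                  .Warn("now visible in the Application event log");
    }
}
```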

As others have already pointed out, directing logs from a multitude of apps and hosts straight into a database isn't a good idea. I just wanted to add one more advantage of a dedicated centralised log server: it decouples your apps from the log infrastructure. Since you're on .NET, there are a couple of good choices: log4net and NLog. Both are very good products, but I particularly like NLog; it proved to be a much better performer under heavier loads, has much better configuration options, and is being actively maintained. log4net, as far as I know, hasn't changed for a few years and has some issues, but it is still a very robust solution as well. Once you use such a framework, you control at the app level how, what, and when it transmits its logs to the centralised server. If at all.

Have a look at logFaces, which was built specifically for the situation you describe: aggregating logs from a multitude of apps and hosts into centralised storage that serves as a source for analysis and monitoring, and doing all this non-intrusively, with zero changes to your existing code base. It will handle a massive load of apps and hosts and lets you specify what you want done with the data. On the other end, you get a very nice GUI for monitoring in real time or digging into the data. You don't have to deal with databases directly at all, and there are many databases to choose from, both SQL and NoSQL. By the way, RDBMSs are not the best performers with very large data stores; logFaces can work with MongoDB, and this setup normally outperforms the best traditional RDBMS brands roughly tenfold, particularly when used with capped collections.

(For disclosure, I am the author of logFaces.)

If you're running on *nix machines, the traditional solution is syslog.

On Unix, there's syslog.
Also, you might want to check out this case study.
