How do systems like AppDynamics or New Relic work?

Question 1

These products generally work by bytecode injection, function interposition, or monkey-patching of commonly used libraries and methods. For instance, an agent might hook into JDBC query methods, servlet base classes, and HTTP client libraries. When a request enters the application, the agent tracks the important methods and calls it makes and logs them in some way. That data is then crunched into analytics, charts, and alerts.
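As a rough illustration of the monkey-patching variant, here is a minimal Python sketch. It is not how any particular vendor's agent is written; `instrument`, `RECORDED_CALLS`, and `FakeDatabase` are made-up names standing in for an agent hook, its data sink, and a commonly used library class.

```python
import functools
import time

# Hypothetical in-memory sink; a real agent would buffer and ship this data.
RECORDED_CALLS = []


def instrument(owner, method_name):
    """Replace owner.method_name with a wrapper that records call timing."""
    original = getattr(owner, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Record the timing even if the wrapped call raises.
            elapsed = time.perf_counter() - start
            RECORDED_CALLS.append((f"{owner.__name__}.{method_name}", elapsed))

    setattr(owner, method_name, wrapper)


class FakeDatabase:
    """Stand-in for a library class such as a database driver."""

    def query(self, sql):
        time.sleep(0.01)  # pretend to do some I/O
        return [("row", sql)]


# The application code below is unchanged; only the library method is patched.
instrument(FakeDatabase, "query")
rows = FakeDatabase().query("SELECT 1")
```

After the call, `RECORDED_CALLS` holds one `(name, seconds)` entry for `FakeDatabase.query`, which is the kind of raw record that later gets aggregated into charts and alerts.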

On top of that, you can start to add in statistical profiling or other options.
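To give a feel for what statistical profiling means here, the sketch below samples another thread's stack at a fixed interval and counts which function was on top. It relies on `sys._current_frames()`, which is CPython-specific, and the names `sample_stacks` and `busy_work` are invented for the example; a production profiler would be considerably more careful about overhead.

```python
import collections
import sys
import threading
import time


def sample_stacks(target_thread_id, interval, stop_event, counts):
    """Periodically record which function the target thread is executing."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)


def busy_work(duration):
    """Burn CPU for roughly `duration` seconds so the sampler has something to see."""
    deadline = time.perf_counter() + duration
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


counts = collections.Counter()
stop = threading.Event()
sampler = threading.Thread(
    target=sample_stacks,
    args=(threading.get_ident(), 0.005, stop, counts),
    daemon=True,
)
sampler.start()
busy_work(0.2)
stop.set()
sampler.join()
```

Functions that show up in more samples are, statistically, where the time went. No individual call is instrumented, which is why the overhead can stay low.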

The tricky parts are tracking requests across process boundaries and dealing with the volume of performance data you'll gather. (I work on this problem at AppNeta.)
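One common way to follow a request across process boundaries is to attach an identifier to outgoing calls and read it back on the receiving side. The sketch below simulates that with plain dictionaries in place of real HTTP headers; the header name `X-Trace-Id` and the functions `inject`, `extract`, `frontend_handle`, and `backend_handle` are assumptions made for illustration, not any product's wire format.

```python
import uuid

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name


def inject(headers, trace_id):
    """Return a copy of the outgoing headers carrying the trace ID."""
    out = dict(headers)
    out[TRACE_HEADER] = trace_id
    return out


def extract(headers):
    """Reuse the caller's trace ID, or start a new trace if none was sent."""
    return headers.get(TRACE_HEADER) or uuid.uuid4().hex


def frontend_handle(incoming_headers, events):
    trace_id = extract(incoming_headers)
    events.append((trace_id, "frontend", "received request"))
    outgoing = inject({"Accept": "application/json"}, trace_id)
    backend_handle(outgoing, events)  # stands in for a real HTTP call
    return trace_id


def backend_handle(incoming_headers, events):
    trace_id = extract(incoming_headers)
    events.append((trace_id, "backend", "ran query"))


events = []
trace_id = frontend_handle({}, events)
```

Both events end up tagged with the same trace ID, so a collector can stitch them back into a single request even though they were logged by different services.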

One project to check out is Twitter Zipkin (https://github.com/twitter/zipkin). It doesn't support much yet and is pretty early-stage, but it is an interesting project.

Question 2

Both AppDynamics and New Relic use standard bytecode instrumentation (BCI) to monitor the common interfaces (entry and exit points) that developers use to build applications (e.g. Servlet, Struts, SOAP, JMS, JDBC). This provides a basic skeleton of code execution (call graphs) with timing information, but that skeleton represents less than 5% of the code that is executed.

The secret is then to uncover the timing of the remaining 95% of code execution during slowdowns without incurring too much overhead in a production JVM. AppDynamics uses a combination of in-memory agent analytics and Java API calls to extract the remaining code execution in real time. This means that no custom instrumentation is required, nor any explicit declaration of which classes or methods you want the monitoring solution to instrument.
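To make the idea of collecting detail only during slowdowns concrete, here is a deliberately simple sketch. It is not AppDynamics' actual algorithm, which is not described here; `SlowdownTrigger`, its parameters, and the mean-plus-three-sigma rule are all assumptions chosen for illustration.

```python
import statistics


class SlowdownTrigger:
    """Decide when a request is slow enough to justify detailed capture."""

    def __init__(self, min_samples=5, sigmas=3.0):
        self.history = []  # response times (ms) of requests considered normal
        self.min_samples = min_samples
        self.sigmas = sigmas

    def should_capture(self, response_ms):
        """Return True when response_ms is far above the baseline seen so far."""
        capture = False
        if len(self.history) >= self.min_samples:
            mean = statistics.mean(self.history)
            stdev = statistics.pstdev(self.history)
            capture = response_ms > mean + self.sigmas * stdev
        if not capture:
            # Only normal requests feed the baseline, so outliers don't skew it.
            self.history.append(response_ms)
        return capture


trigger = SlowdownTrigger()
timings = [100, 102, 98, 101, 99, 100, 450, 101]
decisions = [trigger.should_capture(t) for t in timings]
```

With these made-up timings, only the 450 ms request trips the trigger; the cheap baseline check runs on every request, while the expensive call-graph capture would run only for that one.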

AppDynamics' data collection is very different from that of New Relic. For example, with AppDynamics you can get a complete distributed call graph across multiple JVMs for a specific user request, rather than, say, an aggregate of requests.

BCI is a commodity these days. The difference lies in the analytics and algorithms that vendors use to trigger the capture of diagnostics and call graph information, so that you end up with the right visibility at the right time to solve problems.

Steve.