Question

I am trying the example from "http://thrift-tutorial.readthedocs.org/en/latest/usage-example.html". The example just calculates the product of two numbers. Server: Java, client: Python.

If I fetch the product via Thrift 3000 times, the elapsed time is ~4.8 s. If I write a simple multiply function in Python and call it directly 3000 times, the elapsed time is ~0.007 s (about 686x faster).
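
The timed loop looks essentially like this (a minimal sketch using the MultiplicationService client generated from the tutorial's .thrift file; names may differ slightly):

    import time

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol

    # generated by `thrift --gen py` from the tutorial's .thrift file
    from tutorial import MultiplicationService

    transport = TTransport.TBufferedTransport(TSocket.TSocket('localhost', 9090))
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = MultiplicationService.Client(protocol)
    transport.open()

    start = time.time()
    for i in range(3000):
        client.multiply(i, i + 1)   # one blocking round trip per call
    print('elapsed: %.3f s' % (time.time() - start))

    transport.close()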

So how can I improve the performance? I want to build an application and split it into several sub-applications. They could be implemented in different languages and would communicate with each other via Thrift, but with performance this poor, should I consider combining them into a single application instead?

App-A (Java)                   App-B (Python)
     |                                 |
     |------------ App-C (C++) --------|

or

App-A+C (Java)                   App-B+C (Python)
(implement C in Java)            (implement C in Python)

Solution

Two key optimizations you can set as goals:

  • Send all the data you already have before waiting.
  • Don't send a computed result across the channel if the only thing done with it is to send it straight back.

What you have described in your question is an extreme case of a "chatty protocol". The network has latency (delay). If you wait for each result before starting the next computation, most of the time is spent waiting for the network transfer, not for the actual computation. By sending another computation before receiving the first result, you can improve throughput dramatically.

So the simplest thing is to allow overlapping requests. The product of the second pair of values doesn't depend on the first result, so don't wait for the first result to arrive.
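
One way to get that overlap from the Python client without touching the service is to keep several connections open and fan the calls out over them, for example with a small thread pool (a rough sketch, assuming the MultiplicationService client generated from the tutorial's .thrift file):

    import queue
    from concurrent.futures import ThreadPoolExecutor

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from tutorial import MultiplicationService   # generated client, as in the question

    POOL_SIZE = 8

    def make_client():
        transport = TTransport.TBufferedTransport(TSocket.TSocket('localhost', 9090))
        transport.open()
        return MultiplicationService.Client(TBinaryProtocol.TBinaryProtocol(transport))

    # Thrift clients are not thread-safe, so keep one connection per in-flight call.
    clients = queue.Queue()
    for _ in range(POOL_SIZE):
        clients.put(make_client())

    def multiply(pair):
        client = clients.get()          # borrow a connection
        try:
            return client.multiply(*pair)
        finally:
            clients.put(client)         # hand it back for the next request

    pairs = [(i, i + 1) for i in range(3000)]
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        results = list(pool.map(multiply, pairs))

Each connection still handles its own calls one at a time, but with several requests in flight the network latency overlaps instead of adding up.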

When you are dealing with local IPC, that doesn't help as much. The cost of communication isn't delay, it's message processing and thread synchronization, which depend on the number of requests but not so much on their order.

A bigger change, with a larger payoff, is to make each request represent a complex algorithm. For example, instead of a remote call that multiplies two numbers, try a remote call for an entire filtering operation, where the argument is a whole data vector or matrix and the server performs the FFT, multiplication, inverse FFT, and scaling before passing the result back. This satisfies both of the original goals: all available data is sent together instead of one value at a time, reducing the time spent waiting, and total network traffic is reduced because intermediate results never have to be exchanged.
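
As a concrete illustration, a coarse-grained interface could look something like this (the service, method, and generated module names here are hypothetical, not part of the tutorial):

    # Hypothetical coarse-grained Thrift interface (sketch, not the tutorial's IDL):
    #
    #   service FilterService {
    #     list<double> filter_signal(1: list<double> samples, 2: double scale)
    #   }

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from filtering import FilterService   # would be generated by `thrift --gen py`

    transport = TTransport.TBufferedTransport(TSocket.TSocket('localhost', 9090))
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = FilterService.Client(protocol)
    transport.open()

    samples = [float(i) for i in range(4096)]      # one whole data vector
    filtered = client.filter_signal(samples, 0.5)  # FFT, multiply, inverse FFT and scaling
                                                   # all run server-side; only the input
                                                   # and the final result cross the wire
    transport.close()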


A final alternative is to link code from all three languages into a single process, so that data access and function calls are direct. Many languages allow building shared libraries that export plain "C" functions and data.
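
From Python, for instance, a compiled library that exports plain C functions can be loaded and called in-process with ctypes (a sketch; the library name and signature below are made up):

    import ctypes

    # Hypothetical shared library built from the C/C++ number-crunching code,
    # exporting a plain C entry point such as:
    #     extern "C" double multiply(double a, double b);
    lib = ctypes.CDLL('./libcalc.so')
    lib.multiply.argtypes = [ctypes.c_double, ctypes.c_double]
    lib.multiply.restype = ctypes.c_double

    result = lib.multiply(3.0, 7.0)   # a direct in-process call: no serialization,
                                      # no socket, no thread hand-off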

Also, virtual machines such as .NET run an intermediate language that can be generated by compiling different source languages. With .NET you have C# (Java-like), C++/CLI (which supports full C++ plus extensions for working with .NET data), and IronPython, which covers the diagram in your question. Plus F#, JavaScript, a Ruby variant, and so on. The Java virtual machine was supposed to be language-specific, but people have written Clojure and other languages that compile to its bytecode.

The advantage of the virtual machine technique is that it enables some cross-language optimization (.NET JIT does cross-module inlining). The disadvantage is that your performance is dictated by JIT optimizations, which generally are the lowest common denominator. C++/CLI actually is really good for bridging this gap, because it supports fully-optimized native code (including SIMD), .NET intermediate language (MSIL), and the lowest overhead layer for communicating between them (C++ "It Just Works" interop).

But you could accomplish roughly the same thing on the Java VM by using JNI to call into fully-optimized C++ code (including SIMD) for the intense number crunching.

Other tips

Your comparison is based on an incorrect assumption: namely, that a cross-process call is (at least) as fast as an in-process call, which is simply not true.

This is one of the famous eight network fallacies, originated by Peter Deutsch and later extended by others, and it applies not only to networks but also to IPC on a single machine: contrary to what you assume, transport cost is NOT zero.

From what I can tell based on the limited information, your roughly 1.6 ms per IPC round trip (4.8 s / 3000 calls) doesn't sound bad to me at all.

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow