Which approach for integrating Python code into a mainly Scala application did you use a second time because it was successful? [closed]

softwareengineering.stackexchange https://softwareengineering.stackexchange.com/questions/362865

  •  24-01-2021

Question

I have a mainly Scala application and I am interested in approaches to integrating Python code into this application in a way that is proven by you personally to be successful.

In this context

  • integrating means allowing Scala code to call Python code, somehow, and use the results or access the exception
  • successful means the approach allowed the two languages to work together to deliver business value, and was used a second time by the same team

Hopefully this question admits factual answers rather than discussion, because I am asking for factual observations. However, if this is too subjective, then as you close it could you please suggest one of:

  • how I can ask this question in a non-subjective way, OR
  • a site where I can ask this type of question.

Thanks.

Was it helpful?

Solution

Capture the exit code and output via ProcessLogger, and surface Python exceptions via stderr, as follows:

in Scala:

import sys.process._

def callPython(): Unit = {
    // Buffers that collect the child process's stdout and stderr.
    val stdout = new StringBuilder
    val stderr = new StringBuilder
    // `!` runs the command and returns its exit code.
    val exitCode = Seq("python", "/fullpath/mypythonprogram.py") ! ProcessLogger(stdout append _, stderr append _)
    println("exit code: " + exitCode)
    println("stdout: " + stdout)
    println("stderr: " + stderr)
}

and in Python:

import sys

def main() -> int:
    try:
        throws()  # the actual work; raises on failure
        return 0
    except Exception as err:
        sys.stderr.write(f'Exception: {err}')
        return 1

if __name__ == '__main__':
    sys.exit(main())

For further detail, see the documentation for the scala.sys.process package, and in particular ProcessBuilder and ProcessLogger.

OTHER TIPS

There is not going to be a single answer to this, because each case is different and all of these approaches have been used successfully by different teams.

In general there are three approaches:

  1. If both languages share the same runtime, you can compile both to that runtime and use a foreign function interface. In this scenario you would use Jython and Scala, both of which run on the JVM. This is generally the fastest approach with the least overhead, but you will have to deal with some impedance mismatch in the way each language treats objects, and you get no isolation, so poorly written code in either language can crash the other. Additionally, it can be cumbersome to scale this to multiple machines.
  2. You can spawn a subprocess for each request the main app processes. The main process communicates with the subprocess by streaming data over stdin and stdout, and possibly other pipes or OS-specific IPC. This is generally best when the subprocess is a filter-type program that can be used in a pipeline. There is some overhead in creating a subprocess for each request, but on a Unix-based system like Linux creating a new process is fast, since the system is optimized for it.
  3. You can run the Python code as a separate microservice and communicate with it through a message-passing API over inter-process communication, for example an HTTP application server or Unix domain sockets. With this approach you have some overhead from serializing, copying, and parsing the messages, so it is best used when the messages are coarse-grained rather than for lots of small calls. You will need to design an explicit API, but this approach is usually simpler in the long run and can be made more robust, since a crash in one process will not affect the other. It is also much easier to scale, as having an explicit API makes it straightforward to run the processes on different machines if you ever need to.
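The second approach can be sketched as a line-oriented Python filter (the transform itself is a placeholder); the Scala side drives it by writing requests to its stdin and reading one result per line from its stdout:

```python
import sys

def transform(line: str) -> str:
    # Placeholder per-line work; substitute the real computation.
    return line.strip().upper()

def main() -> None:
    # Classic Unix filter: one request in, one response out, flushed
    # immediately so the parent Scala process is never left waiting.
    for line in sys.stdin:
        sys.stdout.write(transform(line) + '\n')
        sys.stdout.flush()

if __name__ == '__main__':
    main()
```

Because the process is started once and then streamed to, the per-request process-creation overhead mentioned above is paid only once.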
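The third approach can be sketched on the Python side with only the standard library. Everything here is hypothetical (the endpoint, the doubling computation); the Scala side would call it with any HTTP client:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class ComputeHandler(BaseHTTPRequestHandler):
    # Handles POST requests carrying a JSON body and replies with JSON.
    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        request = json.loads(self.rfile.read(length))
        result = {'result': request['x'] * 2}  # placeholder computation
        body = json.dumps(result).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def run(port: int = 8080) -> None:
    # Blocks forever, serving coarse-grained request/response messages.
    HTTPServer(('127.0.0.1', port), ComputeHandler).serve_forever()
```

One coarse-grained POST per unit of work keeps the serialization overhead proportional to the number of requests rather than the number of fine-grained calls.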
Licensed under: CC-BY-SA with attribution