Question

I've just started my first proper internship in industry (not learning to code, but learning to write software that does stuff). My employer does a lot of Big Data processing and uses Apache Spark, which exposes an API in four(+) languages: Python, Java, Scala, Spark SQL (and R).

In the toy programming I've done before, there is a true sense in which one language is more inherently "performant" than another, e.g. Project Euler solutions written in C++ will probably run faster than a similar implementation in Python, even if there are libraries that might close that gap somewhat (e.g. NumPy).

My employer uses Spark via the PySpark API, and I was wondering how that impacts the performance of their applications versus using Scala or Java. Similarly, TensorFlow exposes Python and C++ APIs, where I could again see the potential for a performance difference.

My gut reaction is that no, the choice of language doesn't matter: under the hood, the Python API is basically just a wrapper around the same binaries as the Scala or Java wrappers. The code you're writing is mostly 'glue' around the functionality exposed by something like Spark or TensorFlow, and the compute requirements of the calls you're making to the API are orders of magnitude larger than that required by the rest of the application.


Solution

"the compute requirements of the calls you're making to the API are orders of magnitude larger than that required by the rest of the application."

For all the stuff you're talking about, this is almost certainly the case. It doesn't matter if the calls in your language binding take 1 μs, 1 ms or even 1 s if the actual processing inside the guts of Spark takes 10 minutes.
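To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a single binding call followed by 10 minutes of processing inside the engine; the helper name `binding_share` is made up for illustration and is not part of any Spark API.

```python
# Share of total wall time spent in the language binding, assuming one
# binding call followed by 10 minutes of work inside the engine.
ENGINE_SECONDS = 10 * 60  # the "10 minutes" figure from the answer


def binding_share(call_seconds: float, engine_seconds: float = ENGINE_SECONDS) -> float:
    """Return the fraction of total runtime spent in the binding call."""
    return call_seconds / (call_seconds + engine_seconds)


for label, call_seconds in [("1 us", 1e-6), ("1 ms", 1e-3), ("1 s", 1.0)]:
    print(f"{label:>5}: {binding_share(call_seconds):.7%} of total runtime")
```

Even in the 1 s case the binding accounts for less than 0.2% of the total, so making the call faster would not be noticeable.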

Where the language binding can have an effect is when large amounts of data pass through it. For example, when loading the initial "big data" set into Spark, if you send every single value through Python's relatively slow numeric code, you may see a significant difference compared with doing it in Scala or whatever else.
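The same effect can be seen without Spark at all. The following is a minimal standard-library sketch, not PySpark code: it compares touching every value in interpreted Python with handing the whole batch to one routine implemented in C (the built-in `sum`). The function names and the batch size are arbitrary choices for illustration, and the exact timings will vary by machine and interpreter.

```python
import timeit


def per_value_total(values):
    """Touch every value in interpreted Python code, one at a time."""
    total = 0
    for v in values:
        total += v
    return total


def bulk_total(values):
    """Hand the whole batch to the C-implemented built-in in a single call."""
    return sum(values)


values = list(range(200_000))

# Both functions compute the same result; only the route the data takes differs.
loop_s = timeit.timeit(lambda: per_value_total(values), number=5)
bulk_s = timeit.timeit(lambda: bulk_total(values), number=5)
print(f"per-value loop: {loop_s:.4f} s, single bulk call: {bulk_s:.4f} s")
```

On CPython the per-value loop is usually noticeably slower, which is the kind of gap that can show up when each record of a large dataset is routed through Python code rather than staying inside the engine.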

Licensed under: CC-BY-SA with attribution