Question

Is there a memory penalty for passing large objects (such as an entire HTTP request, or the blob contents of an uploaded file) between functions within the same Erlang process?

It is my understanding that Erlang passes arguments by value. If so, passing a large object would copy its entire "value" on every function call that takes the object as an argument.

To make things more concrete, consider in pseudocode two ways of doing the exact same thing.

Method 1

X = <<......>>,               % X is a very large binary
Value1 = prelim_fun(X),       % Value1 is a smaller object, e.g. a hash
FinalValue = final_fun(Value1).

final_fun(Value1) ->
    FinalValue = further_fun(Value1),
    FinalValue.

Method 2

X = <<......>>,               % X is a very large binary
FinalValue = final_fun(X).

final_fun(X) ->
    Value1 = prelim_fun(X),   % Value1 is a smaller object, e.g. a hash
    FinalValue = further_fun(Value1),
    FinalValue.

In the first case, only the results of a preliminary computation are passed, while in the second the large blob is passed directly to the function. Does the first method have a lighter memory footprint?

This is not about micro-optimization. When a system deals in large files, the choice between passing the whole file and passing a hash that would suffice could make or break the system, IF each function call consumes memory in proportion to its arguments. With practical numbers: if a file blob averages 10 MB and a hash is 10 KB, the difference is about 1000-fold. That is NOT a "micro" optimization.

PS: Since this question seems to have 2 upvotes and 2 downvotes so far, might I request any future downvoters to be kind enough to explain why they think this is such a terrible question for this site?


Solution

Since you don't specify which implementation you use, it is hard to say. The Erlang Specification does not guarantee any specific optimizations except Proper Tail Calls. So, you have to assume the worst.

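However, if you use either Erjang or BEAM/HiPE, then if I remember correctly both pass arguments within a process as pointers to immutable data, so a function call only costs you the copy of a pointer, however large the object is. Between processes on the same VM it is less uniform: BEAM generally copies a message into the receiver's heap, except for binaries larger than 64 bytes, which live in a shared, reference-counted area so that only a small reference is copied.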
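You can check the within-process case yourself. Here is a minimal sketch, assuming BEAM; the module name arg_cost and its helper functions are made up for illustration. It passes a 10 MB binary down 1000 nested, non-tail calls and reports how much the process's own memory grew at the deepest point.

```erlang
-module(arg_cost).
-export([run/0]).

%% Hypothetical demo: returns {BlobSize, GrowthInBytes}.
run() ->
    Blob = binary:copy(<<0>>, 10 * 1024 * 1024),   % 10 MB binary
    Before = mem(),
    {Deepest, Size} = descend(Blob, 1000),
    {Size, Deepest - Before}.

%% Non-tail recursion: Blob is used after the inner call returns,
%% so each of the 1000 frames has to keep its argument alive.
descend(Blob, 0) ->
    {mem(), byte_size(Blob)};
descend(Blob, N) ->
    {Mem, _} = descend(Blob, N - 1),
    {Mem, byte_size(Blob)}.

%% Memory of the calling process in bytes (stack, heap, internal structures).
mem() ->
    {memory, Bytes} = erlang:process_info(self(), memory),
    Bytes.
```

If every call copied its argument, the growth would be in the region of 1000 x 10 MB. With arguments passed as pointers, I would expect it to stay far below the size of a single blob, since each frame only holds a reference.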

But bear in mind that this is only an optimization performed by some implementations; it is not guaranteed by the spec.
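Because none of this is guaranteed, it is worth measuring on the VM you actually deploy. A rough sketch for the between-processes case, again assuming BEAM and using a made-up module name msg_cost: it sends a 10 MB binary to a freshly spawned process and has that process report its own memory after receiving it.

```erlang
-module(msg_cost).
-export([run/0]).

%% Hypothetical demo: returns {BlobSize, ReceiverMemoryInBytes},
%% or the atom timeout if no report arrives within 5 seconds.
run() ->
    Blob = binary:copy(<<0>>, 10 * 1024 * 1024),   % 10 MB binary
    Parent = self(),
    Pid = spawn(fun() ->
        receive
            {blob, B} ->
                %% Memory of the receiving process after it holds the binary.
                {memory, Bytes} = erlang:process_info(self(), memory),
                Parent ! {report, byte_size(B), Bytes}
        end
    end),
    Pid ! {blob, Blob},
    receive
        {report, Size, ReceiverBytes} -> {Size, ReceiverBytes}
    after 5000 ->
        timeout
    end.
```

On BEAM I would expect the receiver's memory to be a tiny fraction of the blob size, because a large binary is stored off the process heap and only a reference to it lands in the receiver. As far as I know that sharing applies to large binaries specifically; ordinary terms such as lists and tuples are copied on send, so the same test with a 10 MB list should look very different.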

Licensed under: CC-BY-SA with attribution