Erlang pattern matching bitstrings

https://stackoverflow.com/questions/5812902

25-10-2019
|

Question

I'm writing code to decode messages from a binary protocol. Each message type is assigned a 1 byte type identifier and each message carries this type id. Messages all start with a common header consisting of 5 fields. My API is simple:

decoder:decode(Bin :: binary()) -> my_message_type() | {error, binary()}`

My first instinct is to lean heavily on pattern matching by writing one decode function for each message type and to decode that message type completely in the fun argument

decode(<<Hdr1:8, ?MESSAGE_TYPE_ID_X:8, Hdr3:8, Hdr4:8, Hdr5:32, 
         TypeXField1:32, TypeXFld2:32, TypeXFld3:32>>) ->
    #message_x{hdr1=Hdr1, hdr3=Hdr3 ... fld4=TypeXFld3};

decode(<<Hdr1:8, ?MESSAGE_TYPE_ID_Y:8, Hdr3:8, Hdr4:8, Hdr5:32, 
         TypeYField1:32, TypeYFld2:16, TypeYFld3:4, TypeYFld4:32
         TypeYFld5:64>>) ->
    #message_y{hdr1=Hdr1, hdr3=Hdr3 ... fld5=TypeYFld5}.

Note that while the first 5 fields of the messages are structurally identical, the fields after that vary for each message type.

I have roughly 20 message types and thus 20 functions similar to the above. Am I decoding the full message multiple times with this structure? Is it idiomatic? Would I be better off just decoding the message type field in the function header and then decode the full message in the body of the message?

Solution

Just to agree that your style is very idiomatic Erlang. Don't split the decoding into separate parts unless you feel it makes your code clearer. Sometimes it can be more logical to do that type of grouping.

The compiler is smart and compiles pattern matching in such a way that it will not decode the message more than once. It will first decode the first two fields (bytes) and then use the value of the second field, the message type, to determine how it is going to handle the rest of the message. This works irrespective of how long the common part of the binary is.

So their is no need to try and "help" the compiler by splitting the decoding into separate parts, it will not make it more efficient. Again, only do it if it makes your code clearer.

OTHER TIPS

Your current approach is idiomatic Erlang, so keep going this direction. Don't worry about performance, Erlang compiler does good work here. If your messages are really exactly same format you can write macro for it but it should generate same code under hood. Anyway using macro usually leads to worse maintainability. Just for curiosity why you are generating different record types when all have exactly same fields? Alternative approach is just translate message type from constant to Erlang atom and store it in one record type.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow