How to parse a simple custom syntax in Go?

https://softwareengineering.stackexchange.com/questions/341090

06-01-2021

Question

I have a limited number of input types:

34:56 = sensorA#, sensorA#, sensorB#
2:5 = { led# }
66 = otherSensor
2,3,4,5 = greenRelay#, redRelay#, relayA#, relayA#

a:b implies range.
{name} implies a global name for the dataset.
# represents automatic enumeration in the name (not relevant for the question)
Single values or comma-separated values mean what you'd expect.
Fewer names than values implies automatic name assignment (not relevant for the question)

I need to extract the numerical values from the left side of the expression and the names from the right side so I can iterate over them and assign the names to the values. I don't know how to handle this task. I've been reading and looking for a solution, but I'd like to settle on a good methodology for this case.

Should I replace all the spaces and tabs before processing?

Should I use regex just to verify the correctness of the input or for something more?

Should I use just plain string manipulation? I'm using Golang and strings are immutable, string manipulation implies allocations and a lot of code (speed is not REALLY important here but I'd like to find the correct way to solve this).

Should I write a lexer and parser for this?

Solution

Should I replace all the spaces and tabs before processing?

You can do this if you want whitespace to be as meaningless as it is in C, C++, Java, or C#.

This means making two passes over the file. For very large files that can be prohibitive, because it forces you to hold the whole thing in memory or to create a temp file. There are techniques to consume whitespace on the fly; consider them before you resort to this.

Should I use regex just to verify the correctness of the input or for something more?

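Not every language can be validated with regex: classic regular expressions only recognize regular languages, and syntax that nests to arbitrary depth is beyond them. Be sure of which category you're in before you commit to it.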
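Judging only from the four sample lines in the question, this syntax has no nesting deeper than a single { name }, so it looks like it falls in the category a regex can handle. Here is a hedged sketch with Go's regexp package. The shape of a name (letter or underscore first, optional trailing #) is my assumption, and split is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// name is an assumed shape for identifiers: a letter or underscore, then
// letters, digits or underscores, then an optional '#'.
const name = `[A-Za-z_][A-Za-z0-9_]*#?`

// lineRE accepts "values = names" where values is a range (a:b) or a
// comma-separated list, and names is { name } or a comma-separated list.
var lineRE = regexp.MustCompile(
	`^\s*(\d+\s*:\s*\d+|\d+(?:\s*,\s*\d+)*)` + // group 1: left side
		`\s*=\s*` +
		`(\{\s*` + name + `\s*\}|` + name + `(?:\s*,\s*` + name + `)*)\s*$`) // group 2: right side

// split validates line and returns its raw left and right sides.
func split(line string) (left, right string, ok bool) {
	m := lineRE.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	for _, line := range []string{
		"34:56 = sensorA#, sensorA#, sensorB#",
		"2:5 = { led# }",
		"2:5:9 = led#", // rejected: a range has exactly two ends
	} {
		left, right, ok := split(line)
		fmt.Printf("%q -> ok=%v left=%q right=%q\n", line, ok, left, right)
	}
}
```

The regex only checks the shape of a line. Whether a range runs the right way round, or whether there are more names than values, still has to be checked in code afterwards.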

Should I use just plain string manipulation? I'm using Golang and strings are immutable, string manipulation implies allocations and a lot of code (speed is not REALLY important here but I'd like to find the correct way to solve this).

"A lot of code" is not a good way to define a language. A good way is to write the grammar down, for example as EBNF, which this tool renders as railroad diagrams:

http://www.bottlecaps.de/rr/ui
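Once a grammar is written down, each rule maps onto one small function, even with plain string manipulation. The sketch below is an illustration and not a definitive implementation: the EBNF in the comment is reconstructed from the four sample lines, ranges are assumed to be inclusive, the function names are hypothetical, and strings.Cut needs Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Assumed grammar, reconstructed from the four sample lines:
//
//	line   ::= values '=' names
//	values ::= number ':' number | number (',' number)*
//	names  ::= '{' name '}' | name (',' name)*

// parseValues implements the "values" rule. A range a:b is assumed inclusive.
// strconv.Atoi also accepts a leading sign, which is looser than the grammar.
func parseValues(s string) ([]int, error) {
	var out []int
	if lo, hi, isRange := strings.Cut(s, ":"); isRange {
		a, err := strconv.Atoi(strings.TrimSpace(lo))
		if err != nil {
			return nil, err
		}
		b, err := strconv.Atoi(strings.TrimSpace(hi))
		if err != nil {
			return nil, err
		}
		if a > b {
			return nil, fmt.Errorf("empty range %d:%d", a, b)
		}
		for v := a; v <= b; v++ {
			out = append(out, v)
		}
		return out, nil
	}
	for _, field := range strings.Split(s, ",") {
		v, err := strconv.Atoi(strings.TrimSpace(field))
		if err != nil {
			return nil, err
		}
		out = append(out, v)
	}
	return out, nil
}

// parseNames implements the "names" rule; global reports the { name } form.
// Names themselves are not validated in this sketch.
func parseNames(s string) (names []string, global bool) {
	s = strings.TrimSpace(s)
	if len(s) >= 2 && s[0] == '{' && s[len(s)-1] == '}' {
		return []string{strings.TrimSpace(s[1 : len(s)-1])}, true
	}
	for _, field := range strings.Split(s, ",") {
		names = append(names, strings.TrimSpace(field))
	}
	return names, false
}

// parseLine implements the "line" rule by cutting at the first '='.
func parseLine(line string) (values []int, names []string, global bool, err error) {
	left, right, found := strings.Cut(line, "=")
	if !found {
		return nil, nil, false, fmt.Errorf("missing '=' in %q", line)
	}
	if values, err = parseValues(left); err != nil {
		return nil, nil, false, err
	}
	names, global = parseNames(right)
	return values, names, global, nil
}

func main() {
	fmt.Println(parseLine("2:5 = { led# }"))
}
```

The allocations here are a handful of short slices per line, which should not matter given that speed is not the priority. A real version would probably also cap the size of a range.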

Should I write a lexer and parser for this?

This offers the most power of anything you've mentioned. There are likely simpler alternatives that center on reusing parsers written for formats like JSON or XML, but then you're just shoving your input types into a different data format.
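To give a feel for the size of the job, here is a minimal hand-written sketch of the lexer half only. The token kinds and the rule that a name is ASCII letters, digits or underscores with an optional trailing # are my assumptions, not something the question specifies. A parser would then walk the token slice and check it against the grammar.

```go
package main

import (
	"fmt"
	"strings"
)

// kind labels a token. The names are hypothetical, chosen for this sketch.
type kind int

const (
	number kind = iota // run of ASCII digits, e.g. 34
	name               // identifier with optional trailing '#', e.g. led#
	punct              // one of : , = { }
)

type token struct {
	kind kind
	text string
	pos  int // byte offset in the line, useful for error messages
}

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

func isLetter(c byte) bool {
	return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// lex splits one line into tokens, dropping spaces and tabs as it goes.
// It works on bytes, so it only accepts ASCII input.
func lex(s string) ([]token, error) {
	var toks []token
	for i := 0; i < len(s); {
		j, c := i+1, s[i] // the token under construction is s[i:j]
		switch {
		case c == ' ' || c == '\t':
			i = j
			continue
		case isDigit(c):
			for j < len(s) && isDigit(s[j]) {
				j++
			}
			toks = append(toks, token{number, s[i:j], i})
		case isLetter(c):
			for j < len(s) && (isLetter(s[j]) || isDigit(s[j])) {
				j++
			}
			if j < len(s) && s[j] == '#' {
				j++ // '#' belongs to the name only when it follows directly
			}
			toks = append(toks, token{name, s[i:j], i})
		case strings.IndexByte(":,={}", c) >= 0:
			toks = append(toks, token{punct, s[i:j], i})
		default:
			return nil, fmt.Errorf("unexpected %q at offset %d", c, i)
		}
		i = j
	}
	return toks, nil
}

func main() {
	toks, err := lex("2:5 = { led# }")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range toks {
		fmt.Printf("kind=%d text=%q pos=%d\n", t.kind, t.text, t.pos)
	}
}
```

Because the lexer already discards whitespace and reports the offset of anything it does not recognize, the parser on top of it only has to deal with three kinds of token.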

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange