Question

I'm trying to write a compiler that takes an assembly file and outputs raw machine code instructions.

I've found lots of tutorials on how to write a compiler, but I'm wondering whether all the stages are relevant to assembler mnemonics. For instance, is lexical analysis necessary at all given the simplified stage-by-stage format of assembler, or is it still necessary but in a simpler form?


Solution

A lexical analyzer is still required: you must have something that will break the text into individual tokens (words, numbers, punctuation, etc.). You still need a parser, too, although a much simplified one. There is a grammar, after all.
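
To make the split between the two stages concrete, here is a minimal sketch in Python. The syntax it accepts is an assumption made up for illustration (`;` starts a comment, a label ends with `:`, operands are single names or numbers separated by commas), and `tokenize` and `parse_line` are hypothetical names, not part of any real assembler.

```python
import re

# Token kinds for an assumed, simplified assembly syntax.
TOKEN_SPEC = [
    ("COMMENT", r";.*"),
    ("NUMBER", r"0[xX][0-9a-fA-F]+|\d+"),
    ("IDENT", r"[A-Za-z_.][A-Za-z0-9_.]*"),
    ("PUNCT", r"[,:]"),
    ("SKIP", r"[ \t]+"),
    ("ERROR", r"."),  # anything else is a lexical error
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(line):
    """Lexer: break one line into (kind, text) pairs, dropping spaces and comments."""
    tokens = []
    for match in TOKEN_RE.finditer(line):
        kind, text = match.lastgroup, match.group()
        if kind in ("SKIP", "COMMENT"):
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r}")
        tokens.append((kind, text))
    return tokens


def parse_line(tokens):
    """Parser: accept `[label ':'] [mnemonic [operand {',' operand}]]`."""
    label = None
    if len(tokens) >= 2 and tokens[0][0] == "IDENT" and tokens[1] == ("PUNCT", ":"):
        label, tokens = tokens[0][1], tokens[2:]
    if not tokens:
        return label, None, []
    if tokens[0][0] != "IDENT":
        raise SyntaxError(f"expected a mnemonic, got {tokens[0][1]!r}")
    operands = []
    for index, (kind, text) in enumerate(tokens[1:]):
        if index % 2 == 0:  # even positions must hold an operand
            if kind not in ("IDENT", "NUMBER"):
                raise SyntaxError(f"expected an operand, got {text!r}")
            operands.append(text)
        elif (kind, text) != ("PUNCT", ","):  # odd positions must hold a comma
            raise SyntaxError(f"expected ',', got {text!r}")
    if len(tokens) > 1 and len(tokens) % 2 == 1:
        raise SyntaxError("trailing comma")
    return label, tokens[0][1], operands
```

With these definitions, `parse_line(tokenize("loop: mov eax, 0x10 ; set"))` yields a label, a mnemonic and a list of two operands. A real assembler would need a richer operand grammar (memory references, expressions), but the grammar stays flat compared to a high-level language.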

OTHER TIPS

As I see it, lexical analysis is all that is needed; the need for a parser is reduced by the flat structure of assembly.

First I would check that there are no invalid instructions or operands, then that all variables used are declared. Once you are sure that the file is a valid program, delete the comments and replace variables and procedures with addresses (you have to assign addresses to labels "on the fly" during the translation, because you can't know them in advance). Finally, do the actual conversion to binary code.
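
The steps above can be sketched as a small two-pass assembler. Everything about the target here is an assumption for illustration: the `OPCODES` table describes a made-up toy instruction set, every instruction is assumed to be two bytes (opcode plus one operand byte), labels sit on their own line, and `;` starts a comment. The first pass validates mnemonics and assigns an address to each label as it goes; the second pass replaces labels with those addresses and emits the bytes.

```python
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "JMP": 0x03, "HALT": 0xFF}  # hypothetical toy ISA
INSTRUCTION_SIZE = 2  # assumed: one opcode byte plus one operand byte


def assemble(source):
    """Translate toy assembly text into raw bytes using two passes."""
    # Delete comments and blank lines.
    lines = []
    for raw in source.splitlines():
        text = raw.split(";", 1)[0].strip()
        if text:
            lines.append(text)

    # Pass 1: check mnemonics and record the address of every label.
    labels, program, address = {}, [], 0
    for text in lines:
        if text.endswith(":"):
            name = text[:-1].strip()
            if name in labels:
                raise ValueError(f"duplicate label {name!r}")
            labels[name] = address
            continue
        parts = text.split()
        mnemonic = parts[0].upper()
        if mnemonic not in OPCODES:
            raise ValueError(f"unknown instruction {parts[0]!r}")
        if len(parts) > 2:
            raise ValueError(f"too many operands in {text!r}")
        operand = parts[1] if len(parts) > 1 else "0"
        program.append((mnemonic, operand))
        address += INSTRUCTION_SIZE

    # Pass 2: replace labels with addresses and emit the machine code.
    output = bytearray()
    for mnemonic, operand in program:
        if operand in labels:
            value = labels[operand]
        else:
            try:
                value = int(operand, 0)  # accepts decimal and 0x-prefixed hex
            except ValueError:
                raise ValueError(f"undefined label or bad number {operand!r}") from None
        if not 0 <= value <= 0xFF:
            raise ValueError(f"operand {operand!r} does not fit in one byte")
        output += bytes([OPCODES[mnemonic], value])
    return bytes(output)
```

Splitting the work into two passes is what lets a jump refer to a label defined later in the file: by the time the second pass runs, every label already has an address.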

If you assume that every instruction is on its own line, it will be much easier: if the current line is a label, replace all further references to it with the current address; otherwise delete all the spaces, leaving one between the two "words" (instruction and operands). After that, processing the instruction is trivial. ;)
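
A rough sketch of that per-line cleanup, assuming `;` starts a comment and a label is a line ending in `:` (both are assumptions, as is the helper name `normalize`):

```python
def normalize(line):
    """Classify one source line and squeeze its whitespace.

    Returns ("label", name), ("instruction", "mnemonic operands"),
    or None when the line is blank or holds only a comment.
    """
    text = line.split(";", 1)[0].strip()  # assumed comment syntax
    if not text:
        return None
    if text.endswith(":"):
        return ("label", text[:-1].strip())
    parts = text.split(None, 1)  # split on the first run of whitespace
    mnemonic = parts[0]
    # Remove every space inside the operand list.
    operands = "".join(parts[1].split()) if len(parts) > 1 else ""
    return ("instruction", f"{mnemonic} {operands}".strip())
```

For instance, `normalize("  mov   eax ,  ebx  ; copy")` gives `("instruction", "mov eax,ebx")`, so the later stage only ever sees one space between the mnemonic and its operands.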

I'd say you could apply almost every stage of a compiler to an assembler; of course, what applies to you depends on what you're going to do. If you're making a 1-to-1 mapping, you need syntactic analysis to check for errors, and a lexer and/or parser to process the text for specifiers to the assembler, such as sectioning and memory protection on .data (or even macros!). There is also size 'optimization' that can be applied by funneling immediate constants into the smallest size possible. Of course, you can go all out and perform deep analysis to do instruction reordering and fusing. You might also want a static analysis stage to check for invalid (illegal) sequences (LOCK CMPXCHG EDX,EDX would be an example of syntactically correct but invalid assembly, iirc).
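
Two of those stages are small enough to sketch. The first function picks the smallest width for an immediate, assuming immediates are signed two's-complement values. The second is a simplified static check for the LOCK case: as far as I know, x86 allows LOCK only on a fixed set of read-modify-write instructions and only when the destination is a memory operand, so verify the list against the processor manual before relying on it. The function names and the `[...]` test for a memory operand are assumptions for illustration.

```python
def smallest_immediate_size(value):
    """Return the smallest width in bytes (1, 2, 4 or 8) that holds a signed integer."""
    for size in (1, 2, 4, 8):
        limit = 1 << (size * 8 - 1)
        if -limit <= value < limit:
            return size
    raise ValueError(f"{value} does not fit in 64 bits")


# Instructions that accept a LOCK prefix (to be checked against the manual).
LOCKABLE = {
    "ADD", "ADC", "AND", "BTC", "BTR", "BTS", "CMPXCHG", "CMPXCHG8B",
    "CMPXCHG16B", "DEC", "INC", "NEG", "NOT", "OR", "SBB", "SUB",
    "XOR", "XADD", "XCHG",
}


def check_lock_prefix(mnemonic, destination):
    """Return an error message for an invalid LOCK use, or None if it looks valid."""
    if mnemonic.upper() not in LOCKABLE:
        return f"LOCK cannot be applied to {mnemonic}"
    # Assumed syntax: a memory operand is written in square brackets.
    is_memory = destination.startswith("[") and destination.endswith("]")
    if not is_memory:
        return f"LOCK {mnemonic} needs a memory destination, got {destination}"
    return None
```

Here `check_lock_prefix("CMPXCHG", "EDX")` reports an error because the destination is a register, while `check_lock_prefix("CMPXCHG", "[EDX]")` returns `None`.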

Licensed under: CC-BY-SA with attribution