Z80 ASM BNF结构…我在正确的轨道上吗？

https://stackoverflow.com/questions/1317256

19-09-2019
|

题

我正在尝试学习BNF并尝试组装一些Z80 ASM代码。由于我是两个领域的新手，所以我的问题是，我甚至在正确的道路上吗？我正在尝试编写Z80 ASM为EBNF的格式，以便我可以弄清楚从源来创建机器代码的位置。目前，我有以下内容：

Assignment = Identifier, ":" ;

Instruction = Opcode, [ Operand ], [ Operand ] ;

Operand = Identifier | Something* ;

Something* = "(" , Identifier, ")" ;

Identifier = Alpha, { Numeric | Alpha } ;

Opcode = Alpha, Alpha ;

Int = [ "-" ], Numeric, { Numeric } ;

Alpha = "A" | "B" | "C" | "D" | "E" | "F" | 
        "G" | "H" | "I" | "J" | "K" | "L" | 
        "M" | "N" | "O" | "P" | "Q" | "R" | 
        "S" | "T" | "U" | "V" | "W" | "X" | 
        "Y" | "Z" ;

Numeric = "0" | "1" | "2" | "3"| "4" | 
          "5" | "6" | "7" | "8" | "9" ;

如果我出错的话，任何方向的反馈都将是很好的。

解决方案

通常在汇编器中手工编码老式汇编器，并使用ADHOC解析技术来处理组装源线以生成实际的汇编代码。当汇编器语法很简单（例如Opcode Reg，操作数）时，这足够好。

现代机器具有凌乱，令人讨厌的教学集，并带有许多指令变化和操作数，可以用复杂的语法表达，允许多个索引寄存器参与操作数表达式。允许使用固定和可重新定位的常数具有各种类型的加法运算符的复杂的组装时间表达式使它变得复杂。复杂的汇编器允许有条件的汇编，宏，结构化数据声明等。都增加了对语法的新需求。通过临时方法处理所有这些语法非常困难，这是发明解析器的原因。

使用BNF和Parser Generator是建造现代汇编程序的非常合理的方法，即使对于诸如Z80之类的旧处理器也是如此。我已经为摩托罗拉8个机器（例如6800/6809）建造了这样的汇编器，并准备为现代X86做同样的事情。我认为您正是正确的道路。

**********编辑**********************案以Lexer和Parser定义要求。我在这里提供了两者。

这些是6809个混合物的实际规格摘录。完整的定义是这里样品大小的2-3x。

为了保持空间，我已经编辑了许多深色角色复杂性，这是这些定义的重点。一个明显的复杂性可能会感到沮丧。关键是，有了这样的定义，您正在尝试描述语言的形状，而不是按照程序的编码。如果您以临时的方式进行编码，那么您将支付更高的复杂性，并且可维护效果要差得多。

知道这些定义与具有Lexing/解析工具作为子系统的高端程序分析系统一起使用，也将有所帮助DMS软件重新设计工具包. 。 DMS将自动从
语法规则在解析器规格中，这使得buid解析工具变得更加容易。最后，解析器规范包含所谓的“ PrettyPrinter”声明，该声明允许DMS从ASTS中再生源文本。（Grammer的真正目的是允许我们构建代表汇编指令的AST，然后将它们吐出以将其馈送到真实的汇编器中！）

值得注意的一件事：如何说明词汇和语法规则（Metasyntxax！）在不同的Lexer/Parser Generator系统之间有所不同。基于DMS的规范的语法也不例外。 DMS具有相对复杂的语法规则，在此处可用的空间中解释确实是不切实际的。您必须了解其他系统使用类似的符号，用于eBNF的规则和lexemes的正则表达式变体。

鉴于OP的兴趣，他可以使用任何Lexer/Parser Generator工具，例如Flex/Yacc，Javacc，Antlr，...

********** lexer **************

-- M6809.lex: Lexical Description for M6809
-- Copyright (C) 1989,1999-2002 Ira D. Baxter

%%
#mainmode Label

#macro digit "[0-9]"
#macro hexadecimaldigit "<digit>|[a-fA-F]"

#macro comment_body_character "[\u0009 \u0020-\u007E]" -- does not include NEWLINE

#macro blank "[\u0000 \ \u0009]"

#macro hblanks "<blank>+"

#macro newline "\u000d \u000a? \u000c? | \u000a \u000c?" -- form feed allowed only after newline

#macro bare_semicolon_comment "\; <comment_body_character>* "

#macro bare_asterisk_comment "\* <comment_body_character>* "

...[snip]

#macro hexadecimal_digit "<digit> | [a-fA-F]"

#macro binary_digit "[01]"

#macro squoted_character "\' [\u0021-\u007E]"

#macro string_character "[\u0009 \u0020-\u007E]"

%%Label -- (First mode) processes left hand side of line: labels, opcodes, etc.

#skip "(<blank>*<newline>)+"
#skip "(<blank>*<newline>)*<blank>+"
  << (GotoOpcodeField ?) >>

#precomment "<comment_line><newline>"

#preskip "(<blank>*<newline>)+"
#preskip "(<blank>*<newline>)*<blank>+"
  << (GotoOpcodeField ?) >>

-- Note that an apparant register name is accepted as a label in this mode
#token LABEL [STRING] "<identifier>"
  <<  (local (;; (= [TokenScan natural] 1) ; process all string characters
         (= [TokenLength natural] ?:TokenCharacterCount)=
         (= [TokenString (reference TokenBodyT)] (. ?:TokenCharacters))
         (= [Result (reference string)] (. ?:Lexeme:Literal:String:Value))
         [ThisCharacterCode natural]
         (define Ordinala #61)
         (define Ordinalf #66)
         (define OrdinalA #41)
         (define OrdinalF #46)
     );;
     (;; (= (@ Result) `') ; start with empty string
     (while (<= TokenScan TokenLength)
      (;;   (= ThisCharacterCode (coerce natural TokenString:TokenScan))  
        (+= TokenScan) ; bump past character
        (ifthen (>= ThisCharacterCode Ordinala)
           (-= ThisCharacterCode #20) ; fold to upper case
        )ifthen
        (= (@ Result) (append (@ Result) (coerce character ThisCharacterCode)))=

        );;
     )while
     );;
  )local
  (= ?:Lexeme:Literal:String:Format (LiteralFormat:MakeCompactStringLiteralFormat 0))  ; nothing interesting in string
  (GotoLabelList ?)
  >>

%%OpcodeField

#skip "<hblanks>"
  << (GotoEOLComment ?) >>
#ifnotoken
  << (GotoEOLComment ?) >>

-- Opcode field tokens
#token 'ABA'       "[aA][bB][aA]"
   << (GotoEOLComment ?) >>
#token 'ABX'       "[aA][bB][xX]"
   << (GotoEOLComment ?) >>
#token 'ADC'       "[aA][dD][cC]"
   << (GotoABregister ?) >>
#token 'ADCA'      "[aA][dD][cC][aA]"
   << (GotoOperand ?) >>
#token 'ADCB'      "[aA][dD][cC][bB]"
   << (GotoOperand ?) >>
#token 'ADCD'      "[aA][dD][cC][dD]"
   << (GotoOperand ?) >>
#token 'ADD'       "[aA][dD][dD]"
   << (GotoABregister ?) >>
#token 'ADDA'      "[aA][dD][dD][aA]"
   << (GotoOperand ?) >>
#token 'ADDB'      "[aA][dD][dD][bB]"
   << (GotoOperand ?) >>
#token 'ADDD'      "[aA][dD][dD][dD]"
   << (GotoOperand ?) >>
#token 'AND'       "[aA][nN][dD]"
   << (GotoABregister ?) >>
#token 'ANDA'      "[aA][nN][dD][aA]"
   << (GotoOperand ?) >>
#token 'ANDB'      "[aA][nN][dD][bB]"
   << (GotoOperand ?) >>
#token 'ANDCC'     "[aA][nN][dD][cC][cC]"
   << (GotoRegister ?) >>
...[long list of opcodes snipped]

#token IDENTIFIER [STRING] "<identifier>"
  <<  (local (;; (= [TokenScan natural] 1) ; process all string characters
         (= [TokenLength natural] ?:TokenCharacterCount)=
         (= [TokenString (reference TokenBodyT)] (. ?:TokenCharacters))
         (= [Result (reference string)] (. ?:Lexeme:Literal:String:Value))
         [ThisCharacterCode natural]
         (define Ordinala #61)
         (define Ordinalf #66)
         (define OrdinalA #41)
         (define OrdinalF #46)
     );;
     (;; (= (@ Result) `') ; start with empty string
     (while (<= TokenScan TokenLength)
      (;;   (= ThisCharacterCode (coerce natural TokenString:TokenScan))  
        (+= TokenScan) ; bump past character
        (ifthen (>= ThisCharacterCode Ordinala)
           (-= ThisCharacterCode #20) ; fold to upper case
        )ifthen
        (= (@ Result) (append (@ Result) (coerce character ThisCharacterCode)))=

        );;
     )while
     );;
  )local
  (= ?:Lexeme:Literal:String:Format (LiteralFormat:MakeCompactStringLiteralFormat 0))  ; nothing interesting in string
  (GotoOperandField ?)
  >>

#token '#'   "\#" -- special constant introduction (FDB)
   << (GotoDataField ?) >>

#token NUMBER [NATURAL] "<decimal_number>"
  << (local [format LiteralFormat:NaturalLiteralFormat]
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertDecimalTokenStringToNatural (. format) ? 0 0))
    (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
    );;
 )local
 (GotoOperandField ?)
  >>

#token NUMBER [NATURAL] "\$ <hexadecimal_digit>+"
  << (local [format LiteralFormat:NaturalLiteralFormat]
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertHexadecimalTokenStringToNatural (. format) ? 1 0))
    (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
    );;
 )local
 (GotoOperandField ?)
  >>

#token NUMBER [NATURAL] "\% <binary_digit>+"
  << (local [format LiteralFormat:NaturalLiteralFormat]
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertBinaryTokenStringToNatural (. format) ? 1 0))
    (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
    );;
 )local
 (GotoOperandField ?)
  >>

#token CHARACTER [CHARACTER] "<squoted_character>"
  <<  (= ?:Lexeme:Literal:Character:Value (TokenStringCharacter ? 2))
  (= ?:Lexeme:Literal:Character:Format (LiteralFormat:MakeCompactCharacterLiteralFormat 0 0)) ; nothing special about character
  (GotoOperandField ?)
  >>


%%OperandField

#skip "<hblanks>"
  << (GotoEOLComment ?) >>
#ifnotoken
  << (GotoEOLComment ?) >>

-- Tokens signalling switch to index register modes
#token ','   "\,"
   <<(GotoRegisterField ?)>>
#token '['   "\["
   <<(GotoRegisterField ?)>>

-- Operators for arithmetic syntax
#token '!!'  "\!\!"
#token '!'   "\!"
#token '##'  "\#\#"
#token '#'   "\#"
#token '&'   "\&"
#token '('   "\("
#token ')'   "\)"
#token '*'   "\*"
#token '+'   "\+"
#token '-'   "\-"
#token '/'   "\/"
#token '//'   "\/\/"
#token '<'   "\<"
#token '<'   "\<" 
#token '<<'  "\<\<"
#token '<='  "\<\="
#token '</'  "\<\/"
#token '='   "\="
#token '>'   "\>"
#token '>'   "\>"
#token '>='  "\>\="
#token '>>'  "\>\>"
#token '>/'  "\>\/"
#token '\\'  "\\"
#token '|'   "\|"
#token '||'  "\|\|"

#token NUMBER [NATURAL] "<decimal_number>"
  << (local [format LiteralFormat:NaturalLiteralFormat]
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertDecimalTokenStringToNatural (. format) ? 0 0))
    (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
    );;
 )local
  >>

#token NUMBER [NATURAL] "\$ <hexadecimal_digit>+"
  << (local [format LiteralFormat:NaturalLiteralFormat]
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertHexadecimalTokenStringToNatural (. format) ? 1 0))
    (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
    );;
 )local
  >>

#token NUMBER [NATURAL] "\% <binary_digit>+"
  << (local [format LiteralFormat:NaturalLiteralFormat]
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertBinaryTokenStringToNatural (. format) ? 1 0))
    (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
    );;
 )local
  >>

-- Notice that an apparent register is accepted as a label in this mode
#token IDENTIFIER [STRING] "<identifier>"
  <<  (local (;; (= [TokenScan natural] 1) ; process all string characters
         (= [TokenLength natural] ?:TokenCharacterCount)=
         (= [TokenString (reference TokenBodyT)] (. ?:TokenCharacters))
         (= [Result (reference string)] (. ?:Lexeme:Literal:String:Value))
         [ThisCharacterCode natural]
         (define Ordinala #61)
         (define Ordinalf #66)
         (define OrdinalA #41)
         (define OrdinalF #46)
     );;
     (;; (= (@ Result) `') ; start with empty string
     (while (<= TokenScan TokenLength)
      (;;   (= ThisCharacterCode (coerce natural TokenString:TokenScan))  
        (+= TokenScan) ; bump past character
        (ifthen (>= ThisCharacterCode Ordinala)
           (-= ThisCharacterCode #20) ; fold to upper case
        )ifthen
        (= (@ Result) (append (@ Result) (coerce character ThisCharacterCode)))=

        );;
     )while
     );;
  )local
  (= ?:Lexeme:Literal:String:Format (LiteralFormat:MakeCompactStringLiteralFormat 0))  ; nothing interesting in string
  >>

%%Register -- operand field for TFR, ANDCC, ORCC, EXG opcodes

#skip "<hblanks>"
#ifnotoken << (GotoRegisterField ?) >>

%%RegisterField -- handles registers and indexing mode syntax
-- In this mode, names that look like registers are recognized as registers

#skip "<hblanks>"
  << (GotoEOLComment ?) >>
#ifnotoken
  << (GotoEOLComment ?) >>

#token '['   "\["
#token ']'   "\]"
#token '--'  "\-\-"
#token '++'  "\+\+"

#token 'A'      "[aA]"
#token 'B'      "[bB]"
#token 'CC'     "[cC][cC]"
#token 'DP'     "[dD][pP] | [dD][pP][rR]" -- DPR shouldnt be needed, but found one instance
#token 'D'      "[dD]"
#token 'Z'      "[zZ]"

-- Index register designations
#token 'X'      "[xX]"
#token 'Y'      "[yY]"
#token 'U'      "[uU]"
#token 'S'      "[sS]"
#token 'PCR'    "[pP][cC][rR]"
#token 'PC'     "[pP][cC]"

#token ','    "\,"

-- Operators for arithmetic syntax
#token '!!'  "\!\!"
#token '!'   "\!"
#token '##'  "\#\#"
#token '#'   "\#"
#token '&'   "\&"
#token '('   "\("
#token ')'   "\)"
#token '*'   "\*"
#token '+'   "\+"
#token '-'   "\-"
#token '/'   "\/"
#token '<'   "\<"
#token '<'   "\<" 
#token '<<'  "\<\<"
#token '<='  "\<\="
#token '<|'  "\<\|"
#token '='   "\="
#token '>'   "\>"
#token '>'   "\>"
#token '>='  "\>\="
#token '>>'  "\>\>"
#token '>|'  "\>\|"
#token '\\'  "\\"
#token '|'   "\|"
#token '||'  "\|\|"

#token NUMBER [NATURAL] "<decimal_number>"
  << (local [format LiteralFormat:NaturalLiteralFormat]
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertDecimalTokenStringToNatural (. format) ? 0 0))
    (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
    );;
 )local
  >>

... [snip]

%% -- end M6809.lex

****************************************

-- M6809.ATG: Motorola 6809 assembly code parser
-- (C) Copyright 1989;1999-2002 Ira D. Baxter; All Rights Reserved

m6809 = sourcelines ;

sourcelines = ;
sourcelines = sourcelines sourceline EOL ;
  <<PrettyPrinter>>: { V(CV(sourcelines[1]),H(sourceline,A<eol>(EOL))); }

-- leading opcode field symbol should be treated as keyword.

sourceline = ;
sourceline = labels ;
sourceline = optional_labels 'EQU' expression ;
  <<PrettyPrinter>>: { H(optional_labels,A<opcode>('EQU'),A<operand>(expression)); }
sourceline = LABEL 'SET' expression ;
  <<PrettyPrinter>>: { H(A<firstlabel>(LABEL),A<opcode>('SET'),A<operand>(expression)); }
sourceline = optional_label instruction ;
  <<PrettyPrinter>>: { H(optional_label,instruction); }
sourceline = optional_label optlabelleddirective ;
  <<PrettyPrinter>>: { H(optional_label,optlabelleddirective); }
sourceline = optional_label implicitdatadirective ;
  <<PrettyPrinter>>: { H(optional_label,implicitdatadirective); }
sourceline = unlabelleddirective ;
sourceline = '?ERROR' ;
  <<PrettyPrinter>>: { A<opcode>('?ERROR'); }

optional_label = labels ;
optional_label = LABEL ':' ;
  <<PrettyPrinter>>: { H(A<firstlabel>(LABEL),':'); }
optional_label = ;

optional_labels = ;
optional_labels = labels ;
labels = LABEL ;
  <<PrettyPrinter>>: { A<firstlabel>(LABEL); }
labels = labels ',' LABEL ;
  <<PrettyPrinter>>: { H(labels[1],',',A<otherlabels>(LABEL)); }

unlabelleddirective = 'END' ;
  <<PrettyPrinter>>: { A<opcode>('END'); }
unlabelleddirective = 'END' expression ;
  <<PrettyPrinter>>: { H(A<opcode>('END'),A<operand>(expression)); }
unlabelleddirective = 'IF' expression EOL conditional ;
  <<PrettyPrinter>>: { V(H(A<opcode>('IF'),H(A<operand>(expression),A<eol>(EOL))),CV(conditional)); }
unlabelleddirective = 'IFDEF' IDENTIFIER EOL conditional ;
  <<PrettyPrinter>>: { V(H(A<opcode>('IFDEF'),H(A<operand>(IDENTIFIER),A<eol>(EOL))),CV(conditional)); }
unlabelleddirective = 'IFUND' IDENTIFIER EOL conditional ;
  <<PrettyPrinter>>: { V(H(A<opcode>('IFUND'),H(A<operand>(IDENTIFIER),A<eol>(EOL))),CV(conditional)); }
unlabelleddirective = 'INCLUDE' FILENAME ;
  <<PrettyPrinter>>: { H(A<opcode>('INCLUDE'),A<operand>(FILENAME)); }
unlabelleddirective = 'LIST' expression ;
  <<PrettyPrinter>>: { H(A<opcode>('LIST'),A<operand>(expression)); }
unlabelleddirective = 'NAME' IDENTIFIER ;
  <<PrettyPrinter>>: { H(A<opcode>('NAME'),A<operand>(IDENTIFIER)); }
unlabelleddirective = 'ORG' expression ;
  <<PrettyPrinter>>: { H(A<opcode>('ORG'),A<operand>(expression)); }
unlabelleddirective = 'PAGE' ;
  <<PrettyPrinter>>: { A<opcode>('PAGE'); }
unlabelleddirective = 'PAGE' HEADING ;
  <<PrettyPrinter>>: { H(A<opcode>('PAGE'),A<operand>(HEADING)); }
unlabelleddirective = 'PCA' expression ;
  <<PrettyPrinter>>: { H(A<opcode>('PCA'),A<operand>(expression)); }
unlabelleddirective = 'PCC' expression ;
  <<PrettyPrinter>>: { H(A<opcode>('PCC'),A<operand>(expression)); }
unlabelleddirective = 'PSR' expression ;
  <<PrettyPrinter>>: { H(A<opcode>('PSR'),A<operand>(expression)); }
unlabelleddirective = 'TABS' numberlist ;
  <<PrettyPrinter>>: { H(A<opcode>('TABS'),A<operand>(numberlist)); }
unlabelleddirective = 'TITLE' HEADING ;
  <<PrettyPrinter>>: { H(A<opcode>('TITLE'),A<operand>(HEADING)); }
unlabelleddirective = 'WITH' settings ;
  <<PrettyPrinter>>: { H(A<opcode>('WITH'),A<operand>(settings)); }

settings = setting ;
settings = settings ',' setting ;
  <<PrettyPrinter>>: { H*; }
setting = 'WI' '=' NUMBER ;
  <<PrettyPrinter>>: { H*; }
setting = 'DE' '=' NUMBER ;
  <<PrettyPrinter>>: { H*; }
setting = 'M6800' ;
setting = 'M6801' ;
setting = 'M6809' ;
setting = 'M6811' ;

-- collects lines of conditional code into blocks
conditional = 'ELSEIF' expression EOL conditional ;
  <<PrettyPrinter>>: { V(H(A<opcode>('ELSEIF'),H(A<operand>(expression),A<eol>(EOL))),CV(conditional[1])); }
conditional = 'ELSE' EOL else ;
  <<PrettyPrinter>>: { V(H(A<opcode>('ELSE'),A<eol>(EOL)),CV(else)); }
conditional = 'FIN' ;
  <<PrettyPrinter>>: { A<opcode>('FIN'); }
conditional = sourceline EOL conditional ;
  <<PrettyPrinter>>: { V(H(sourceline,A<eol>(EOL)),CV(conditional[1])); }

else = 'FIN' ;
  <<PrettyPrinter>>: { A<opcode>('FIN'); }
else = sourceline EOL else ;
  <<PrettyPrinter>>: { V(H(sourceline,A<eol>(EOL)),CV(else[1])); }

-- keyword-less directive, generates data tables

implicitdatadirective = implicitdatadirective ',' implicitdataitem ;
  <<PrettyPrinter>>: { H*; }
implicitdatadirective = implicitdataitem ;

implicitdataitem = '#' expression ;
  <<PrettyPrinter>>: { A<operand>(H('#',expression)); }
implicitdataitem = '+' expression ;
  <<PrettyPrinter>>: { A<operand>(H('+',expression)); }
implicitdataitem = '-' expression ;
  <<PrettyPrinter>>: { A<operand>(H('-',expression)); }
implicitdataitem = expression ;
  <<PrettyPrinter>>: { A<operand>(expression); }
implicitdataitem = STRING ;
  <<PrettyPrinter>>: { A<operand>(STRING); }

-- instructions valid for m680C (see Software Dynamics ASM manual)
instruction = 'ABA' ;
  <<PrettyPrinter>>: { A<opcode>('ABA'); }
instruction = 'ABX' ;
  <<PrettyPrinter>>: { A<opcode>('ABX'); }

instruction = 'ADC' 'A' operandfetch ;
  <<PrettyPrinter>>: { H(A<opcode>(H('ADC','A')),A<operand>(operandfetch)); }
instruction = 'ADC' 'B' operandfetch ;
  <<PrettyPrinter>>: { H(A<opcode>(H('ADC','B')),A<operand>(operandfetch)); }
instruction = 'ADCA' operandfetch ;
  <<PrettyPrinter>>: { H(A<opcode>('ADCA'),A<operand>(operandfetch)); }
instruction = 'ADCB' operandfetch ;
  <<PrettyPrinter>>: { H(A<opcode>('ADCB'),A<operand>(operandfetch)); }
instruction = 'ADCD' operandfetch ;
  <<PrettyPrinter>>: { H(A<opcode>('ADCD'),A<operand>(operandfetch)); }

instruction = 'ADD' 'A' operandfetch ;
  <<PrettyPrinter>>: { H(A<opcode>(H('ADD','A')),A<operand>(operandfetch)); }
instruction = 'ADD' 'B' operandfetch ;
  <<PrettyPrinter>>: { H(A<opcode>(H('ADD','B')),A<operand>(operandfetch)); }
instruction = 'ADDA' operandfetch ;
  <<PrettyPrinter>>: { H(A<opcode>('ADDA'),A<operand>(operandfetch)); }

[..snip...]

-- condition code mask for ANDCC and ORCC
conditionmask = '#' expression ;
  <<PrettyPrinter>>: { H*; }
conditionmask = expression ;

target = expression ;

operandfetch = '#' expression ; --immediate
  <<PrettyPrinter>>: { H*; }

operandfetch = memoryreference ;

operandstore = memoryreference ;

memoryreference = '[' indexedreference ']' ;
  <<PrettyPrinter>>: { H*; }
memoryreference = indexedreference ;

indexedreference = offset ;
indexedreference = offset ',' indexregister ;
  <<PrettyPrinter>>: { H*; }
indexedreference = ',' indexregister ;
  <<PrettyPrinter>>: { H*; }
indexedreference = ',' '--' indexregister ;
  <<PrettyPrinter>>: { H*; }
indexedreference = ',' '-' indexregister ;
  <<PrettyPrinter>>: { H*; }
indexedreference = ',' indexregister '++' ;
  <<PrettyPrinter>>: { H*; }
indexedreference = ',' indexregister '+' ;
  <<PrettyPrinter>>: { H*; }

offset = '>' expression ; -- page zero ref
  <<PrettyPrinter>>: { H*; }
offset = '<' expression ; -- long reference
  <<PrettyPrinter>>: { H*; }
offset = expression ;
offset = 'A' ;
offset = 'B' ;
offset = 'D' ;

registerlist = registername ;
registerlist = registerlist ',' registername ;
  <<PrettyPrinter>>: { H*; }

registername = 'A' ;
registername = 'B' ;
registername = 'CC' ;
registername = 'DP' ;
registername = 'D' ;
registername = 'Z' ;
registername = indexregister ;

indexregister = 'X' ;
indexregister = 'Y' ;
indexregister = 'U' ;  -- not legal on M6811
indexregister = 'S' ;
indexregister = 'PCR' ;
indexregister = 'PC' ;

expression = sum '=' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum '<<' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum '</' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum '<=' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum '<' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum '>>' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum '>/' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum '>=' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum '>' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum '#' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum ;

sum = product ;
sum = sum '+' product ;
  <<PrettyPrinter>>: { H*; }
sum = sum '-' product ;
  <<PrettyPrinter>>: { H*; }
sum = sum '!' product ;
  <<PrettyPrinter>>: { H*; }
sum = sum '!!' product ;
  <<PrettyPrinter>>: { H*; }

product = term '*' product ;
  <<PrettyPrinter>>: { H*; }
product = term '||' product ; -- wrong?
  <<PrettyPrinter>>: { H*; }
product = term '/' product ;
  <<PrettyPrinter>>: { H*; }
product = term '//' product ;
  <<PrettyPrinter>>: { H*; }
product = term '&' product ;
  <<PrettyPrinter>>: { H*; }
product = term '##' product ;
  <<PrettyPrinter>>: { H*; }
product = term ;

term = '+' term ;
  <<PrettyPrinter>>: { H*; }
term = '-' term ; 
  <<PrettyPrinter>>: { H*; }
term = '\\' term ; -- complement
  <<PrettyPrinter>>: { H*; }
term = '&' term ; -- not

term = IDENTIFIER ;
term = NUMBER ;
term = CHARACTER ;
term = '*' ;
term = '(' expression ')' ;
  <<PrettyPrinter>>: { H*; }

numberlist = NUMBER ;
numberlist = numberlist ',' NUMBER ;
  <<PrettyPrinter>>: { H*; }

其他提示

BNF通常用于诸如Pascal，C ++等结构化的嵌套语言，或者实际上来自Algol家族（包括C＃等现代语言）的任何东西。如果我正在实现汇编器，则可能会使用一些简单的正则表达式来对操作码和操作数进行模式。自从我使用Z80汇编语言以来已经有一段时间了，但是您可能会使用类似的内容：

/\s*(\w{2,3})\s+((\w+)(,\w+)?)?/

这将匹配任何由两个或三个字母的操作码组成的线，然后是一个或两个由逗号分隔的操作数。提取这样的汇编线后，您将查看OpCode并为指令生成正确的字节，包括如果适用操作数的值。

我上面使用正则表达式概述的解析器类型将称为“临时”解析器，这实际上意味着您以某种块为基础（在汇编语言的情况下，通过文本行）拆分并检查输入。

我认为您不需要过度思考。当您只需将整个内容（Modulo Case and Whitespace）将“ LD A，A”分解为负载操作，目的地和源寄存器中，将“ LD A，A”分开的解析器毫无意义。

没有那么多的opcodes，也没有以这样的方式安排它们，您真的可以从解析和理解汇编器IMO中获得很多好处。显然，您需要一个字节/地址/索引参数的解析器，但除此之外，我只有一对一的查找。

许可以下： CC-BY-SA 和归因

不隶属于 StackOverflow