What is meant by a token in the context of railroad diagrams?

https://stackoverflow.com/questions/17793947

03-06-2022

Question

From Douglas Crockford's JavaScript: The Good Parts, Chapter 2, "Grammar":

This chapter introduces the grammar of the good parts of JavaScript, presenting a quick overview of how the language is structured. We will represent the grammar with railroad diagrams.

The rules for interpreting these diagrams are simple:

You start on the left edge and follow the tracks to the right edge.

As you go, you will encounter literals in ovals, and rules or descriptions in rectangles.

Any sequence that can be made by following the tracks is legal.

Any sequence that cannot be made by following the tracks is not legal.

Railroad diagrams with one bar at each end allow whitespace to be inserted between any pair of tokens. Railroad diagrams with two bars at each end do not.

The grammar of the good parts presented in this chapter is significantly simpler than the grammar of the whole language.

I have seen this answer on SO, which basically restates what the book says. So what is meant by "token" here?

Answer

Tokens are the basic atomic units of a grammar. In a typical programming language, tokens include things like arithmetic operators (+, *), punctuation and delimiters ((, {, ;), identifiers, numeric and string literals, and reserved words.
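To make this concrete, here is a minimal Python sketch of a tokenizer for a tiny, JavaScript-like subset. The token categories, the TOKEN_SPEC table, and the tokenize function are hypothetical and far simpler than the real ECMAScript lexical grammar; the point is only that whitespace separates tokens without itself being a token.

```python
import re

# Hypothetical, heavily simplified token set (not the full ECMAScript lexical grammar).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("STRING", r'"[^"\\]*(?:\\.[^"\\]*)*"'),
    ("NAME", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("PUNCTUATOR", r"[+\-*/=(){};,]"),
    ("WHITESPACE", r"\s+"),
]
RESERVED = {"var", "if", "else", "return", "function"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Split source text into (kind, text) pairs, dropping whitespace between tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "WHITESPACE":
            continue  # whitespace only separates tokens; it is not a token itself
        if kind == "NAME" and text in RESERVED:
            kind = "RESERVED"  # reserved words are names with special meaning
        tokens.append((kind, text))
    return tokens
```

For example, tokenize('var total = price * 2;') yields seven tokens (var, total, =, price, *, 2, ;), and adding or removing spaces between them does not change the result. Spaces inside a string literal, on the other hand, stay part of that single STRING token.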

The concept of a "token" is somewhat bound up with the way a grammar is written and parsed. Some parsing schemes, such as packrat parsers for PEGs, don't involve a separate tokenization step at all. In this case, however, the use of railroad diagrams implies a traditional BNF (or BNF-like) grammar, complete with a set of tokens.
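One way to read a railroad diagram over tokens is as a recognizer: each oval or rectangle consumes one token, and a fork in the track is a choice. The following is a hedged Python sketch for a made-up rule (not one of Crockford's diagrams), var_statement = "var" NAME [ "=" NUMBER ] ";". The function name accepts_var_statement and the (kind, text) token representation are assumptions for illustration.

```python
# Hypothetical rule, written in EBNF (square brackets = optional):
#     var_statement = "var" NAME [ "=" NUMBER ] ";"
# In a railroad diagram the optional part is a bypass track around the
# "=" and NUMBER boxes. Tokens are (kind, text) pairs with whitespace
# already removed, so whitespace between tokens never matters here.


def accepts_var_statement(tokens):
    """Return True if the tokens trace a path from the left edge to the right edge."""
    pos = 0

    def take(kind=None, text=None):
        # Consume one token if it matches the box we are standing on.
        nonlocal pos
        if pos >= len(tokens):
            return False
        tok_kind, tok_text = tokens[pos]
        if kind is not None and tok_kind != kind:
            return False
        if text is not None and tok_text != text:
            return False
        pos += 1
        return True

    if not take(text="var"):
        return False
    if not take(kind="NAME"):
        return False
    # Fork in the track: follow the "=" branch only if the next token is "=".
    if pos < len(tokens) and tokens[pos][1] == "=":
        pos += 1
        if not take(kind="NUMBER"):
            return False
    if not take(text=";"):
        return False
    # Legal only if we reached the right edge with no tokens left over.
    return pos == len(tokens)
```

Because the input is already a list of tokens, "var x ;" and "var   x;" look identical to this recognizer, which is exactly the freedom a one-bar diagram grants.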

Edit: looking at that other question, the discussion there is actually about a token grammar itself, namely the token grammar for JSON. For that purpose, I suppose you could consider the elements of the character set to be the "tokens". Either way, it should be clear that in those cases (the rules for what numbers and strings look like) spaces can't appear in the middle of the construct. That is, 23 and 2 3 are not the same.
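The 23 versus 2 3 distinction can be sketched as two levels of rule. In this hypothetical Python example, is_integer_literal plays the role of a character-level (two-bar) rule, where no whitespace may appear inside, while integer_tokens plays the role of a token-level (one-bar) rule, where whitespace merely separates tokens. Only plain integers are handled, as a simplifying assumption; real JavaScript number literals also allow fractions and exponents.

```python
DIGITS = "0123456789"


def is_integer_literal(text):
    # Character-level rule: one or more digits with nothing in between,
    # so a space inside the literal makes it illegal.
    return len(text) > 0 and all(ch in DIGITS for ch in text)


def integer_tokens(source):
    # Token-level rule: whitespace may appear between tokens and only separates them.
    tokens = source.split()
    for tok in tokens:
        if not is_integer_literal(tok):
            raise SyntaxError(f"not an integer literal: {tok!r}")
    return [int(tok) for tok in tokens]
```

With these definitions, "23" is a single literal whose value is 23, whereas "2 3" fails the character-level rule and, at the token level, is read as two separate numbers, 2 and 3.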

Outside of the bizarre situations around automatic semicolon insertion, I can't think of any place in the JavaScript grammar that disallows spaces between tokens.

License: CC-BY-SA with attribution
