Question

background: - there are formal languages for expressing programming language valid lexicon and syntax - such representations (e.g. regular expression and context-free grammars) can be automatically compiled into lexicon/syntax analyzers for some programming language using some tools (e.g. LEX and YACC)

questions: - are there such formal languages for defining programming language semantics ? - are there compilers to automate generating the code of the semantic analyzer based on these formal languages ? - any resources to read regarding source code semantic analysis ?

notes: - you might find my question unrealistic, it’s ok… I’m a newbie to compilers ? - by semantic analysis i really mean collecting necessary information out of source code for the next steps of code optimization and generation… this should include type/boundary checking… etc

i hope that i made myself clear many thanks in advance

Was it helpful?

Solution

There are many schemes for defining the semantics of programming languages: denotational semantics (which maps syntax into functions that compute program state); operational semantics (which amounts to building interpreters for your language), transformational semantics (which amounts to translating your languauge to another language for which some other semantics already exists), etc.

Very few of them presently lead to systems usable for real programming languages. One of the other answers here suggests Centaur as a system that tries to do this. Action semantics by Peter Mosses is one of the more serious recent attempts.

For real systems, rather more ad hoc methods are presently the most effective. These inlude systems in which the lexical and grammatical syntax can be effectively defined (as variants of LEX and YACC), and automatically build trees. Attribute grammars allow the specification of computations over trees, which allow one to define some kinds of analyses, such as symbol table construction or metrics (technically you could do denotational semantics this way). Most conventional languages (C, Java, C#, COBOL, ...) all have relatively similar structures regarding control flow and data flow, and as a consequence one can build generic flow analysis routines to allow one to reason about such standard languages.

Finally, you need a purpose for your semantic analysis: what facts exactly do you want to extract? The available static analysis systems have hybrid pattern-driven/procedural code methods for collecting the syntax, symbol-table, and flow facts of interest to compute specific answers to specific questions.

Some systems enable one to use this semantic information to carry source code modification.

One system that follows the more ad hoc approach above is our DMS Software Reengineering Toolkit, which also has the generic semantic defintions (syntax, symbol tables, data flow anlaysis) completed for real languages such as Java, C, C++ and COBOL. DMS can apply source-to-source transformations to the ASTs conditioned by various fact-collecting procedures, and this enables mass transformation of code in a reliable way.

OTHER TIPS

There are many formal languages for programming languauge semantics - look at the wikipedia entry for some more terms to google. I did a small amount of Z in my early career ( that link to the formal methods wiki is another possible place to start exploring ). I'd suggest asking on the lambda the ultimate forum as the people there use some of them, and can suggest what to use for whatever use case you have in mind; this paper seems somewhat related to what you asked for in terms of generating compilers - you need both the semantics of the language and a description of the target machine to create a compiler.

Specifically for static analysis, look at http://rw4.cs.uni-sb.de/~martin/pag/

What you describe is exactly what the Centaur project was about:

http://www-sop.inria.fr/croap/centaur/centaur.html

In fact, you could very far in describing the semantics of your language, to the point where the system was able to give you an interpreter for that language (that's when you had described the semantics completely). But you didn't have to go all the way. You could do less description work, and still obtain a structured editor and typechecker for your efforts.

Although work has ceased on the project (as far as I can tell), you may find interesting articles and download in the links.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top