Question

I'm currently toying around with a very large project: an interpreter for a simple scripting language. After weeks of planning, I decided that the best course of action would be to prototype part of the interpreter in Python and then port that code to C.

I started off by writing a simple lexer in Python, based on some EBNF I had written. The lexer needed to handle state, so I made heavy use of Python's OOP tools. Once I was satisfied with my prototype, I decided to have a go at porting it over to C.

Python, of course, is object-oriented and has classes, while C has no such concept. I began to wonder whether I should rewrite my lexer. It relied heavily on the state provided by classes, but of course I couldn't use classes in C.

So I began rewriting my entire lexer from scratch to use only functions and global variables. This started to produce code that was a bit verbose and hard to debug. I stopped and wondered whether I was making the right call. Should I be forced to ignore features of one language just because the language I plan to solidify the code in does not have them?

This may seem like an easy no, but what about the case in which a prototype relies heavily on a certain feature of a language (such as mine)? Should you really spend hours trying to make up for the lack of such features?

But the flip side could just as easily be argued. Why should you be forced to write unidiomatic, verbose code in one language, just to compensate for the lack of such features in another?


Since @MichealDurrant had requested it, I'll give my reasons for using C:

  • For speed
  • For portability
  • For learning purposes
  • I prefer the language in general.
  • C seems to be the de facto choice for creating programming languages. Numerous languages have been created in C.

Solution

Based on my experience writing compilers and related tools in C and similar languages, I would NOT choose to write a compiler in C if I had any other, better choices. And in 2016, there are plenty of better choices. But, it's your compiler, and YMMV.

The TL;DR backstory:

"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil."
                                        — Donald Knuth

He goes on to say "Yet we should not pass up our opportunities in that critical 3%." Assuming you take Dr. Knuth's advice in such matters (and you really should), the key question is therefore: Are you in that 3%?

Seems doubtful. C can run faster on low-level code than any Python, Perl, or PHP ever will. It is a great medium-level language, excellent at rendering structured code into machine instructions. For those writing core compilers, databases, and middleware, C is a solid choice. But for a scripting language? That's still in development? That...seems premature.

In my experience, Python has adequate speed even for performance-focused work. I run programs that parse, analyze, and transform millions of text records. If everything were written in "pure Python," it probably wouldn't be fast enough. But many tasks--parsing text or XML, churning through mounds of data, etc.--are almost always handled in C already. Modules like Pandas, NumPy, and LXML do their heavy lifting at the lowest, most optimized level possible. Python takes advantage of their optimizations, but in a very clean, standardized, high-level language. I have rarely found a need to go elsewhere. When I have, very narrowly targeted optimizations in C or Cython, with main program flow managed in Python, have been excellent.

Having stared at tons of C over the years (operating systems, compilers, middleware, applications, utilities), and having written some of the same, I challenge the idea that C is especially portable. It is, compared to the era from which it emerged. But that was the 1970s and 1980s, when the competition was almost entirely non-portable. C remains extraordinarily exposed to differences in platform byte order, word length, addressing semantics, and operating system flavors/versions. Code meant to run on many platforms is often littered with #ifdef and direct platform knowledge. It's ugly. In the end, C is not very portable in comparison to more modern languages such as Java, Go, Python, and JavaScript.
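To make the byte-order point concrete: reinterpreting a byte buffer as an integer gives different results on big-endian and little-endian machines, so C code that wants the same answer everywhere has to spell out the layout by hand. Below is a minimal sketch of that chore, assuming a 32-bit little-endian wire format. The helper names read_u32_le and write_u32_le are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a 32-bit little-endian value from 4 bytes.
 * It uses shifts on values rather than the host's memory layout, so the
 * result is the same on big-endian and little-endian machines. */
uint32_t read_u32_le(const unsigned char *buf) {
    assert(buf != NULL);
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}

/* Hypothetical helper: encode a 32-bit value as 4 little-endian bytes. */
void write_u32_le(unsigned char *buf, uint32_t v) {
    assert(buf != NULL);
    buf[0] = (unsigned char)(v & 0xFFu);
    buf[1] = (unsigned char)((v >> 8) & 0xFFu);
    buf[2] = (unsigned char)((v >> 16) & 0xFFu);
    buf[3] = (unsigned char)((v >> 24) & 0xFFu);
}
```

In a higher-level language this bookkeeping is usually a single library call. In C it is one more thing you write, test, and maintain yourself.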

So if you like C, want to study it, or want to work there, God bless. But as you're already finding, it's not a particularly supportive environment by modern standards. Classes, dictionaries, flexible lists, good and simple string matching, problem-oriented data structures like set and Counter, easy exception handling, strong module support.... The list of valuable things that C lacks is very long. You might be able to get some higher-level features back if you use a framework like Cello. Still, C is the best design of 45 years ago; it lacks the many advances in ease, flexibility, robustness, and structure that have emerged since.

You should weigh the advantages of C's potentially greater execution speed against the less-supportive environment, longer development time, lower likely reliability, and other factors. In 1970 or 1980, system resources were extremely tight, so optimize-optimize-optimize made sense. In this day of multicore processors and gigabyte memories even on smartphones, you can legitimately consider optimizing attributes other than performance. Time-to-market, program sophistication, reliability, maintainability--there are a lot of things that you can reasonably optimize in Python (or another high level language) that you can't readily optimize in C.

Other tips

Ignoring features in such a case does not make sense to me.

The typical reason for using Python for prototyping is exactly that you can implement things in a fraction of the time and "space" (=lines of code) you would need in C - by making use of features Python has, but C does not. If you restricted yourself to Python features that have a 1:1 correspondence to a C feature, using Python in the first place would not make any sense: you could implement your program in C right from the start.

By the way, did you consider Cython for your use case?

Any Turing-Complete language feature can be implemented in any other Turing-Complete language. Were that not the case, we'd still be stuck with the first programming language ever invented. So really, it comes down to how much effort you want to expend.
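As one illustration of how that effort might look for the state a class holds: the fields a Python lexer keeps on `self` can live in a C struct that is passed explicitly to each function, which avoids global variables. This is only a sketch under assumptions. The names Lexer, Token, TokenKind, lexer_init, and lexer_next are hypothetical, and the token rules are invented for the example rather than taken from the asker's EBNF.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds, for illustration only. */
typedef enum { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_PUNCT } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;  /* points into the source buffer */
    size_t length;      /* number of characters in the token */
} Token;

/* Everything a Python class would keep on `self` lives here. */
typedef struct {
    const char *src;  /* NUL-terminated source text */
    size_t pos;       /* current offset into src */
    int line;         /* 1-based line number */
} Lexer;

/* Plays the role of __init__: set up one independent lexer instance. */
void lexer_init(Lexer *lx, const char *src) {
    assert(lx != NULL && src != NULL);
    lx->src = src;
    lx->pos = 0;
    lx->line = 1;
}

/* Plays the role of a method: the instance is the explicit first argument. */
Token lexer_next(Lexer *lx) {
    /* Skip whitespace, counting newlines as we go. */
    while (isspace((unsigned char)lx->src[lx->pos])) {
        if (lx->src[lx->pos] == '\n') lx->line++;
        lx->pos++;
    }

    Token tok = { TOK_EOF, lx->src + lx->pos, 0 };
    unsigned char c = (unsigned char)lx->src[lx->pos];
    if (c == '\0') return tok;

    size_t begin = lx->pos;
    if (isalpha(c) || c == '_') {
        tok.kind = TOK_IDENT;
        while (isalnum((unsigned char)lx->src[lx->pos]) || lx->src[lx->pos] == '_')
            lx->pos++;
    } else if (isdigit(c)) {
        tok.kind = TOK_NUMBER;
        while (isdigit((unsigned char)lx->src[lx->pos]))
            lx->pos++;
    } else {
        tok.kind = TOK_PUNCT;  /* any other single character */
        lx->pos++;
    }
    tok.length = lx->pos - begin;
    return tok;
}
```

Because each Lexer value carries its own position and line count, two lexers can run side by side just by declaring two variables, which is most of what the class was providing in the Python prototype.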

In a way, you're asking the wrong question. The question is not "Should I implement a feature in the target language if it doesn't exist in the source language?" The question should be "Does my source language have sufficient expressiveness to readily implement the features that I want in my target language?"

In other words, you don't need object-orientation in your source language to successfully implement it in your target language.

C is a natural choice for language interpreters and compilers. Its relative simplicity actually makes it ideal for implementing other languages, because it has relatively few baked in paradigms itself, and so does not box you in with its own design choices.

In addition, C makes a good target for an Intermediate Language to compile to; because it is cross-platform, you can compile the resulting code to a binary almost anywhere.

Compiler bootstrapping is always hard. I once wrote a Modula2 compiler for an IBM/370. The compiler came in Modula2. So I was in the situation of Baron Muenchhausen, who pulled himself out of the swamp by his own hair. The way I did it was to first write a cross-compiler from Modula2 to PascalVS; the two languages are very similar (both are Wirth's children). Then I was able to compile the compiler. As a result, the compiler produced M-code, which could be run with an interpreter that came along with the Modula2 compiler. Veeery slow, but working. So the last part was writing a code generator for assembler.

Now, what can be learned from that? See Robert's answer. Writing a compiler directly in C may not be that "smart" but it saves you a lot of work.

You wrote your Python program in order to prove a concept, I understand - I think that is perfectly fine and admirable.

Your plan, from the start, was to eventually implement in C. Fine as well. I also gather you want the speed and effectiveness of that language (even if you'll find a lot of people willing to argue about C vs. Python advantages, but that is not the point here).

Then you say, however, that you have been porting the proof of concept to C - I think that is generally the wrong way to look at it. A proof of concept in another language shouldn't be ported - the implementation should be written from scratch and use the concept you have proven, not the program you used to prove it - I hope you see the difference. I guess it would be much better to completely "forget" your Python implementation before even trying to implement in C.

Licensed under: CC-BY-SA with attribution