Question

If I reimplement a specific piece of regex functionality (e.g. substituting a string) in pure Python, the solution that uses the re module is always faster.

Is regex written in C?

Answer

Regex is a language. It doesn't look like much of one, but it is.

Like every language, it is not written in anything. A language is a set of mathematical rules and restrictions. If we can say that it is written in anything at all, we would probably say that it is written in English. (Or in a specific English-based jargon for specifying languages, enriched with graphical and mathematical tools for expressing language rules.)

A specific implementation of the language (regex) is of course written in a specific language, but the language itself isn't.

As an example, the implementation of the re module that ships as part of the CPython implementation of the Python programming language is called the Secret Labs' Regular Expression Engine (sre), and is written in Python and C. More precisely, it consists of a compiler written in Python that compiles re regexes into byte code for a virtual machine, and a VM written in C that interprets that byte code.
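To make the split concrete, here is a minimal sketch that uses only the public re API. The helper name mask_digits and the sample string are made up for illustration, and the last line is only a rough check that assumes CPython: _sre is the name of its C engine module, and other implementations may report something different.

```python
import re
import sys

# Compiling goes through the re module's Python-level front end, which
# hands the resulting pattern program to the matching engine.
DIGITS = re.compile(r"\d+")


def mask_digits(text: str) -> str:
    """Replace every run of digits with '#' (hypothetical helper)."""
    return DIGITS.sub("#", text)


print(mask_digits("room 101, floor 7"))  # room #, floor #

# Assumption: on CPython the engine is the C module `_sre`, which is
# usually listed among the interpreter's built-in modules. Other
# implementations (Jython, IronPython, PyPy) may print False here.
print("_sre" in sys.builtin_module_names)
```

The substitution itself runs inside the engine, which is why a hand-written pure Python loop doing the same work tends to be slower on CPython.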

The implementation that ships with Jython uses the same Python code and byte code, but the byte code VM is written in Java, not C.

At first glance, IronPython looks similar: compiler in Python and VM in C#. However, if you look closer, the VM is actually a non-functional stub, and the real implementation is in C# and is based on System.Text.RegularExpressions from the CLI.

PyPy follows the standard pattern again: compiler in Python and the VM in RPython.

And of course other languages have completely different flavors of regex. E.g. Ruby's Regexp is quite different from Python's re. And in Ruby, we have similar diversity: YARV uses an engine called Onigmo to implement its Regexp class whereas JRuby uses joni.

License: CC-BY-SA with attribution
Not affiliated with softwareengineering.stackexchange