Question

This is more of a computer science question than a programming one, but I figure that this is the best place out of all the related sites to ask this.

When I discovered regular expressions and looked up the term, I assumed that the property of "regularity" refers to the fact that the expression's language has a definable structural pattern. However, in reading about the subject and the theory behind it, I learned that there are languages that are not regular, and yet the way they are defined clearly describes a pattern that could be matched. One such language is a^n b^n. Clearly this is a pattern, and yet it is not a regular language. So now I'm left wondering: what is it about regular languages that makes them regular, and this language not?


Solution

The name comes from Stephen Kleene's work in the 1950s, in which he described "regular sets" (he also called them "regular events") using a mathematical notation he created for the purpose: the original regular expressions.

OTHER TIPS

Intuitively explaining computer science is... tricky. I'll give it a shot, but keep in mind that some of this is going to be "close enough" but not theoretically rigorous.

A regular language is one that can be decided by a machine that is computationally equivalent to a finite automaton (a DFA or NFA). A finite automaton can be thought of as a machine that operates purely on states, with no storage. From that you can see why a^n b^n cannot be regular: it requires a machine that can count the a's and the b's (and thus must have infinite* storage capacity) in order to compare them.
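
As a rough sketch of that intuition, here is what a recognizer for a^n b^n looks like in Python (the function name and structure are just illustrative): it needs a counter whose value can grow without bound, which is exactly the kind of storage a finite automaton does not have.

    # Illustrative sketch: deciding a^n b^n needs a counter that can grow
    # without bound, i.e. storage beyond a fixed set of states.
    def is_anbn(s: str) -> bool:
        count = 0                  # an unbounded counter; a DFA has no such thing
        i = 0
        while i < len(s) and s[i] == "a":
            count += 1
            i += 1
        while i < len(s) and s[i] == "b":
            count -= 1
            i += 1
        # accept only if the whole string was consumed and the counts cancel out
        return i == len(s) and count == 0

    print(is_anbn("aaabbb"))  # True
    print(is_anbn("aaabb"))   # False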

For comparison, (abc)^n is regular, because the machine never has to count the repetitions; it only needs to remember where it is inside the current "abc".
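
To make that concrete, here is a sketch (again in Python, purely illustrative) of a three-state DFA that accepts (abc)^n: its entire "memory" is which of the three states it is currently in.

    # Illustrative sketch: a 3-state DFA accepting (abc)^n for any n >= 0.
    # Its only memory is the current state; there is no counting.
    TRANSITIONS = {
        (0, "a"): 1,
        (1, "b"): 2,
        (2, "c"): 0,
    }
    START, ACCEPTING = 0, {0}

    def dfa_accepts(s: str) -> bool:
        state = START
        for ch in s:
            if (state, ch) not in TRANSITIONS:
                return False      # no transition defined: reject (implicit dead state)
            state = TRANSITIONS[(state, ch)]
        return state in ACCEPTING

    print(dfa_accepts("abcabc"))  # True
    print(dfa_accepts("abcab"))   # False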

For a more rigorous (and correspondingly denser) view, check the Wikipedia article on regular languages and the pages it links to.

*The "infinite" doesn't really matter here, but I mention it for completeness. It might be easier to think of it as "luckily, always just enough" storage.

Perhaps the Wikipedia article on regular languages can explain it better than we can. However, I'll give it a shot.

From a theoretical standpoint, a regular language (a set of strings) is one that can be recognized by a finite state automaton. In programmer terms, this is equivalent to saying it can be described by a regular expression. Thus, all finite languages (finite sets of strings) are regular, but there are some infinite languages, such as a^n b^n (the language of all strings of n a's followed by n b's), that cannot be recognized by an FSA or described by a regular expression. There are more powerful computational devices (such as modern computers, which are modeled by Turing machines) that can recognize those languages.
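
A small illustration of the difference, using Python's standard re module (the concrete patterns here are just examples chosen for the sketch): a regular expression can pin down (abc)^n exactly, but for a^n b^n the best a pure regular expression can do is something like a*b*, which also accepts strings with unequal counts.

    import re

    # (abc)^n is regular: one pattern captures it exactly.
    print(bool(re.fullmatch(r"(abc)*", "abcabc")))  # True
    print(bool(re.fullmatch(r"(abc)*", "abcab")))   # False

    # a^n b^n is not: a*b* gets the shape right but cannot enforce equal counts.
    print(bool(re.fullmatch(r"a*b*", "aaabbb")))    # True
    print(bool(re.fullmatch(r"a*b*", "aaabb")))     # True, yet "aaabb" is not in a^n b^n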

The reason regular expressions are used so much in programming for string searching is that they can describe the large majority of the patterns that are important to us programmers, and at the same time they can be implemented to search very quickly using finite state automata.
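
For example, a typical everyday use looks something like this (Python's re module again; note that Python's engine actually uses backtracking rather than a strict finite automaton, though tools such as grep and RE2 do compile patterns down to automata):

    import re

    # Pull ISO-style dates out of a log line with a single pattern.
    line = "2024-01-15 ERROR disk full; retried 2024-01-16"
    print(re.findall(r"\d{4}-\d{2}-\d{2}", line))
    # ['2024-01-15', '2024-01-16']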

The word regular in regular expression refers to the mathematical concept of regular, not the everyday English one, just as the word prime in mathematics bears little relation to prime beef.

Computer science (which is a branch of mathematics) inherited the word to refer to a more specific concept: http://en.wikipedia.org/wiki/Regular_language

Regular expressions are not really "regular" in the everyday sense of the word; the name is historical.

Licensed under: CC-BY-SA with attribution