Question

I am writing a program to detect Markdown emphasis syntax in text: for example, bold text enclosed in ** and italic text enclosed in *.

I have the following regex pattern:

NSRegularExpression *regex;
regex = [NSRegularExpression regularExpressionWithPattern:@"(\\*{1,2}).+?(\\*{1,2})"
    options:NSRegularExpressionDotMatchesLineSeparators
    error:NULL];

However, this pattern also matches mismatched delimiters. For example, running it against * this is a **sample** text returns * this is a ** instead of **sample**.

How can I solve this problem?


Solution

You could use a back reference, with this pattern:

(\*{1,2}).+?\1

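This means that whatever the first group captures (a single or a double asterisk) must be repeated later, at the point where \1 appears, so the closing delimiter is the same string as the opening one. Note that the backreference only checks that the two delimiters are identical; it does not check that they sit directly next to the emphasized text.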

For example:

NSRegularExpression *regex;
regex = [NSRegularExpression regularExpressionWithPattern:@"(\\*{1,2}).+?\\1"
    options:NSRegularExpressionDotMatchesLineSeparators
    error:NULL];
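
One caveat worth showing in code: on the sample string from the question, the lone leading * is still taken as an opener, so the backreference pattern alone matches * this is a * rather than **sample**. The following is a minimal sketch of one way to tighten it, not part of the original answer. It assumes an emphasis delimiter must touch non-whitespace on its inner side and must not be adjacent to another asterisk on its outer side, which is my simplification and not the full Markdown rule set. The helper name EmphasisSpans is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): returns every
// substring of `text` that the guarded pattern treats as an emphasis span.
static NSArray<NSString *> *EmphasisSpans(NSString *text) {
    // (?<!\*)    the opening run is not preceded by another asterisk
    // (\*{1,2})  group 1: the opening delimiter, * or **
    // (?=\S)     the opener is followed directly by non-whitespace
    // (.+?)      group 2: the emphasized text, as short as possible
    // (?<=\S)    the closer directly follows non-whitespace
    // \1         the closer repeats group 1 exactly (the backreference)
    // (?!\*)     the closing run is not followed by another asterisk
    NSString *pattern = @"(?<!\\*)(\\*{1,2})(?=\\S)(.+?)(?<=\\S)\\1(?!\\*)";

    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionDotMatchesLineSeparators
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    // Collect the full text of each match, in order of appearance.
    NSMutableArray<NSString *> *spans = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
    for (NSTextCheckingResult *match in matches) {
        [spans addObject:[text substringWithRange:match.range]];
    }
    return spans;
}
```

Calling EmphasisSpans(@"* this is a **sample** text") should return a single element, **sample**, because the leading asterisk is followed by a space and is therefore rejected as an opener. If you need the delimiter or the inner text separately, [match rangeAtIndex:1] and [match rangeAtIndex:2] give the ranges of the two capture groups.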
Licensed under: CC-BY-SA with attribution