Question

I'm trying to understand regular expressions better. I'm using Visual Studio 2010. Take for example this expression. In Visual Studio 2010 you can't skip over newlines with [\s\S] so I've heard it's ok to use [^\0]. In the expression I want to match a line but only if it is line 3.

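#include <iostream>
#include <regex>
using namespace std;

int main()
{
    cmatch match; // match results for a const char* input

    if(regex_search("line 1\nline 2\nline 3\n",
        match,
        regex("^([^\\0]+\\n)?line (3)\\n")))
    {
        cout << "match.length(): " << match.length() << endl;

        // match[0] is the whole match; match[1] and match[2] are the groups
        for(unsigned i = 0; i < match.size(); ++i)
        {
            cout << "match[" << i <<"]: \"" << match[i] << "\"" << endl;
        }
    }
}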

Please note the above code won't work with gcc < 4.9 or ideone (since it uses gcc < 4.9).

In Visual Studio 2010 the code returns:

match.length(): 21
match[0]: "line 1
line 2
line 3
"
match[1]: "line 1
line 2
line 3
"
match[2]: "3"

I'm sure there are better ways to match lines but my question is just why did match[1] group match the whole input? I figured the regex would read line 1\nline 2\n for match[1] and stop since I have line 3 after it in the regex. Is there a word for it in regular expressions or is it a bug?

Thanks and if you have edit powers you're welcome to edit this so it's easier to understand.


Solution 2

I'm pretty sure the match[1] result I get in Visual Studio 2010 is due to a bug.

In Visual Studio 2012 and 2013 and gcc 4.9.0 (20140405) the code returns what I expect:

match.length(): 21
match[0]: "line 1
line 2
line 3
"
match[1]: "line 1
line 2
"
match[2]: "3"

Online regular expression testers RegExr and Regex Hero show the same thing.

In Visual Studio 2010, I can make the expression work properly by making the quantifier "lazy", that is, by adding a question mark after the plus sign: "^([^\\0]+?\\n)?line (3)\\n". (That's a string literal, so each backslash is escaped with another backslash.) It works now, although differently: being lazy, it finds the closest match rather than the farthest one. Still, I'm sure it's better to just use the latest Visual Studio.

clang-503.0.40 has a different but related bug where it can't process "[^\0]*".

Other tips

For the record, this works in Visual Studio and finds the third line, returning "line 3":

^(?<=(?:[^\n]+\n){2})[^\n]+

As for your expression,

^([^\0]+\n)?line (3)\n

We have to decide if you are trying to match in Visual Studio's Find function or by making a console program in Visual Studio. These are two very different cases.

A. In Visual Studio's Find Function

In Visual Studio's Find function, if you make a text file like this:

line 1
line 2
line 3

your regex will not match. Why? Because in a Visual Studio file, line 3 is not followed directly by \n. Instead, the line break is \r\n, the standard Windows line break.

Adding the \r fixes it:

^([^\0]+\n)?line (3)\r\n

That being said, this regex matches line 3 wherever it appears, not just when it is the third line, for the simple reason that [^\0]+ eats up all the characters, including the newlines, then backtracks until what follows is a newline and line 3, at which stage the \n, line 3 and line-break tokens complete the match. If you wanted to use [^\0] instead of [^\n], this would make sure you match line 3:

^(?<=([^\0]+?\n){2})line 3\r\n

B. In a Console App built in Visual Studio

If you feed a console app your string "line 1\nline 2\nline 3\n", then your original regex matches. However, it matches all three lines, for the reason mentioned above (the [^\0] eats up all the characters, including the newlines, then backtracks until it is before the final new line, at which stage the \n, line 3 and \n tokens complete the match).

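Here, if you only want line 3 and use [^\0], you can use this for instance, provided your regex engine supports lookbehind (.NET does; the ECMAScript grammar used by C++ std::regex does not):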

^(?<=([^\0]+?\n){2})line 3\n
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow