Question

I am trying to extract text from between square brackets on a line of text. I've been messing with the regex for some time now, and cannot get what I need. (I can't even explain why the output is what it is.) Here's the code:

QRegExp rx_timestamp("\[(.*?)\]");
int pos = rx_timestamp.indexIn(line);
if (pos > -1) {
    qDebug() << "Captured texts: " << rx_timestamp.capturedTexts();
    qDebug() << "timestamp cap: " <<rx_timestamp.cap(0);
    qDebug() << "timestamp cap: " <<rx_timestamp.cap(1);
    qDebug() << "timestamp cap: " <<rx_timestamp.cap(2);
} else qDebug() << "No indexin";

The input line is:

messages:[2013-10-08 09:13:41] NOTICE[2366] chan_sip.c: Registration from '"xx000 <sip:xx000@183.229.164.42:5060>' failed for '192.187.100.170' - No matching peer found

And the output is:

Captured texts:  (".") 
timestamp cap:  "." 
timestamp cap:  "" 
timestamp cap:  "" 

  1. Can someone explain what is going on? Why is cap returning "." when no such character exists between the square brackets?
  2. Can someone correct the regex to extract the timestamp from between the square brackets?

Solution

You are missing two things: escaping the backslashes, and calling setMinimal(true). See below.

QString line = "messages:[2013-10-08 09:13:41] NOTICE[2366] chan_sip.c: Registration from '\"xx000 <sip:xx000@183.229.164.42:5060>' failed for '192.187.100.170' - No matching peer found";

QRegExp rx_timestamp("\\[(.*)\\]");
rx_timestamp.setMinimal(true);
int pos = rx_timestamp.indexIn(line);
if (pos > -1) {
    qDebug() << "Captured texts: " << rx_timestamp.capturedTexts();
    qDebug() << "timestamp cap: " <<rx_timestamp.cap(0);
    qDebug() << "timestamp cap: " <<rx_timestamp.cap(1);
    qDebug() << "timestamp cap: " <<rx_timestamp.cap(2);
} else qDebug() << "No indexin";

Output:

Captured texts:  ("[2013-10-08 09:13:41]", "2013-10-08 09:13:41") 
timestamp cap:  "[2013-10-08 09:13:41]" 
timestamp cap:  "2013-10-08 09:13:41" 
timestamp cap:  "" 

UPDATE: What is going on:

A backslash in C++ source code starts an escape sequence, such as \n. To get a literal backslash into the pattern string, you have to escape the backslash itself, like so: \\. That way the regular expression engine sees a single \, the same as you would write in a Ruby, Perl or Python regex.

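The square brackets should be escaped too, because in a regex an unescaped pair of square brackets defines a character class: a set of characters, any single one of which can match.

That is also why you got ".". In "\[(.*?)\]", \[ and \] are not valid C++ escape sequences, so most compilers warn and simply drop the backslashes, and QRegExp receives [(.*?)]. That is a character class matching a single (, ., *, ? or ), and the first of those in your line is the . in chan_sip.c. The parentheses are inside the class, so there is no capture group, which is why cap(1) and cap(2) are empty.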

So for the regular expression engine to see a literal square bracket character, you need to send it

\[

but a C++ string literal can't contain a \ character unless you write two of them in a row, so in the source code it becomes

\\[

While learning regex, I liked using a regex tool by GSkinner; it lists the special codes and characters on the right-hand side of the page.

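QRegExp doesn't behave exactly like other regex engines. If you study the documentation you'll find a lot of small differences, such as how it handles greedy vs. lazy matching: QRegExp doesn't support per-quantifier lazy syntax like .*?, so instead you switch every quantifier in the pattern to minimal (lazy) matching with setMinimal(true).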

See also: "QRegExp and double-quoted text for QSyntaxHighlighter".

The way the captures are listed is pretty typical of the regex parsers I have seen. capturedTexts() first lists the entire match (cap(0)), then the text of each capture group in order, so cap(1) is whatever the first set of parentheses matched. Asking for a group that doesn't exist, like cap(2) here, gives an empty string.

http://qt-project.org/doc/qt-5.0/qtcore/qregexp.html#cap

http://qt-project.org/doc/qt-5.0/qtcore/qregexp.html#capturedTexts

To find more matches, call indexIn repeatedly, starting each new search just past the end of the previous match (pos += matchedLength()).

http://qt-project.org/doc/qt-5.0/qtcore/qregexp.html#indexIn

QString str = "offsets: 1.23 .50 71.00 6.00";
QRegExp rx("\\d*\\.\\d+");    // primitive floating point matching
int count = 0;
int pos = 0;
while ((pos = rx.indexIn(str, pos)) != -1) {
    ++count;
    pos += rx.matchedLength();
}
// matches start at 9, 14, 18 and 24; count will end up as 4

Hope that helps.

Licensed under: CC-BY-SA with attribution