Question

void MainWindow::whatever(){
    QRegExp rx ("<span(.*?)>");
    //QString line = ui->txtNet1->toHtml();
    QString line = "<span>Bar</span><span style='baz'>foo</span>";
    while(line.contains(rx)){
        qDebug()<<"Found rx!";
        line.remove (rx);
    }
}

I've tested the regular expression online using this tool. With the given regex string and a sample text of <span style="foo">Bar</span> the tool says that it the regular expression should be found in the string. In my Qt code, however, I'm never getting into my while loop.

I've really never used regex before, in Qt or any other language. Can someone provide some help? Thanks!

[edit] So I just found that QRegExp has a function errorString() to use if the regex is invalid. I output this and see: "bad repetition syntax". Not really sure what this means. Of course, googling for "bad repetition syntax" brings up... this post. Damn google, you fast.

Was it helpful?

Solution

The problem is that QRegExp only supports greedy quantifiers. More precisely, it supports either greedy or reluctant quantifiers, but not both. Thus, <span(.*?)> is invalid, since there is no *? operator. Instead, you can use

QRegExp rx("<span(.*)>");
rx.setMinimal(true);

This will give every *, +, and ? in the QRegExp the behavior of *?, +?, and ??, respectively, rather than their default behavior. The difference, as you may or may not be aware, is that the minimal versions match as few characters as possible, rather than as many.

In this case, you can also write

QRegExp rx("<span([^>]*)>");

This is probably what I would do, since it has the same effect: match until you see a >. Yours is more general, yes (if you have a multi-character ending token), but I think this is slightly nicer in the simple case. Either will work, of course.

Also, be very, very careful about parsing HTML with regular expressions. You can't actually do it, and recognizing tags is—while (I believe) possible—much harder than just this. (Comments, CDATA blocks, and processing instructions throw a wrench in the works.) If you know the sort of data you're looking at, this can be an acceptable solution; even so, I'd look into an HTML parser instead.

OTHER TIPS

What are you trying to achieve? If you want to remove the opening tag and its elements, then the pattern

<span[^>]*>

is probably the simplest.

The syntax .*? means non-greedy match which is widely supported, but may be confusing the QT regex engine.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top