Need help with regular expressions in Qt (QRegExp) [bad repetition syntax?]

https://stackoverflow.com/questions/4507224

12-10-2019

Question

void MainWindow::whatever(){
    QRegExp rx ("<span(.*?)>");
    //QString line = ui->txtNet1->toHtml();
    QString line = "<span>Bar</span><span style='baz'>foo</span>";
    while(line.contains(rx)){
        qDebug()<<"Found rx!";
        line.remove (rx);
    }
}

I tested the regular expression online with this tool. With the given regex string and a sample text of <span style="foo">Bar</span>, the tool says the regular expression should be found in the string. In my Qt code, however, I never get into the while loop.

I have never really used regex, in Qt or any other language. Can someone provide some help? Thanks!

[Edit] I just found that QRegExp has a function errorString() to use when the regex is invalid. I output this and see: "bad repetition syntax". Not really sure what that means. Of course, googling for "bad repetition syntax" brings up ... this post. Damn, Google, you are fast.

Solution

The problem is that QRegExp only supports greedy quantifiers. More precisely, a given QRegExp is either entirely greedy or entirely reluctant (minimal); it cannot mix both in one pattern. Thus, <span(.*?)> is invalid, since there is no *? operator, and that is why errorString() reports "bad repetition syntax". Instead, you can use

QRegExp rx("<span(.*)>");
rx.setMinimal(true);

This will give every *, +, and ? in the QRegExp the behavior of *?, +?, and ??, respectively, rather than their default behavior. The difference, as you may or may not be aware, is that the minimal versions match as few characters as possible, rather than as many.

In this case, you can also write

QRegExp rx("<span([^>]*)>");

This is probably what I would do, since it has the same effect: match until you see a >. Yours is more general, yes (if you have a multi-character ending token), but I think this is slightly nicer in the simple case. Either will work, of course.
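To make the difference between the greedy, minimal, and negated-class patterns concrete, here is a minimal sketch run against the sample string from the question. As an assumption, it uses std::regex from the C++ standard library rather than QRegExp, so that it builds without Qt. Unlike QRegExp, the default ECMAScript grammar of std::regex accepts .*? directly, so no setMinimal() equivalent is needed. The helper name removeAll is hypothetical and only serves the illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: strip every match of `pattern` from `input`.
// Uses std::regex (ECMAScript grammar), NOT QRegExp; this is an
// assumption made so the sketch compiles without Qt.
std::string removeAll(const std::string& input, const std::string& pattern) {
    const std::regex rx(pattern);
    return std::regex_replace(input, rx, "");
}

// Sample text taken from the question.
const std::string kLine = "<span>Bar</span><span style='baz'>foo</span>";

// Greedy: .* runs to the LAST '>' in the string, so one match
// swallows the whole line.
std::string greedyResult() { return removeAll(kLine, "<span(.*)>"); }

// Minimal (lazy): .*? stops at the FIRST '>', so only the opening
// tags are removed. QRegExp needs setMinimal(true) for this behavior.
std::string lazyResult() { return removeAll(kLine, "<span(.*?)>"); }

// Negated character class: [^>]* cannot cross a '>', which gives the
// same result as the lazy pattern without relying on lazy quantifiers.
std::string negatedResult() { return removeAll(kLine, "<span[^>]*>"); }
```

On this input the greedy pattern leaves an empty string, while the lazy and negated-class patterns both leave Bar</span>foo</span>. That is the same distinction setMinimal(true) controls in QRegExp.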

Also, be very, very careful about parsing HTML with regular expressions. You can't actually do it in general, because HTML nests and is not a regular language. Recognizing individual tags is (I believe) possible, but much harder than just this: comments, CDATA blocks, and processing instructions throw a wrench in the works. If you know the sort of data you're looking at, this can be an acceptable solution; even so, I'd look into an HTML parser instead.

Other tips

What are you trying to achieve? If you want to remove the opening tag and its attributes, then the pattern

<span[^>]*>

is probably the simplest.

The syntax .*? means a non-greedy match. It is widely supported by other regex engines, but it may be confusing the Qt regex engine.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow