Question

I'm really bad at regular expressions, so please help me.

I need to find all pieces like #text in a string.

text must not contain any whitespace characters (\s). Its length must be at least 2 characters ({2,}), and it must contain at least one letter (QChar::isLetter()).

Examples:

  • #c, #1, #123456, #123 456, #123_456 are incorrect
  • #cc, #text, #text123, #123text are correct

I use QRegExp.


Solution

Styne666 gave the right regex.

Here is a small Perl script that tries to match its first command-line argument against this regex:

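    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $arg = shift;
    die "usage: $0 STRING\n" unless defined $arg;

    # '#', then (lookahead) optional digits followed by a letter,
    # then at least two ASCII letters or digits; $1 holds the match.
    if ($arg =~ m/(#(?=\d*[a-zA-Z])[a-zA-Z\d]{2,})/) {
        print "$1 MATCHES THE PATTERN!\n";
    } else {
        print "NO MATCH\n";
    }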

Perl is great for quickly testing regular expressions.

Now, your question is a bit different: you want to find all such substrings in your text, and you want to do it in C++/Qt. Here is what I came up with in a couple of minutes:

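    #include <QString>
    #include <QRegExp>
    #include <iostream>

    int main(int argc, char *argv[])
    {
        if (argc < 2) {
            std::cerr << "usage: " << argv[0] << " \"text to search\"" << std::endl;
            return 1;
        }

        QString str = QString::fromLocal8Bit(argv[1]);

        // [\\s]? lets a match start at a preceding space; group 1 is the tag:
        // '#', a lookahead requiring optional digits followed by a letter,
        // then at least two letters (a-z, A-Z) or digits, ending at a word boundary.
        QRegExp rx("[\\s]?(\\#(?=\\d*[a-zA-Z])[a-zA-Z\\d]{2,})\\b");

        int pos = 0;
        while ((pos = rx.indexIn(str, pos)) != -1)
        {
            QString token = rx.cap(1);
            std::cout << token.toStdString() << std::endl;
            pos += rx.matchedLength();  // continue searching after this match
        }

        return 0;
    }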

To test it, I pass an input like this (quoted, so that the whole string is a single command-line argument):

    peter@ubuntu01$ qt-regexp "#hjhj  4324   fdsafdsa  #33e #22"

It matches only two tokens: #hjhj and #33e.

Hope it helps.

OTHER TIPS

    QRegExp rx("#(\\S+[A-Za-z]\\S*|\\S*[A-Za-z]\\S+)$");
    bool result = (rx.indexIn(str) == 0);

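After the #, rx matches either one or more non-whitespace characters followed by a letter and then any number of non-whitespace characters, or any number of non-whitespace characters followed by a letter and then at least one more non-whitespace character. Either way the text is at least two characters long and contains a letter; the $ anchor plus the indexIn(str) == 0 check make the pattern cover the whole string.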

The shortest I could come up with (which should work, but I haven't tested extensively) is:

    QRegExp("^#(?=[0-9]*[A-Za-z])[A-Za-z0-9]{2,}$");

Which matches:

  • ^ the start of the string
  • # a literal hash character
  • (?= then look ahead (but don't match)
    • [0-9]* zero or more ASCII digits
    • [A-Za-z] a single upper- or lower-case Latin letter
  • ) end of the lookahead
  • [A-Za-z0-9]{2,} then match at least two characters, each an upper- or lower-case Latin letter or an ASCII digit
  • $ then the end of the string (an anchor, so nothing is consumed)

Technically speaking, though, this is still wrong: it only matches Latin letters and ASCII digits, whereas QChar::isLetter() also accepts non-Latin letters. Replacing a few bits gives you:

    QRegExp("^#(?=\\d*[^\\d\\s])\\w{2,}$");

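This should also work for non-Latin letters and digits, but it is completely untested. Note that _ counts both as \w and as [^\d\s], so this version also accepts #123_456, which the question lists as incorrect. Have a quick read of the QRegExp class reference for an explanation of each escape sequence.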

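And then to match within larger strings of text (again, untested). Inside a C++ string literal, \b has to be written as \\b, since "\b" is a backspace character; and a \b placed directly before # only matches when a word character precedes the #, so the leading one is dropped here:

    QRegExp("#(?=\\d*[^\\d\\s])\\w{2,}\\b");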

A useful tool is the Regular Expressions Example which comes with the SDK.

Use this regular expression; hopefully it will solve your problem:

    ^([#(a-zA-Z)]+[(a-zA-Z0-9)]+)*(#[0-9]+[(a-zA-Z)]+[(a-zA-Z0-9)]*)*$
Licensed under: CC-BY-SA with attribution