I am trying to make more use of regex in my search engine. Please take a look:

someStr = "Processor AMD Athlon II X4 651K BOX Black Edition, s. FM1, 3.0GHz, 4MB cache, Quad Core"

# THIS SHOULD MATCH: "processor" may be followed by 0 to 1 characters (plural),
# "mega" and "mb" should be treated as the same,
# and "quad" may be followed by 0 to 2 characters of any kind except whitespace
queryListTrue = ["processors", "amd", "4mega", "quaddy"]

# THIS SHOULDN'T MATCH: the last item has too many characters after "quad"
queryListFalse = ["processors", "amd", "4mb", "quaddie"]

# TO DESCRIBE WHAT I NEED
rulesList = [ r'processor[i.e. 0-1 char]', r'amd',
            r'4mega or 4mb', r'quad[from 0 to 2 any char]' ]

if ALL queryListTrue MATCHES someStr THRU rulesList : 
        print "What a wonderful world!"

Any help would be wonderful.


Solution

The regular expression for "[from 0 to 1 any char]" is simply

.?

i.e. dot . matches any character (except newline, by default) and the ? quantifier means the preceding expression is optional.
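A quick way to check this behavior, sketched in Python 3. The sample strings are made up for illustration, and re.fullmatch is used so that the whole string has to fit the pattern; re.DOTALL is the standard flag that lets the dot match a newline too.

```python
import re

# "quad" followed by at most one character of any kind (except newline).
pattern = re.compile(r"quad.?")

with_extra = pattern.fullmatch("quads") is not None      # one extra character
without_extra = pattern.fullmatch("quad") is not None    # optional part absent
two_extra = pattern.fullmatch("quaddy") is not None      # two extra characters
newline_default = pattern.fullmatch("quad\n") is not None

# With re.DOTALL the dot also matches a newline.
newline_dotall = re.fullmatch(r"quad.?", "quad\n", re.DOTALL) is not None

print(with_extra, without_extra, two_extra, newline_default, newline_dotall)
```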

Note that processor.? will also match a space after processor, or any other single character, such as the trailing d in processord. You probably intend processors?, where the plural s is optional, or perhaps processor[a-z]?, which constrains the optional last character to a lowercase ASCII letter (or to any ASCII letter if you compile with re.IGNORECASE).
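The difference between the three variants is easy to see by printing what each one actually consumes. This is only a small sketch in Python 3; matched_text is a hypothetical helper written for this comparison, and the input strings are invented.

```python
import re

def matched_text(pattern, text):
    """Return the substring matched by pattern in text, or None if no match."""
    match = re.search(pattern, text)
    return match.group(0) if match else None

# The dot consumes the space that follows the word.
dot_on_space = matched_text(r"processor.?", "processor amd")
# The optional "s" leaves the space alone.
plural_on_space = matched_text(r"processors?", "processor amd")

# The dot and the letter class both accept the stray "d";
# the plural pattern stops after "processor".
dot_on_typo = matched_text(r"processor.?", "processord")
letter_on_typo = matched_text(r"processor[a-z]?", "processord")
plural_on_typo = matched_text(r"processors?", "processord")

print(repr(dot_on_space), repr(plural_on_space))
print(repr(dot_on_typo), repr(letter_on_typo), repr(plural_on_typo))
```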

Similarly, the generalized quantifier {m,n} specifies "at least m repetitions and at most n repetitions", so your "[from 0 to 2 any char]" translated to regex is .{0,2}.
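Since the question asks for 0 to 2 characters "except whitespace", \S{0,2} may be closer to the intent than .{0,2}. The sketch below (Python 3) compares the two; accepts is a hypothetical helper, and treating each query word as something that must match in full is an assumption about how the search engine would use the rule.

```python
import re

# "quad" followed by 0 to 2 characters of any kind.
any_chars = re.compile(r"quad.{0,2}")
# Variant based on the question's wording: 0 to 2 non-whitespace characters.
non_space = re.compile(r"quad\S{0,2}")

def accepts(pattern, word):
    """True if the whole word matches the pattern."""
    return pattern.fullmatch(word) is not None

words = ["quad", "quads", "quaddy", "quaddie"]
results = {word: accepts(non_space, word) for word in words}
print(results)

# The dot version also accepts a trailing space; the \S version does not.
space_with_dot = accepts(any_chars, "quad ")
space_with_non_space = accepts(non_space, "quad ")
print(space_with_dot, space_with_non_space)
```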

Alternation in regular expressions is specified with | so mega|mb is the regex formulation for your "mega or mb". If you use the alternation in a longer context where some of the text is not subject to alternation, you need to add parentheses to scope the alternation, like m(ega|b).
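To illustrate why the parentheses matter once a prefix such as 4 is involved, here is a short Python 3 sketch. The patterns with the leading 4 come from the question's "4mega or 4mb" rule; the word list is invented for the comparison.

```python
import re

unscoped = re.compile(r"4mega|mb")   # means "4mega" OR "mb"
scoped = re.compile(r"4m(ega|b)")    # means "4m" followed by "ega" OR "b"

words = ["4mega", "4mb", "mb"]

# Keep only the words that each pattern matches in full.
unscoped_hits = [w for w in words if unscoped.fullmatch(w)]
scoped_hits = [w for w in words if scoped.fullmatch(w)]

print(unscoped_hits)
print(scoped_hits)
```

Without the parentheses, the 4 belongs only to the first alternative, so a bare mb is accepted and 4mb is not.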

In Python (as in most modern Perl-derived regex dialects), you can use (?: instead of ( if the capturing behavior of regular parentheses is undesired. The group still scopes the alternation, but it does not create a numbered capture group.
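Putting the pieces together, here is one possible sketch in Python 3 of the check described in the question. Several things are assumptions, not part of the original answer: the function name query_matches, pairing each query term with the rule at the same position, requiring the whole term to fit its rule, matching case-insensitively, and using \S so that whitespace is excluded. The first two lines also show the practical difference between a capturing and a non-capturing group.

```python
import re

# findall returns the group contents when a capturing group is present,
# and the whole match when the group is non-capturing.
captured = re.findall(r"4m(ega|b)", "4mb or 4mega")
whole = re.findall(r"4m(?:ega|b)", "4mb or 4mega")

some_str = ("Processor AMD Athlon II X4 651K BOX Black Edition, "
            "s. FM1, 3.0GHz, 4MB cache, Quad Core")

# One rule per query term, in the same order as the query list (assumption).
rules = [r"processors?", r"amd", r"4m(?:ega|b)", r"quad\S{0,2}"]

def query_matches(query_terms, rules, text):
    """True if every term fits its rule and every rule is found in text."""
    if len(query_terms) != len(rules):
        return False
    for term, rule in zip(query_terms, rules):
        pattern = re.compile(rule, re.IGNORECASE)
        # The term must match the rule in full, and the rule must occur in text.
        if pattern.fullmatch(term) is None or pattern.search(text) is None:
            return False
    return True

query_list_true = ["processors", "amd", "4mega", "quaddy"]
query_list_false = ["processors", "amd", "4mb", "quaddie"]

result_true = query_matches(query_list_true, rules, some_str)
result_false = query_matches(query_list_false, rules, some_str)

if result_true and not result_false:
    print("What a wonderful world!")
```

Note that a rule like amd is searched without word boundaries here, so it would also be found inside a longer word; wrapping rules in \b...\b is one option if that matters.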

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow