Question

I am not able to understand the practical difference between ? and * in regular expressions. I know that ? matches the previous character/group 0 or 1 times and * matches the previous character/group 0 or more times.

But this code

while(<>) {
  chomp($_);
  if(/hello?/) {
    print "metch $_ \n";
  }
  else {
    print "naot metch $_ \n";
  }
}

gives the same output for both hello? and hello*. The external file given to this Perl program contains

hello
helloooo
hell

And the output is

metch hello 
metch helloooo 
metch hell 

for both hello? and hello*. I am not able to understand the exact difference between ? and *.


Solution

In Perl (unlike Java's String.matches(), which must match the entire string), the m// match operator is not anchored by default.

As such, all of the input is trivially matched by both /hello?/ and /hello*/. That is, these will match any string that contains "hell" anywhere, since both quantifiers make the "o" optional.

Compare with /^hello?$/ and /^hello*$/, respectively. Since these employ anchors, the former will not match "helloo" (at most one "o" is allowed) while the latter will.
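
As a minimal sketch of that comparison, the snippet below runs the unanchored and anchored patterns over the asker's three inputs plus "helloo". The helper name classify and the hash keys are illustrative assumptions, not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helper: reports which of the four patterns match a string.
sub classify {
    my ($s) = @_;
    return {
        unanchored_q    => ($s =~ /hello?/   ? 1 : 0),
        unanchored_star => ($s =~ /hello*/   ? 1 : 0),
        anchored_q      => ($s =~ /^hello?$/ ? 1 : 0),
        anchored_star   => ($s =~ /^hello*$/ ? 1 : 0),
    };
}

for my $s (qw(hell hello helloo helloooo)) {
    my $r = classify($s);
    # Columns: unanchored ?, unanchored *, anchored ?, anchored *
    printf "%-9s unanchored ?=%d *=%d | anchored ?=%d *=%d\n", $s,
        @{$r}{qw(unanchored_q unanchored_star anchored_q anchored_star)};
}
```

Only the anchored ? column should differ between inputs: it accepts hell and hello but rejects helloo and helloooo, while the other three patterns accept all four strings.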

The Perl documentation (perlop) says, under Regexp Quote-Like Operators:

m/PATTERN/ searches [anywhere in] a string for a pattern match, and in scalar context returns true if it succeeds, false if it fails.

Other tips

What is confusing you is that, without anchors like ^ and $, a regex pattern match checks only whether the pattern appears anywhere in the target string.

If you add something to the pattern after the hello, like

if (/hello?, Ashwin/) { ... }

then the strings

hello, Ashwin

and

hell, Ashwin

will match, but

helloooo, Ashwin

will not, because there are too many o characters between hell and the comma.

However, if you use a star * instead, like

if (/hello*, Ashwin/) { ... }

then all three strings will match.
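
A short sketch that checks the three strings against both patterns is below. The helper names matches_optional and matches_star are assumptions made for illustration only.

```perl
use strict;
use warnings;

my @greetings = ('hello, Ashwin', 'hell, Ashwin', 'helloooo, Ashwin');

# Hypothetical helpers wrapping the two patterns from the answer.
sub matches_optional { return $_[0] =~ /hello?, Ashwin/ ? 1 : 0 }
sub matches_star     { return $_[0] =~ /hello*, Ashwin/ ? 1 : 0 }

for my $g (@greetings) {
    # Print 1 for a match and 0 for no match under each quantifier.
    printf "%-18s ?=%d *=%d\n", $g, matches_optional($g), matches_star($g);
}
```

Because the comma must follow immediately after the optional "o", the ? version rejects helloooo, Ashwin even though the pattern is not anchored.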

? means the preceding item is optional (zero or one occurrence). * means the preceding item is optional and may also be repeated (zero or more occurrences).

i.e.

  • hello? matches hell, hello
  • hello* matches hell, hello, helloo, hellooo, ....

But without ^ or $ to anchor the pattern, these matches can occur anywhere in the string.

Here's an example I came up with that makes it quite clear:

What if you wanted to only match up to tens of people and your data was like below:

2 people. 20 people. 200 people. 2000 people.

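Only ? would be useful in that case, since making a second digit optional with ? allows at most two digits, whereas * allows any number of further digits and would incorrectly capture the larger numbers.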
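
The answer does not show the patterns it has in mind, so the following is only a sketch under assumptions: it uses \b(\d\d?) people and \b(\d\d*) people, where the \b word boundary is added so that a match cannot start in the middle of a longer number. The helper captured_numbers is a hypothetical name.

```perl
use strict;
use warnings;

my $data = '2 people. 20 people. 200 people. 2000 people.';

# Hypothetical helper: returns every number captured by the given pattern.
sub captured_numbers {
    my ($text, $re) = @_;
    my @found = $text =~ /$re/g;   # list context with /g returns all captures
    return @found;
}

# Assumed patterns: one digit followed by an optional / repeatable second digit.
my @with_q    = captured_numbers($data, qr/\b(\d\d?) people/);
my @with_star = captured_numbers($data, qr/\b(\d\d*) people/);

print "with ?: @with_q\n";
print "with *: @with_star\n";
```

With this data the ? pattern captures 2 and 20 only, while the * pattern captures 2, 20, 200 and 2000. Without the \b, the ? pattern would still find "00 people" inside "200 people", which is the same unanchored-match effect discussed above.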

License: CC-BY-SA with attribution
Not affiliated with StackOverflow