How can I find mixed-case strings with Perl?

https://stackoverflow.com/questions/1867602

18-09-2019

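Question

I'm trying to filter thousands of files, looking for those that contain string constants with mixed case. Such strings may be padded with whitespace, but may not contain whitespace themselves. So the following (which contain upper-case characters) are matches: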

"  AString "   // leading and trailing spaces together allowed
"AString "     // trailing spaces allowed
"  AString"    // leading spaces allowed
"newString03"  // numeric chars allowed
"!stringBIG?"  // non-alphanumeric chars allowed
"R"            // Single UC is a match

but these are not:

"A String" // not a match because it contains an embedded space
"Foo bar baz" // does not match due to multiple whitespace interruptions
"a_string" // not a match because there are no UC chars

I still want to match lines that contain both kinds of string:

"ABigString", "a sentence fragment" // need to catch so I find the first case...

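I'd like to use Perl regular expressions, preferably driven by the ack command-line tool. Obviously \w and \W are not going to work. It seems that \S should match the non-whitespace characters, but I can't figure out how to embed the requirement "at least one upper-case character per string".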

ack --match '\"\s*\S+\s*\"'

is the closest I've gotten. I need to replace the \S+ with something that captures the requirement "at least one upper-case (ASCII) character, in any position of the non-whitespace string".

This is straightforward to program procedurally in C/C++ (and yes, in Perl too) without resorting to a regex. I'm just trying to figure out whether there is a regex that can do the same job.
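For comparison, the procedural approach mentioned above could be sketched roughly as follows. This is only an illustration, not code from the original post: the helper name has_mixed_case_string is hypothetical, and the sketch assumes that double quotes pair up left to right and are never escaped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post). Returns 1 if $line holds a
# double-quoted string whose content, ignoring leading and trailing whitespace,
# has no internal whitespace and at least one ASCII upper-case letter.
# Assumption: quotes pair up left to right and are never backslash-escaped.
sub has_mixed_case_string {
    my ($line) = @_;
    my $pos = 0;
    while ((my $open = index($line, '"', $pos)) >= 0) {
        my $close = index($line, '"', $open + 1);
        last if $close < 0;                       # unterminated string: stop
        my $body  = substr($line, $open + 1, $close - $open - 1);
        my @words = split ' ', $body;             # ' ' splits on whitespace runs and skips leading whitespace
        # Exactly one word means no internal whitespace; tr/A-Z// counts upper-case letters.
        return 1 if @words == 1 && ($words[0] =~ tr/A-Z//) > 0;
        $pos = $close + 1;                        # continue after the closing quote
    }
    return 0;
}

for my $line (q<"  AString ">, q<"A String">, q<"ABigString", "a sentence fragment">) {
    printf "%-45s %s\n", $line, has_mixed_case_string($line) ? "match" : "no match";
}
```

Pairing the quotes explicitly keeps the check from treating the gap between two neighbouring strings as a string of its own, at the cost of not being usable directly from ack.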

Solution

The following pattern passes all your tests:

qr/
  "      # opening double quote

  (?!    # filter out strings with internal spaces
     [^"]*   # zero or more non-quotes
     [^"\s]  # neither a quote nor whitespace
     \s+     # internal whitespace
     [^"\s]  # another non-quote, non-whitespace character
  )

  [^"]*  # zero or more non-quote characters
  [A-Z]  # at least one uppercase letter
  [^"]*  # followed by zero or more non-quotes
  "      # and finally the trailing quote
/x

Feeding this test program to ack-grep (as ack is called on Ubuntu), with the pattern above written without /x and therefore without whitespace or comments:

#! /usr/bin/perl

my @tests = (
  [ q<"  AString ">   => 1 ],
  [ q<"AString ">     => 1 ],
  [ q<"  AString">    => 1 ],
  [ q<"newString03">  => 1 ],
  [ q<"!stringBIG?">  => 1 ],
  [ q<"R">            => 1 ],
  [ q<"A String">     => 0 ],
  [ q<"a_string">     => 0 ],
  [ q<"ABigString", "a sentence fragment"> => 1 ],
  [ q<"  a String  "> => 0 ],
  [ q<"Foo bar baz">  => 0 ],
);

my $pattern = qr/"(?![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"/;
for (@tests) {
  my($str,$expectMatch) = @$_;
  my $matched = $str =~ /$pattern/;
  print +($matched xor $expectMatch) ? "FAIL" : "PASS",
        ": $str\n";
}

produces the following output:

$ ack-grep '"(?![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"' try
  [ q<"  AString ">   => 1 ],
  [ q<"AString ">     => 1 ],
  [ q<"  AString">    => 1 ],
  [ q<"newString03">  => 1 ],
  [ q<"!stringBIG?">  => 1 ],
  [ q<"R">            => 1 ],
  [ q<"ABigString", "a sentence fragment"> => 1 ],
my $pattern = qr/"(?![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"/;
  print +($matched xor $expectMatch) ? "FAIL" : "PASS",

With the C shell and its derivatives, you have to escape the bang (!):

% ack-grep '"(?\![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"' ...

I wish I could preserve the highlighting of the matches here, but that doesn't seem to be allowed.

Note that escaped double quotes (\") will severely confuse this pattern.
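To illustrate that caveat, here is a small sketch. The input line is a made-up example, not taken from the question: a single string constant that contains whitespace and two escaped quotes. The pattern has no notion of backslash escapes, so it treats every " as a delimiter and reports a fragment of the constant.

```perl
use strict;
use warnings;

# The pattern from the answer, unchanged.
my $pattern = qr/"(?![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"/;

# Hypothetical input: one string constant with internal whitespace and escaped
# quotes. By the question's rules it should NOT count as a match.
my $line = q<"say \"Hi\" now">;

# Capture whatever the pattern matches, if anything.
my ($hit) = $line =~ /($pattern)/;

print defined $hit ? "matched: $hit\n" : "no match\n";   # prints: matched: "Hi\"
```

The match starts at the first escaped quote, because the text between the two escaped quotes contains an upper-case letter and no whitespace.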

Other answers

You can add the requirement with a character class, like this:

ack --match "\"\s*\S+[A-Z]\S+\s*\""

I'm assuming ack matches one line at a time. The \S+\s*\" part can match more than one closing quote on a line: it will match the whole of "alfa"" instead of just "alfa".
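The behaviour of that pattern can be checked with a short sketch. The inputs are illustrative assumptions: "alFa"" is the example above with one letter upper-cased so that the [A-Z] requirement can be met, and "R" is taken from the question's list of strings that should match.

```perl
use strict;
use warnings;

# The character-class pattern as ack receives it once the shell has removed
# the backslashes in front of the double quotes.
my $loose = qr/"\s*\S+[A-Z]\S+\s*"/;

# \S also matches a double quote, so the match runs on to the last quote.
my ($hit) = q<"alFa""> =~ /($loose)/;
print "$hit\n";                                   # prints: "alFa""

# \S+[A-Z]\S+ needs a non-space character on each side of the upper-case
# letter, so a single-letter string is not matched.
my $single = q<"R"> =~ $loose ? 1 : 0;
print $single ? "match\n" : "no match\n";         # prints: no match
```

So besides over-matching trailing quotes, this variant only accepts an upper-case letter that has at least one non-whitespace character on each side.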

License: CC-BY-SA with attribution

Not affiliated with StackOverflow