Question

I was trying to do a regex replace with boost::regex, but it doesn't seem to be working.

Here is the regular expression:

(\\w+,\\d+,\\d+,\\d+\tscript\t)(.+)(#)(.+)(\t\\d+(,\\d+)?(,\\d+)?,{)

And the formatter:

$1\"$2\"$3\"$4\"$5

The code (getInput() returns a string whose content should match):

std::string &Preprocessor::preprocess()
{
    std::string &tempString = getInput();
    boost::regex scriptRegexFullName;
    const char *scriptRegexFullNameReplace = "$1\"$2\"$3\"$4\"$5";

    scriptRegexFullName.assign("(\\w+,\\d+,\\d+,\\d+\tscript\t)(.+)(#)(.+)(\t\\d+(,\\d+)?(,\\d+)?,{)");

    tempString = boost::regex_replace(tempString, scriptRegexFullName, scriptRegexFullNameReplace, boost::match_default);

    return tempString;
}

When I put the following test cases on this website:

alberta,246,82,3    script  Marinheiro#bra2 100,{
brasilis,316,57,3   script  Marinheiro#bra1 100,{
brasilis,155,165,3  script  Orientação divina#bra1  858,{

The output of the website is correct:

alberta,246,82,3    script  "Marinheiro"#"bra2" 100,{
brasilis,316,57,3   script  "Marinheiro"#"bra1" 100,{
brasilis,155,165,3  script  "Orientação divina"#"bra1"  858,{

But with boost::regex the output is:

alberta,246,82,3    script  "Marinheiro#bra2    100,{
brasilis,316,57,3   script  Marinheiro#bra1 100,{
brasilis,155,165,3  script  Orientação divina#bra1  858,{

Does anyone know what I am doing wrong?

Thanks for the help.


Solution

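The problem comes from your first (.+), which is greedy and grabs as much as it can, probably up to the last # in the subject string. In Boost.Regex the dot also matches newlines by default (unless you pass boost::match_not_dot_newline), so the match can span several lines.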
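To see the greedy behaviour in isolation, here is a minimal sketch. It assumes std::regex rather than boost::regex so that it compiles without Boost, and firstGroup is a hypothetical helper, not part of the original code. In std::regex the dot stops at line breaks, so the example uses a single line containing two # characters instead of several lines.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text captured by the first group,
// or an empty string when the pattern does not match.
std::string firstGroup(const std::string &pattern, const std::string &subject)
{
    const std::regex re(pattern);
    std::smatch match;
    if (!std::regex_search(subject, match, re))
        return "";
    return match[1].str();
}

// firstGroup("(.+)#(.+)", "name#with#hash")    yields "name#with" (greedy dot runs to the last #)
// firstGroup("([^#]+)#(.+)", "name#with#hash") yields "name"      (stops at the first #)
```

The negated class cannot step over a #, so the first group ends at the first # no matter how much text follows.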

You can try this pattern:

const char *scriptRegexFullNameReplace = "$1\"$2\"#\"$3\"$4";

scriptRegexFullName.assign("(\\p{L}+,\\d+,\\d+,\\d+\\s+script\\s+)([^#]+)#(\\S+)(\\s+\\d+,\\{)");

Notes:

  • Escaping the curly bracket is probably unnecessary; try removing it.
  • \p{L} stands for any Unicode letter, but you can replace it with [^,] if it causes a problem.
  • You can replace every + with ++ (a possessive quantifier, which forbids backtracking) for better performance.
  • There is no need to capture the # just to replace it with itself, which is why the pattern has only four capturing groups.
  • Instead of (.+?) (the dot with a lazy quantifier), it is better for performance to use a greedy quantifier on a restricted character class: [^#]+ matches every character up to the first #.
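
As a rough end-to-end check of the pattern and formatter above, here is a sketch under a few assumptions: it uses std::regex instead of boost::regex so that it builds without Boost, it uses the [^,]+ fallback because std::regex has no \p{L}, and quoteScriptNames is a hypothetical name rather than a function from the original code.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: wraps the script name and the part after '#'
// in double quotes, using the four-group pattern from the answer.
std::string quoteScriptNames(const std::string &input)
{
    // Group 1: header up to and including "script" plus whitespace
    // Group 2: everything up to the first '#'
    // Group 3: the non-space run after '#'
    // Group 4: whitespace, a number, a comma and the opening brace
    static const std::regex scriptRegex(
        R"re(([^,]+,\d+,\d+,\d+\s+script\s+)([^#]+)#(\S+)(\s+\d+,\{))re");

    // The '#' is written literally in the format string, not captured.
    return std::regex_replace(input, scriptRegex, "$1\"$2\"#\"$3\"$4");
}
```

Called on a string holding several tab-separated script lines, this should quote the two name parts on each line and leave the rest of the text untouched. With Boost, the same pattern and format string can be passed to boost::regex and boost::regex_replace as in the question.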

Licensed under: CC-BY-SA with attribution