Question

I need to extract nested blocks into a hash (or hash tree) so that I can substitute the blocks with dynamic content. I need to replace the code between

<!--block:XXX-->

and the first closing end block

<!--endblock--> 

with my dynamic content.

I have this code that finds single-level comment blocks, but not nested ones:

#<!--block:listing-->... html code block here ...<!--endblock-->
$blocks{$1} = $2 while $content =~ /<!--block:(.*?)-->((?:(?:(?!<!--(.*?)-->).)|(?R))*?)<!--endblock-->/igs;

Here is the complete nested HTML template that I want to process. I need to find the inner block "block:third" and replace it with my content, then find "block:second" and replace it, then find the outer block "block:first" and replace it. Please note that there can be any number of nested blocks, not just three as in the example below.

use Data::Dumper;

$content=<<HTML;
some html content here

<!--block:first-->
    some html content here

    <!--block:second-->
        some html content here

        <!--block:third-->
            some html content here
        <!--endblock-->

        some html content here
    <!--endblock-->

    some html content here
<!--endblock-->
HTML

$blocks{$1} = $2 while $content =~ /<!--block:(.*?)-->((?:(?:(?!<!--(.*?)-->).)|(?R))*?)<!--endblock-->/igs;
print Dumper(\%blocks);   # pass a reference so the hash is dumped as one structure

So I can access and modify the blocks like $blocks{first} = "my content here" and $blocks{second} = "another content here" etc., then replace the blocks.

I created this regex


Solution 2

I'm going to add an additional answer. It's in line with my previous answer, but slightly more
complete, and I don't want to muddy up that answer any more.

This is for @daliaessam, and is a specific response to @Miller's anecdote on recursive parsing
using regular expressions.

There are only three parts to consider. So, building on my previous answer, I lay out for you a
template for how to do this. It's not as hard as you think.

Cheers!

 # //////////////////////////////////////////////////////
 # // The General Guide to 3-Part Recursive Parsing
 # // ----------------------------------------------
 # // Part 1. CONTENT
 # // Part 2. CORE
 # // Part 3. ERRORS

 (?is)

 (?:
      (                                  # (1), Take off CONTENT
           (?&content) 
      )
   |                                   # OR
      (?>                                # Start-Delimiter (in this case, must be atomic because of .*?)
           <!--block:
           ( .*? )                            # (2), Block name
           -->
      )
      (                                  # (3), Take off The CORE
           (?&core) 
        |  
      )
      <!--endblock-->                    # End-Delimiter

   |                                   # OR
      (                                  # (4), Take off Unbalanced (delimiter) ERRORS
           <!--
           (?: block: .*? | endblock )
           -->
      )
 )

 # ///////////////////////
 # // Subroutines
 # // ---------------

 (?(DEFINE)

      # core
      (?<core>
           (?>
                (?&content) 
             |  
                (?> <!--block: .*? --> )
                # recurse core
                (?:
                     (?&core) 
                  |  
                )
                <!--endblock-->
           )+
      )

      # content 
      (?<content>
           (?>
                (?!
                     <!--
                     (?: block: .*? | endblock )
                     -->
                )
                . 
           )+
      )

 )

Perl code:

use strict;
use warnings;

use Data::Dumper;

$/ = undef;
my $content = <DATA>;

# Set the error mode on/off here ..
my $BailOnError = 1;
my $IsError = 0;

my $href = {};

ParseCore( $href, $content );

#print Dumper($href);

print "\n\n";
print "\nBase======================\n";
print $href->{content};
print "\nFirst======================\n";
print $href->{first}->{content};
print "\nSecond======================\n";
print $href->{first}->{second}->{content};
print "\nThird======================\n";
print $href->{first}->{second}->{third}->{content};
print "\nFourth======================\n";
print $href->{first}->{second}->{third}->{fourth}->{content};
print "\nFifth======================\n";
print $href->{first}->{second}->{third}->{fourth}->{fifth}->{content};
print "\nSix======================\n";
print $href->{six}->{content};
print "\nSeven======================\n";
print $href->{six}->{seven}->{content};
print "\nEight======================\n";
print $href->{six}->{seven}->{eight}->{content};

exit;


sub ParseCore
{
    my ($aref, $core) = @_;
    my ($k, $v);
    while ( $core =~ /(?is)(?:((?&content))|(?><!--block:(.*?)-->)((?&core)|)<!--endblock-->|(<!--(?:block:.*?|endblock)-->))(?(DEFINE)(?<core>(?>(?&content)|(?><!--block:.*?-->)(?:(?&core)|)<!--endblock-->)+)(?<content>(?>(?!<!--(?:block:.*?|endblock)-->).)+))/g )
    {
       if (defined $1)
       {
         # CONTENT
           $aref->{content} .= $1;
       }
       elsif (defined $2)
       {
         # CORE
           $k = $2; $v = $3;
           $aref->{$k} = {};
 #         $aref->{$k}->{content} = $v;
 #         $aref->{$k}->{match} = $&;

           my $curraref = $aref->{$k};
           my $ret = ParseCore($aref->{$k}, $v);
           if ( $BailOnError && $IsError ) {
               last;
           }
           if (defined $ret) {
               $curraref->{'#next'} = $ret;
           }
       }
       else
       {
         # ERRORS
           print "Unbalanced '$4' at position = ", $-[0];
           $IsError = 1;

           # Decide to continue here ..
           # If BailOnError is set, just unwind recursion. 
           # -------------------------------------------------
           if ( $BailOnError ) {
              last;
           }
       }
    }
    return $k;
}

#================================================
__DATA__
some html content here top base
<!--block:first-->
    <table border="1" style="color:red;">
    <tr class="lines">
        <td align="left" valign="<--valign-->">
    <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
    <!--hello--> <--again--><!--world-->
    some html content here 1 top
    <!--block:second-->
        some html content here 2 top
        <!--block:third-->
            some html content here 3 top
            <!--block:fourth-->
                some html content here 4 top
                <!--block:fifth-->
                    some html content here 5a
                    some html content here 5b
                <!--endblock-->
            <!--endblock-->
            some html content here 3a
            some html content here 3b
        <!--endblock-->
        some html content here 2 bottom
    <!--endblock-->
    some html content here 1 bottom
<!--endblock-->
some html content here1-5 bottom base

some html content here 6-8 top base
<!--block:six-->
    some html content here 6 top
    <!--block:seven-->
        some html content here 7 top
        <!--block:eight-->
            some html content here 8a
            some html content here 8b
        <!--endblock-->
        some html content here 7 bottom
    <!--endblock-->
    some html content here 6 bottom
<!--endblock-->
some html content here 6-8 bottom base

Output >>

Base======================
some html content here top base

some html content here1-5 bottom base

some html content here 6-8 top base

some html content here 6-8 bottom base

First======================

    <table border="1" style="color:red;">
    <tr class="lines">
        <td align="left" valign="<--valign-->">
    <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
    <!--hello--> <--again--><!--world-->
    some html content here 1 top

    some html content here 1 bottom

Second======================

        some html content here 2 top

        some html content here 2 bottom

Third======================

            some html content here 3 top

            some html content here 3a
            some html content here 3b

Fourth======================

                some html content here 4 top


Fifth======================

                    some html content here 5a
                    some html content here 5b

Six======================

    some html content here 6 top

    some html content here 6 bottom

Seven======================

        some html content here 7 top

        some html content here 7 bottom

Eight======================

            some html content here 8a
            some html content here 8b

Other suggestions

Update:

This is a response to the request to "combine" everything into a single regex...

It appears you don't care about reconstructing the order of the HTML.
So, if you just want to isolate the content for each sub-section, the code below is all you need.
However, you will need lists ( [] ) to reconstitute the order of embedded sub-sections.

Having re-read the question, note that the regex used below is the one you should be using.

use Data::Dumper;

$/ = undef;
my $content = <DATA>;


my $href = {};

ParseCore( $href, $content );

#print Dumper($href);

print "\nBase======================\n";
print $href->{content};
print "\nFirst======================\n";
print $href->{first}->{content};
print "\nSecond======================\n";
print $href->{first}->{second}->{content};
print "\nThird======================\n";
print $href->{first}->{second}->{third}->{content};
print "\nFourth======================\n";
print $href->{first}->{second}->{third}->{fourth}->{content};
print "\nFifth======================\n";
print $href->{first}->{second}->{third}->{fourth}->{fifth}->{content};

exit;

sub ParseCore
{
    my ($aref, $core) = @_;
    my ($k, $v);
    while ( $core =~ /(?is)(<!--block:(.*?)-->((?:(?:(?!<!--block:(?:.*?)-->).)|(?R))*?)<!--endblock-->|((?:(?!<!--block:.*?-->).)+))/g )
    {
       if (defined $2) {
           $k = $2; $v = $3;
           $aref->{$k} = {};
 #         $aref->{$k}->{content} = $v;
 #         $aref->{$k}->{match} = $1;

           my $curraref = $aref->{$k};
           my $ret = ParseCore($aref->{$k}, $v);
           if (defined $ret) {
               $curraref->{'#next'} = $ret;
           }
        }
        else
        {
           $aref->{content} .= $4;
        }
    }
    return $k;
}

#================================================
__DATA__
some html content here top base
<!--block:first-->
    <table border="1" style="color:red;">
    <tr class="lines">
        <td align="left" valign="<--valign-->">
    <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
    <!--hello--> <--again--><!--world-->
    some html content here 1 top
    <!--block:second-->
        some html content here 2 top
        <!--block:third-->
            some html content here 3 top
            <!--block:fourth-->
                some html content here 4 top
                <!--block:fifth-->
                    some html content here 5a
                    some html content here 5b
                <!--endblock-->
            <!--endblock-->
            some html content here 3a
            some html content here 3b
        <!--endblock-->
        some html content here 2 bottom
    <!--endblock-->
    some html content here 1 bottom
<!--endblock-->
some html content here1-5 bottom base

some html content here 6-8 top base
<!--block:six-->
    some html content here 6 top
    <!--block:seven-->
        some html content here 7 top
        <!--block:eight-->
            some html content here 8a
            some html content here 8b
        <!--endblock-->
        some html content here 7 bottom
    <!--endblock-->
    some html content here 6 bottom
<!--endblock-->
some html content here 6-8 bottom base

Output >>

Base======================
some html content here top base

some html content here1-5 bottom base

some html content here 6-8 top base

some html content here 6-8 bottom base
First======================

    <table border="1" style="color:red;">
    <tr class="lines">
        <td align="left" valign="<--valign-->">
    <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
    <!--hello--> <--again--><!--world-->
    some html content here 1 top

    some html content here 1 bottom

Second======================

        some html content here 2 top

        some html content here 2 bottom

Third======================

            some html content here 3 top

            some html content here 3a
            some html content here 3b

Fourth======================

                some html content here 4 top


Fifth======================

                    some html content here 5a
                    some html content here 5b

You can use regex recursion to match the outer nestings, then parse the inner cores
using a simple recursive function call.

It's then also possible to parse content on the nesting level that you are on,
and to create a nested structure along the way that lets you do the
template substitutions later.

You can then reconstruct the HTML.
The only tricky part is traversing the structure. But if you know how to traverse
nested data (scalars, array and hash refs, and such) it should be no problem.

Here is the sample.

    # (?is)<!--block:(.*?)-->((?:(?:(?!<!--(?:.*?)-->).)|(?R))*?)<!--endblock-->|((?:(?!<!--.*?-->).)+)

    (?is)                         # Modifiers: Case insensitive, Dot-all
    <!--block:                    # Begin BLOCK
    ( .*? )                       # (1), block name
    -->

    (                             # (2 start), Begin Core
         (?:
              (?:
                   (?!
                        <!--
                        (?: .*? )
                        -->
                   )
                   . 
              )
           |  (?R) 
         )*?
    )                             # (2 end), End Core

    <!--endblock-->               # End BLOCK
 |  
    (                             # (3 start), Or grab content within this core
         (?:
              (?! <!-- .*? --> )
              . 
         )+
    )                             # (3 end)

Perl test case

use Data::Dumper;

$/ = undef;
my $content = <DATA>;


my %blocks = ();
$blocks{'base'} = [];


ParseCore( $blocks{'base'}, $content );


sub ParseCore
{
    my ($aref, $core) = @_;
    while ( $core =~ /(?is)<!--block:(.*?)-->((?:(?:(?!<!--(?:.*?)-->).)|(?R))*?)<!--endblock-->|((?:(?!<!--.*?-->).)+)/g )
    {
        if ( defined $1 )
        {
           my $branch = {};
           push @{$aref}, $branch;
           $branch->{$1} = [];
           ParseCore( $branch->{$1}, $2 );
        }
        elsif ( defined $3 )
        {
           push @{$aref}, $3;
        }
    }

}

print Dumper(\%blocks);

__DATA__

some html content here top base
<!--block:first-->
    some html content here 1 top
    <!--block:second-->
        some html content here 2 top
        <!--block:third-->
            some html content here 3a
            some html content here 3b
        <!--endblock-->
        some html content here 2 bottom
    <!--endblock-->
    some html content here 1 bottom
<!--endblock-->
some html content here bottom base

Output >>

$VAR1 = {
          'base' => [
                      '
some html content here top base
',
                      {
                        'first' => [
                                     '
    some html content here 1 top
    ',
                                     {
                                       'second' => [
                                                     '
        some html content here 2 top
        ',
                                                     {
                                                       'third' => [
                                                                    '
            some html content here 3a
            some html content here 3b
        '
                                                                  ]
                                                     },
                                                     '
        some html content here 2 bottom
    '
                                                   ]
                                     },
                                     '
    some html content here 1 bottom
'
                                   ]
                      },
                      '
some html content here bottom base
'
                    ]
        };
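
The traversal mentioned above can be sketched as a short recursive walk. This is only an illustration, assuming the layout shown in the dump (a list whose items are either plain strings or single-key hashes mapping a block name to a child list). The names render_tree and %replace are hypothetical and not part of the answer's code.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild the HTML from a list of strings and
# { name => [children] } hashes. Any block named in %$replace is
# swapped for the given text instead of being rendered.
sub render_tree {
    my ($nodes, $replace) = @_;
    my $out = '';
    for my $node (@$nodes) {
        if (ref $node eq 'HASH') {
            for my $name (keys %$node) {        # each hash holds one block
                $out .= exists $replace->{$name}
                      ? $replace->{$name}
                      : render_tree($node->{$name}, $replace);
            }
        }
        else {
            $out .= $node;                       # plain HTML text
        }
    }
    return $out;
}

# Small hand-built tree in the same shape as the dump above.
my $tree = [
    "top\n",
    { first => [ "1 top\n", { second => [ "2\n" ] }, "1 bottom\n" ] },
    "bottom\n",
];

print render_tree($tree, { second => "REPLACED\n" });
```

Because the block tags themselves are not stored in this structure, the rendered output contains only the text and the replacements, which is usually what a template substitution wants.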

Based on @sln's answer above, and despite the advice to use Perl template or parser modules, I maintain that none of these modules can handle this issue directly.

Here is the solution I came up with.

First I find the outer blocks in the entire content or template with simple regex:

/(<!--block:.*?-->(?:(?:(?!<!--(?:.*?)-->).)|(?R))*?<!--endblock-->)/gis

Then I parse each outer block to find its nested sub blocks based on @sln answer above.

/(<!--block:(.*?)-->((?:(?:(?!<!--block:(?:.*?)-->).)|(?R))*?)<!--endblock-->|((?:(?!<!--.*?-->).)+))/igsx

Then everything is working well. I tested with two outer blocks and each one has nested blocks.

I can reach any sub block simply like this:

print $blocks->{first}->{content};

print $blocks->{first}->{match};

print $blocks->{first}->{second}->{third}->{fourth}->{content};

Each block hash ref has the keys:

`content`: the block content without the block name and endblock tags.
`match`: the block content with the block name and endblock tags, good for replacing.
`#next`: has the sub block name if exists, good to check if block has children and access them.
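
As an illustration of how the `match` key could be used for replacing, here is a minimal sketch. The miniature $blocks structure is built by hand to mirror the keys listed above, and replace_block is a hypothetical helper, not part of the code that follows.

```perl
use strict;
use warnings;

# Hand-built miniature of the structure described above (hypothetical data).
my $html = "a <!--block:first-->x <!--block:second-->y<!--endblock--> z<!--endblock--> b";
my $blocks = {
    first => {
        match   => "<!--block:first-->x <!--block:second-->y<!--endblock--> z<!--endblock-->",
        content => "x <!--block:second-->y<!--endblock--> z",
        '#next' => 'second',
        second  => {
            match   => "<!--block:second-->y<!--endblock-->",
            content => "y",
        },
    },
};

# Replace one block's full text (tags included) with new content.
# index/substr is used so the stored text is treated literally, not as a regex.
sub replace_block {
    my ($text, $block, $new) = @_;
    my $pos = index($text, $block->{match});
    return $text if $pos < 0;                  # block not found, leave text alone
    substr($text, $pos, length($block->{match}), $new);
    return $text;
}

$html = replace_block($html, $blocks->{first}{second}, "INNER");
print "$html\n";
```

One caveat with this approach: once an inner block has been replaced, the stored `match` of its enclosing block no longer occurs in the text, so outer blocks would have to be re-parsed or replaced first.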

Below is the final, tested Perl code.

use Data::Dumper;

$/ = undef;
my $content = <DATA>;

my $blocks = parse_blocks($content);

print Dumper($blocks);

#print join "\n", keys( %{$blocks} ); # root block names
#print join "\n", keys( %{$blocks->{first}} ); # keys of block "first"
#print join "\n", keys( %{$blocks->{first}->{second}});

#print Dumper $blocks->{first};
#print Dumper $blocks->{first}->{content};
#print Dumper $blocks->{first}->{match};

# check if fourth block has sub block.
#print exists $blocks->{first}->{second}->{third}->{fourth}->{'#next'}, "\n";

# check if block has sub block, get it:
#if (exists $blocks->{first}->{second}->{third}->{fourth}->{'#next'}) {
#   print $blocks->{first}->{second}->{third}->{fourth}->{ $blocks->{first}->{second}->{third}->{fourth}->{'#next'} }->{content}, "\n";
#}

exit;
#================================================
sub parse_blocks {
    my ($content) = @_;
    my $href = {};
    # find outer blocks only
    while ($content =~ /(<!--block:.*?-->(?:(?:(?!<!--(?:.*?)-->).)|(?R))*?<!--endblock-->)/gis) {
        # parse each outer block nested blocks
        parse_nest_blocks($href, $1);
    }
    return $href;
}
#================================================
sub parse_nest_blocks {
    my ($aref, $core) = @_;
    my ($k, $v);
    while ( $core =~ /(<!--block:(.*?)-->((?:(?:(?!<!--block:(?:.*?)-->).)|(?R))*?)<!--endblock-->|((?:(?!<!--.*?-->).)+))/igsx )
    {
        if (defined $2) {
           $k = $2; $v = $3;
           $aref->{$k} = {};
           $aref->{$k}->{content} = $v;
           $aref->{$k}->{match} = $1;
           #print "1:{{$k}}\n2:[[$v]]\n";
           my $curraref = $aref->{$k};
           my $ret = parse_nest_blocks($aref->{$k}, $v);
           if ($ret) {
               $curraref->{'#next'} = $ret;
           }
           return $k;
        }
    }

}
#================================================
__DATA__
some html content here top base
<!--block:first-->
    <table border="1" style="color:red;">
    <tr class="lines">
        <td align="left" valign="<--valign-->">
    <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
    <!--hello--> <--again--><!--world-->
    some html content here 1 top
    <!--block:second-->
        some html content here 2 top
        <!--block:third-->
            some html content here 3 top
            <!--block:fourth-->
                some html content here 4 top
                <!--block:fifth-->
                    some html content here 5a
                    some html content here 5b
                <!--endblock-->
            <!--endblock-->
            some html content here 3a
            some html content here 3b
        <!--endblock-->
        some html content here 2 bottom
    <!--endblock-->
    some html content here 1 bottom
<!--endblock-->
some html content here1-5 bottom base

some html content here 6-8 top base
<!--block:six-->
    some html content here 6 top
    <!--block:seven-->
        some html content here 7 top
        <!--block:eight-->
            some html content here 8a
            some html content here 8b
        <!--endblock-->
        some html content here 7 bottom
    <!--endblock-->
    some html content here 6 bottom
<!--endblock-->
some html content here 6-8 bottom base

and the output of the entire hash dump is:

$VAR1 = {
          'first' => {
                       'second' => {
                                     'third' => {
                                                  'match' => '<!--block:third-->
            some html content here 3 top
            <!--block:fourth-->
                some html content here 4 top
                <!--block:fifth-->
                    some html content here 5a
                    some html content here 5b
                <!--endblock-->
            <!--endblock-->
            some html content here 3a
            some html content here 3b
        <!--endblock-->',
                                                  'content' => '
            some html content here 3 top
            <!--block:fourth-->
                some html content here 4 top
                <!--block:fifth-->
                    some html content here 5a
                    some html content here 5b
                <!--endblock-->
            <!--endblock-->
            some html content here 3a
            some html content here 3b
        ',
                                                  'fourth' => {
                                                                'fifth' => {
                                                                             'match' => '<!--block:fifth-->
                    some html content here 5a
                    some html content here 5b
                <!--endblock-->',
                                                                             'content' => '
                    some html content here 5a
                    some html content here 5b
                '
                                                                           },
                                                                'match' => '<!--block:fourth-->
                some html content here 4 top
                <!--block:fifth-->
                    some html content here 5a
                    some html content here 5b
                <!--endblock-->
            <!--endblock-->',
                                                                'content' => '
                some html content here 4 top
                <!--block:fifth-->
                    some html content here 5a
                    some html content here 5b
                <!--endblock-->
            ',
                                                                '#next' => 'fifth'
                                                              },
                                                  '#next' => 'fourth'
                                                },
                                     'match' => '<!--block:second-->
        some html content here 2 top
        <!--block:third-->
            some html content here 3 top
            <!--block:fourth-->
                some html content here 4 top
                <!--block:fifth-->
                    some html content here 5a
                    some html content here 5b
                <!--endblock-->
            <!--endblock-->
            some html content here 3a
            some html content here 3b
        <!--endblock-->
        some html content here 2 bottom
    <!--endblock-->',
                                     'content' => '
        some html content here 2 top
        <!--block:third-->
            some html content here 3 top
            <!--block:fourth-->
                some html content here 4 top
                <!--block:fifth-->
                    some html content here 5a
                    some html content here 5b
                <!--endblock-->
            <!--endblock-->
            some html content here 3a
            some html content here 3b
        <!--endblock-->
        some html content here 2 bottom
    ',
                                     '#next' => 'third'
                                   },
                       'match' => '<!--block:first-->
    <table border="1" style="color:red;">
    <tr class="lines">
        <td align="left" valign="<--valign-->">
    <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
    <!--hello--> <--again--><!--world-->
    some html content here 1 top
    <!--block:second-->
        some html content here 2 top
        <!--block:third-->
            some html content here 3 top
            <!--block:fourth-->
                some html content here 4 top
                <!--block:fifth-->
                    some html content here 5a
                    some html content here 5b
                <!--endblock-->
            <!--endblock-->
            some html content here 3a
            some html content here 3b
        <!--endblock-->
        some html content here 2 bottom
    <!--endblock-->
    some html content here 1 bottom
<!--endblock-->',
                       'content' => '
    <table border="1" style="color:red;">
    <tr class="lines">
        <td align="left" valign="<--valign-->">
    <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
    <!--hello--> <--again--><!--world-->
    some html content here 1 top
    <!--block:second-->
        some html content here 2 top
        <!--block:third-->
            some html content here 3 top
            <!--block:fourth-->
                some html content here 4 top
                <!--block:fifth-->
                    some html content here 5a
                    some html content here 5b
                <!--endblock-->
            <!--endblock-->
            some html content here 3a
            some html content here 3b
        <!--endblock-->
        some html content here 2 bottom
    <!--endblock-->
    some html content here 1 bottom
',
                       '#next' => 'second'
                     },
          'six' => {
                     'match' => '<!--block:six-->
    some html content here 6 top
    <!--block:seven-->
        some html content here 7 top
        <!--block:eight-->
            some html content here 8a
            some html content here 8b
        <!--endblock-->
        some html content here 7 bottom
    <!--endblock-->
    some html content here 6 bottom
<!--endblock-->',
                     'content' => '
    some html content here 6 top
    <!--block:seven-->
        some html content here 7 top
        <!--block:eight-->
            some html content here 8a
            some html content here 8b
        <!--endblock-->
        some html content here 7 bottom
    <!--endblock-->
    some html content here 6 bottom
',
                     'seven' => {
                                  'match' => '<!--block:seven-->
        some html content here 7 top
        <!--block:eight-->
            some html content here 8a
            some html content here 8b
        <!--endblock-->
        some html content here 7 bottom
    <!--endblock-->',
                                  'content' => '
        some html content here 7 top
        <!--block:eight-->
            some html content here 8a
            some html content here 8b
        <!--endblock-->
        some html content here 7 bottom
    ',
                                  'eight' => {
                                               'match' => '<!--block:eight-->
            some html content here 8a
            some html content here 8b
        <!--endblock-->',
                                               'content' => '
            some html content here 8a
            some html content here 8b
        '
                                             },
                                  '#next' => 'eight'
                                },
                     '#next' => 'seven'
                   }
        };

I must repeat, for you and for anyone else who might find this thread: do not use regular expressions in such a complicated way.

I love regular expressions, but they were not designed for this sort of problem. You're going to be 1,000 times better off using a standard templating system like Template::Toolkit.

The problem with regular expressions in this context is that there's a tendency to couple parsing with validation. When that happens, the regex ends up being very fragile, and it's common for people to skip validation of their data entirely. For example, when a recursive regex sees ((( )), it will claim there are only 2 levels to those parentheses. In truth, there are two and a half, and that half is an error that won't be reported.
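
A small sketch of that parentheses case, using a generic balanced-parentheses pattern chosen for illustration (it is not taken from any of the answers here and needs Perl 5.10 or later):

```perl
use strict;
use warnings;

my $input = '((( ))';

# Recursive pattern for balanced parentheses; (?1) re-enters group 1.
my $balanced = qr/(\((?:[^()]++|(?1))*\))/;

my ($match, $offset);
if ($input =~ $balanced) {
    ($match, $offset) = ($1, $-[0]);
    print "matched '$match' at offset $offset\n";   # matched '(( ))' at offset 1
}
```

The match succeeds from offset 1, so the unbalanced opening parenthesis at offset 0 is skipped without any error being raised.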

Now, I already communicated the way to avoid this flaw in regex parsing in my answers to two of your other questions:

Basically, make your parsing regex as simple as possible. This serves multiple purposes. It ensures that your regex will be less fragile, and it also encourages not putting the validation in the parsing phase.

I showed you how to start on this particular Stack Overflow problem in the second of those solutions. Basically, tokenize your data, and then translate the results into your more complicated data structure. I had some spare time today, so I decided to fully demonstrate how easily that translation can be done:

use strict;
use warnings;

use Data::Dump qw(dump dd);

my $content = do {local $/; <DATA>};

# Tokenize Content
my @tokens = split m{<!--(?:block:(.*?)|(endblock))-->}, $content;

# Resulting Data Structure
my @data = (
    shift @tokens, # First element of split is always HTML
);

# Keep track of levels of content
# - This is a throwaway data structure to facilitate the building of nested content
my @levels = ( \@data );

while (@tokens) {
    # Tokens come in groups of 3.  Two capture groups in split delimiter, followed by html.
    my ($block, $endblock, $html) = splice @tokens, 0, 3;

    # Start of Block - Go up to new level
    if (defined $block) {
        #debug# print +('  ' x @levels) ."<$block>\n";
        my $hash = {
            block    => $block,
            content  => [],
        };
        push @{$levels[-1]}, $hash;
        push @levels, $hash->{content};

    # End of Block - Go down level
    } elsif (defined $endblock) {
        die "Error: Unmatched endblock found before " . dump($html) if @levels == 1;
        pop @levels;
        #debug# print +('  ' x @levels) . "</$levels[-1][-1]{block}>\n";
    }

    # Append HTML content
    push @{$levels[-1]}, $html;
}
die "Error: Unmatched start block: $levels[-2][-1]{block}" if @levels > 1;

dd @data;

__DATA__

some html content here top base
<!--block:first-->
    <table border="1" style="color:red;">
    <tr class="lines">
        <td align="left" valign="<--valign-->">
    <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
    <!--hello--> <--again--><!--world-->
    some html content here 1 top
    <!--block:second-->
        some html content here 2 top
        <!--block:third-->
            some html content here 3 top
            <!--block:fourth-->
                some html content here 4 top
                <!--block:fifth-->
                    some html content here 5a
                    some html content here 5b
                <!--endblock-->
            <!--endblock-->
            some html content here 3a
            some html content here 3b
        <!--endblock-->
        some html content here 2 bottom
    <!--endblock-->
    some html content here 1 bottom
<!--endblock-->
some html content here1-5 bottom base

some html content here 6-8 top base
<!--block:six-->
    some html content here 6 top
    <!--block:seven-->
        some html content here 7 top
        <!--block:eight-->
            some html content here 8a
            some html content here 8b
        <!--endblock-->
        some html content here 7 bottom
    <!--endblock-->
    some html content here 6 bottom
<!--endblock-->
some html content here 6-8 bottom base

If you uncomment the debugging statements, you'll observe the following traversal of the tokens as the code builds the structure that you want:

  <first>
    <second>
      <third>
        <fourth>
          <fifth>
          </fifth>
        </fourth>
      </third>
    </second>
  </first>
  <six>
    <seven>
      <eight>
      </eight>
    </seven>
  </six>

And the full resulting data structure is:

(
    "\nsome html content here top base\n",
    {
        block   => "first",
        content => [
            "\n    <table border=\"1\" style=\"color:red;\">\n    <tr class=\"lines\">\n        <td align=\"left\" valign=\"<--valign-->\">\n    <b>bold</b><a href=\"http://www.mewsoft.com\">mewsoft</a>\n    <!--hello--> <--again--><!--world-->\n    some html content here 1 top\n    ",
            {
                block   => "second",
                content => [
                    "\n        some html content here 2 top\n        ",
                    {
                        block   => "third",
                        content => [
                            "\n            some html content here 3 top\n            ",
                            {
                                block   => "fourth",
                                content => [
                                    "\n                some html content here 4 top\n                ",
                                    {
                                        block   => "fifth",
                                        content => [
                                            "\n                    some html content here 5a\n                    some html content here 5b\n                ",
                                        ],
                                    },
                                    "\n            ",
                                ],
                            },
                            "\n            some html content here 3a\n            some html content here 3b\n        ",
                        ],
                    },
                    "\n        some html content here 2 bottom\n    ",
                ],
            },
            "\n    some html content here 1 bottom\n",
        ],
    },
    "\nsome html content here1-5 bottom base\n\nsome html content here 6-8 top base\n",
    {
        block   => "six",
        content => [
            "\n    some html content here 6 top\n    ",
            {
                block   => "seven",
                content => [
                    "\n        some html content here 7 top\n        ",
                    {
                        block   => "eight",
                        content => [
                            "\n            some html content here 8a\n            some html content here 8b\n        ",
                        ],
                    },
                    "\n        some html content here 7 bottom\n    ",
                ],
            },
            "\n    some html content here 6 bottom\n",
        ],
    },
    "\nsome html content here 6-8 bottom base",
);
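
Since the original goal was to swap blocks for dynamic content, here is a minimal sketch of how the structure above could be turned back into HTML. The render function and the %replace hash are hypothetical names that are not part of the code above, and the sample @data is built by hand in the same shape that the parser produces:

```perl
use strict;
use warnings;

# Hypothetical helper: walk the parsed structure and rebuild the HTML.
# %$replace maps a block name to its replacement text; a block without
# an entry is rendered from its own (possibly nested) content.
sub render {
    my ($nodes, $replace) = @_;
    my $out = '';
    for my $node (@$nodes) {
        if (ref $node eq 'HASH') {
            my $name = $node->{block};
            $out .= exists $replace->{$name}
                ? $replace->{$name}
                : render($node->{content}, $replace);
        }
        elsif (defined $node) {
            $out .= $node;    # plain HTML between markers
        }
    }
    return $out;
}

# Hand-built sample in the same shape as the parser output.
my @data = (
    "top\n",
    {
        block   => 'first',
        content => [
            "  one\n",
            { block => 'second', content => ["    two\n"] },
            "  one again\n",
        ],
    },
    "bottom\n",
);

print render(\@data, { second => "    <b>dynamic</b>\n" });
```

Replacing an outer block (for example first) discards everything nested inside it, so in this sketch the order of replacement does not matter: the outermost matching name wins.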

Now, why is this method better?

It's less fragile. You already observed how your previous regex broke when other html comments were in the content. The tools used to parse here are extremely simple, so there is much less risk of the regex hiding edge cases.

Additionally, it's extremely easy to add functionality to this code. If you wanted to include parameters in your blocks, you could do it the exact same way as demonstrated in my solution to this problem of yours. The parsing and validation functionality wouldn't even have to be changed.
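
The linked solution is not reproduced here, so the following is only a guess at one possible shape. It assumes a made-up marker syntax such as <!--block:listing limit="10" class="row"-->. Because the split pattern already captures everything between "block:" and "-->", only the captured text needs further parsing. The function name parse_block_header and the key="value" format are assumptions for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: split the text captured after "block:" into a
# block name and a hash of key="value" parameters.
sub parse_block_header {
    my ($header) = @_;
    my ($name, $rest) = $header =~ /^\s*(\S+)\s*(.*)$/s
        or die "Error: Empty block header\n";

    my %params;
    # \G with /gc walks through $rest one parameter at a time and keeps
    # the position when no further parameter matches.
    while ($rest =~ /\G\s*([\w-]+)="([^"]*)"/gc) {
        $params{$1} = $2;
    }
    # Anything left over that is not whitespace is a syntax error.
    die "Error: Malformed parameters in block '$name'\n"
        if $rest =~ /\G\s*\S/gc;

    return ($name, \%params);
}

my ($name, $params) = parse_block_header('listing limit="10" class="row"');
print "$name: limit=$params->{limit}, class=$params->{class}\n";
# prints "listing: limit=10, class=row"
```

In the main loop, the call would go where $block is stored, for instance by adding a params key next to block and content in $hash. The stack handling and the unmatched-block checks stay as they are.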

It reports errors. Remove a character from either 'endblock' or 'block' and see what happens. It will give you an explicit error message:

Error: Unmatched start block: first at h.pl line 43

Your recursive regex would just hide the fact that there was an unmatched block in your content. You of course might observe it in your browser when you ran your code, but this way the error is reported immediately and you can track it down.
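
To try the error reporting without editing the template by hand, the parser can be wrapped in a function and fed deliberately broken input. This is a compact restatement of the code above, not a different algorithm. The name parse_blocks is hypothetical, the error messages are shortened and end in a newline (so Perl does not append "at ... line N"), and Data::Dump is left out to keep the sketch core-only:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the tokenizer and level stack shown above.
sub parse_blocks {
    my ($content) = @_;
    my @tokens = split m{<!--(?:block:(.*?)|(endblock))-->}, $content;
    my @data   = (shift @tokens);
    my @levels = (\@data);

    while (@tokens) {
        my ($block, $endblock, $html) = splice @tokens, 0, 3;
        if (defined $block) {
            my $hash = { block => $block, content => [] };
            push @{ $levels[-1] }, $hash;
            push @levels, $hash->{content};
        }
        elsif (defined $endblock) {
            die "Error: Unmatched endblock\n" if @levels == 1;
            pop @levels;
        }
        # split drops trailing empty fields, so $html can be undef at the end
        push @{ $levels[-1] }, $html if defined $html;
    }
    die "Error: Unmatched start block: $levels[-2][-1]{block}\n" if @levels > 1;
    return \@data;
}

for my $template (
    "<!--block:first-->ok<!--endblock-->",     # balanced
    "<!--block:first-->oops<!--endblok-->",    # closing marker mistyped
    "oops<!--endblock-->",                     # closing marker without an opener
) {
    my $tree = eval { parse_blocks($template) };
    print defined $tree ? "parsed\n" : "rejected: $@";
}
```

The mistyped marker in the second template is treated as ordinary HTML, so the block is never closed and the final check fires. The third template trips the check inside the loop instead.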

Summary:

Finally, I will say again, that the best way to solve this problem is not to try to create your own templating system, but to instead use an already created framework such as Template::Toolkit. You commented before that one of your motivations was that you wanted to use a design editor for your templates and that's why you wanted them to use html comments for the templates. However, there are ways to accommodate that desire too with existing frameworks.

Regardless, I hope that you're able to learn something from this code. Recursive regular expressions are cool tools, and great for validating data. But they should not be used for parsing, and hopefully anyone else who is searching for how to use recursive regular expressions will pause and potentially rethink their approach if they are wanting them for that reason.

Licensed under: CC-BY-SA with attribution