Question

I'm trying to identify list data the code is:

my $listdata = '
List Items:     
(1)LIST 1 data 
(a)sub data
(b)sub data
(c)sub data
(d)sub data
    (i)sub-sub data
    (ii)sub-sub data
        (A)sub-sub-sub data
        (B)sub-sub-sub data
    (iii)sub-sub data
(e)sub data
(2)LIST 2 data 
(3)LIST 3 data 
';

    #print "\n\n\n$listdata\n\n";

    ###Array of multi-level patterns 
    my @level_check =('\(\d+\)','(?<!\()\d+\)','\([a-h]\)','(?<!\()[a-h]\)','\([A-H]\)','(?<!\()[A-H]\)','\d+\.',
                      '\([IVX]+\)','(?<!\()[IVX]+\)','\([ivx]+\)','(?<!\()[ivx]+\)','\-');

    ###pattern for each levels
    my ($first_level,$second_level,$third_level,$fourth_level);

    ###First from each pattern
    my ($first_occur,$second_occur,$third_occur,$fourth_occur);

    #++++++++++++++++++++++++Pattern for multilevel list+++++++++++++++++++++++#
    my $pattern = '((?:[IVX\-\(\)\d\.\-][a-z]?\)?)+)';

    $listdata =~ s{$pattern}{
        my ($leveltemp) = ($1);
        $first_occur = $leveltemp if !$first_occur;

        #print "$data";
        #print "all_level: $leveltemp##\n";

        #########First Level Start
        for($i=0; $i<scalar(@level_check);$i++){
            if($first_occur =~ /^$level_check[$i]$/){
                $first_level = $level_check[$i] if !$first_level;
                #print "$level_check[$i] =>is Ist: $first_level\n";
            }
        }

        for($i=0; $i<scalar(@level_check);$i++){

            if($leveltemp =~ /^$first_level$/){
                $leveltemp =~ s{$pattern}{<<LIST1>>$2$3};
                #print"**$data level matched: $leveltemp => $first_level\n";
                ############First Level End
            }
            else
            {
                ######Second level Start
                if($leveltemp !~ /^(?:<<LIST\d+>>|\d{3,}\,?|\([a-h]{3,})/i){
                    $second_occur = $leveltemp if !$second_occur;
                    #print "$leveltemp :$second_occur\n";

                    for($i=0; $i<scalar(@level_check);$i++){
                        if($second_occur =~ /^$level_check[$i]$/){
                        $second_level = $level_check[$i] if !$second_level;
                        #print "$leveltemp =>is IInd: $second_level\n";
                        }
                    }

                    if($leveltemp =~ /^$second_level/){
                        $leveltemp =~ s{$pattern}{<<LIST2>>$2$3};
                        #print"**level matched: $leveltemp => $seconf_level\n";
                        ######Second level End
                    }
                    else
                    {
                        ########Third Level Start   
                        if($leveltemp !~ /^(?:<<LIST\d+>>|\d{3,}\,?|\([A-h]{3,})/i){
                            $third_occur = $leveltemp if !$third_occur;

                            for($i=0; $i<scalar(@level_check);$i++){
                                if($third_occur =~ /^$level_check[$i]$/){
                                    $third_level = $level_check[$i] if !$third_level;
                                    #print "$leveltemp =>is IIIrd: $third_level\n";
                                }
                            }

                            if($leveltemp =~ /^$third_level/){
                                $leveltemp =~ s{$pattern}{<<LIST3>>$2$3};
                                #print"**level matched: $leveltemp => $third_level\n";
                            #########Third Level End
                            }
                            else
                            {
                                ########Fourth Level Start  
                                if($leveltemp !~ /^(?:<<LIST+>>|\d{3,}\,?|\([A-z]{3,})/i){

                                    $fourth_occur = $leveltemp if !$fourth_occur;
                                        #print "$leveltemp :$fourth_occur\n";
                                    for($i=0; $i<scalar(@level_check);$i++){
                                        if($fourth_occur =~ /^$level_check[$i]$/){
                                            $fourth_level = $level_check[$i] if !$fourth_level;
                                            #print "$leveltemp =>is IVrth: $fourth_level\n";
                                        }
                                    }

                                    if($leveltemp =~ /^$fourth_level/){
                                        $leveltemp =~ s{$pattern}{<<LIST4>>$2$3};
                                        #print"**$fourth_occur  level matched: $leveltemp => $fourth_level\n";
                                        #########Fourth Level End
                                    }
                                    #######Add Next Levels Here If Any in else loop


                                }
                            }#IV lvl else loop end
                        }   
                    }#III lvl else loop end
                }
            }#IInd lvl else loop end

        }#Ist lvl for loop end

        "$leveltemp"
    }gsixe;

print "$listdata\n";

The Output Required:

 <<LIST1>>(1)LIST 1 data 
 <<LIST2>>(a)sub data
 <<LIST2>>(b)sub data
 <<LIST2>>(c)sub data
 <<LIST2>>(d)sub data
 <<LIST3>>(i)sub-sub data
 <<LIST3>>(ii)sub-sub data
 <<LIST4>>(A)sub-sub-sub data
 <<LIST4>>(B)sub-sub-sub data
 <<LIST3>>(iii)sub-sub data
 <<LIST2>>(e)sub data
 <<LIST1>>(2)LIST 2 data 
 <<LIST1>>(3)LIST 3 data

Problem is that I have to enter code for each level. I coded upto four levels here. But this is not the solution(List may have any number of sub levels). Is there any other way to write short code for this which covers all possible sub levels of the list. Again the list is Dynamic. List can be start in any of the follwing format= A) (A) 1. 1) (1) a) (a) i) (i).

Was it helpful?

Solution

Use a stack to keep track of "open" styles in order to determine if a new style is a child or a parent.

use strict;
use warnings;

my @styles = (
    '\(\d+\)',     '\d+\)',     '\d+\.',
    '\([a-h]\)',   '[a-h]\)',   '\([A-H]\)',   '[A-H]\)',
    '\([IVX]+\)',  '[IVX]+\)',  '\([ivx]+\)',  '[ivx]+\)',
    '-',
);

my @stack;
while (<>) {
   for my $i (reverse 0..$#stack) {
      if (/$stack[$i]/) {
         splice(@stack, $i+1);
         goto DONE_LINE;
      }
   }

   for my $style (@styles) {
      if (my ($spaces) = /^( *)$style/) {
         push @stack, qr/^$spaces$style/;
         goto DONE_LINE;
      }
   }

   die "Unrecognized format at line $. - $_";

DONE_LINE:
   s/^ *//;
   printf("<<LIST%d>>%s", 0+@stack, $_);
}

To avoid recompiling the same regex patterns over and over again, add

my %re_cache = map { $_ => qr/^( *)$_/ } @styles;

and change

/^( *)$style/

to

/$re_cache{$style}/

OTHER TIPS

Try processing line by line. The following identifies which level each of the list items is at. One would simply need to keep track of the previous level to determine if something is a child, and the max value for each previous level to validate that things are in the correct order:

use strict;
use warnings;


###Array of multi-level patterns 
my @level_check = (
    '\(\d+\)',
    '(?<!\()\d+\)',
    '\([a-h]\)',
    '(?<!\()[a-h]\)',
    '\([A-H]\)',
    '(?<!\()[A-H]\)',
    '\d+\.',
    '\([IVX]+\)',
    '(?<!\()[IVX]+\)',
    '\([ivx]+\)',
    '(?<!\()[ivx]+\)',
    '\-',
);

while (<DATA>) {
    chomp(my $line = $_);

    my $match = 0;
    for my $i (0..$#level_check) {
        if ($line =~ /^\s*$level_check[$i]/) {
            $match = $i + 1;
            last;
        }
    }

    if ($match) {
        print "Level $match - $line\n";
    } else {
        print "No Match - $line\n";
    }
}

1;

__END__
(1)LIST 1 data 
(a)sub data
(b)sub data
(c)sub data
(d)sub data
    (i)sub-sub data
    (ii)sub-sub data
        (A)sub-sub-sub data
        (B)sub-sub-sub data
    (iii)sub-sub data
(e)sub data
(2)LIST 2 data 
(3)LIST 3 data 

prints

Level 1 - (1)LIST 1 data
Level 3 - (a)sub data
Level 3 - (b)sub data
Level 3 - (c)sub data
Level 3 - (d)sub data
Level 10 -     (i)sub-sub data
Level 10 -     (ii)sub-sub data
Level 5 -         (A)sub-sub-sub data
Level 5 -         (B)sub-sub-sub data
Level 10 -     (iii)sub-sub data
Level 3 - (e)sub data
Level 1 - (2)LIST 2 data
Level 1 - (3)LIST 3 data
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top