Question

I'm writing a script to rearrange HTML content and I'm stuck on two problems. I have this HTML structure: movie titles and release years with thumbnails, laid out in 5 columns. I want to generate new HTML files with the movies grouped by decade from 2011 back to 1911, e.g. present-2011, 2010-2001, 2000-1991, etc.

<table>
    <tr>
      <td class="basic" valign="top">
        <a href="details/267226.html" title="" id="thumbimage">
          <img src="images/267226f.jpg"/>
        </a>
        <br/>Cowboys &amp; Aliens &#160;(2011)
</td>
      <td class="basic" valign="top">
        <a href="details/267185.html" title="" id="thumbimage">
          <img src="images/267185f.jpg"/>
        </a>
        <br/>The Hangover Part II &#160;(2011)
</td>
      <td class="basic" valign="top">
        <a href="details/267138.html" title="" id="thumbimage">
          <img src="images/267138f.jpg"/>
        </a>
        <br/>Friends With Benefits &#160;(2011)
</td>
      <td class="basic" valign="top">
        <a href="details/266870.html" title="" id="thumbimage">
          <img src="images/266870f.jpg"/>
        </a>
        <br/>Beauty And The Beast &#160;(1991)
</td>
      <td class="basic" valign="top">
        <a href="details/266846.html" title="" id="thumbimage">
          <img src="images/266846f.jpg"/>
        </a>
        <br/>The Fox And The Hound &#160;(1981)
</td>
    </tr>


......

</table>

The first problem, which I have no idea how to solve, is that after removing the movies that don't match the decade I'm left with empty 'tr' tags and gaps in the thumbnail positions, and I don't know how to rearrange everything so that each row again has 5 columns filled with 5 titles. The second is how to process every decade with a single call of the script. Thanks.

use autodie;
use strict;
use warnings;
use File::Slurp;
use HTML::TreeBuilder;    

my $tree = HTML::TreeBuilder->new_from_file( 'test.html' );

for my $h ( $tree->look_down( class => 'basic' ) ) {

    edit_links( $h );      

    my ($year) = ($h->as_text =~ /.*?\((\d+)\).*/);
    if ($year > 2010 or $year < 2001) {
        $h->detach;
        write_file( "decades/2010-2001.html", \$tree->as_HTML('<>&',' ',{}), "\n" );
    }
}    

sub edit_links {
    my $h = shift;

    for my $link ( $h->find_by_tag_name( 'a' ) ) {
        my $href = '../'.$link->attr( 'href' );
        $link->attr( 'href', $href );
    }

    for my $link ( $h->find_by_tag_name( 'img' ) ) {
        my $src = '../'.$link->attr( 'src' );
        $link->attr( 'src', $src );
    }
}

Solution

The approach below should do what you asked for. While the HTML file is processed, the hash %decade is built up: each key is the final year of a decade, and each value is an array reference holding the cells that belong to that decade.

A second loop then traverses the hash and writes one file per decade, wrapping every 5 cells in a <tr> tag.

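use strict;
use warnings;
use HTML::TreeBuilder;
use File::Slurp;
use List::MoreUtils qw(part);

my $tree = HTML::TreeBuilder->new_from_file('test.html');

# Maps each decade's final year (e.g. 2010 for 2010-2001) to its cells.
my %decade = ();

for my $h ( $tree->look_down( class => 'basic' ) ) {

    edit_links( $h );    # edit_links() as defined in the question

    my ($year) = ($h->as_text =~ /.*?\((\d+)\).*/);
    next unless defined $year;    # skip cells without a "(year)" suffix

    # Subtract 1 first so that 2010 lands in 2010-2001 and 2011 in 2020-2011.
    my $dec = (int(($year - 1) / 10) + 1) * 10;

    $decade{$dec} ||= [];
    push @{$decade{$dec}}, $h;
}

# Write one file per decade, newest first; the decades/ directory must exist.
for my $dec (sort { $b <=> $a } keys %decade) {
    my $filename = "decades/" . $dec . "-" . ($dec - 9) . ".html";

    my $idx = 0;
    my @items = map { $_->as_HTML('<>&',' ',{}) } @{ $decade{$dec} };
    # part() buckets the cells into groups of five, one group per <tr>.
    my $contents = join('',
        '<table>',
        (map { "<tr>@$_</tr>" } part { int($idx++ / 5) } @items),
        '</table>');

    write_file( $filename, $contents);
}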
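
The two ideas the answer relies on, bucketing a year into its decade and cutting a flat list into rows of five, can be tried on their own without any HTML parsing. The following is only a minimal sketch using core Perl: the helper names decade_end and rows_of and the sample titles are made up for illustration, and it assumes decades run from a year ending in 1 to a year ending in 0, as in the question's "2010-2001" example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the final year of the decade a year falls in,
# assuming decades run 2001-2010, 2011-2020, and so on.
sub decade_end {
    my ($year) = @_;
    return ( int( ( $year - 1 ) / 10 ) + 1 ) * 10;
}

# Hypothetical helper: split a list into array refs of at most $size items.
sub rows_of {
    my ( $size, @items ) = @_;
    my @rows;
    push @rows, [ splice @items, 0, $size ] while @items;
    return @rows;
}

# Illustrative sample data standing in for the parsed <td> cells.
my @titles = (
    'Cowboys &amp; Aliens (2011)',
    'Beauty And The Beast (1991)',
    'The Fox And The Hound (1981)',
);

# Group the cells by the final year of their decade.
my %decade;
for my $title (@titles) {
    my ($year) = $title =~ /\((\d{4})\)/ or next;    # skip titles without a year
    push @{ $decade{ decade_end($year) } }, "<td>$title</td>";
}

# Print each decade, newest first, with its cells in rows of five.
for my $end ( sort { $b <=> $a } keys %decade ) {
    print $end, '-', $end - 9, "\n";
    print "  <tr>@$_</tr>\n" for rows_of( 5, @{ $decade{$end} } );
}
```

Using splice here instead of List::MoreUtils::part is just a design choice to keep the sketch free of non-core modules; either way the last row simply ends up shorter when the number of cells is not a multiple of five.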

...
Licensed under: CC-BY-SA with attribution