Question

I am very new to Perl, and I am trying to write a word frequency counter as a learning exercise.

However, after working on it for a while, I still cannot figure out the error in my code below. This is my code:

$wa = "A word frequency counter."; 
@wordArray = split("",$wa);
$num = length($wa);
$word = "";
$flag = 1; # 0 if previous character was an alphabet and 1 if it was a blank.
%wordCount = ("null" => 0);
if ($num == -1) {
    print "There are no words.\n";
} else {
    print "$length";
    for $i (0 .. $num) {
        if(($wordArray[$i]!=' ') && ($flag==1)) { # start of a new word.
            print "here";
            $word = $wordArray[$i];
            $flag = 0;
        } elsif ($wordArray[$i]!=' ' && $flag==0) { # continuation of a word.
            $word = $word . $wordArray[$i];
        } elsif ($wordArray[$i]==' '&& $flag==0) { # end of a word.
            $word = $word . $wordArray[$i];
            $flag = 1;
            $wordCount{$word}++;
            print "\nword: $word";
        } elsif ($wordArray[$i]==" " && $flag==1) { # series of blanks.
            # do nothing.
        }
    }
    for $i (keys %wordCount) {
        print " \nword: $i - count: $wordCount{$i} ";
    }
}

It's neither printing "here", nor the words. I am not worried about optimization at this point, though any input in that direction would also be much appreciated.

Solution

First off,

$wordArray[$i]!=' '

should be

$wordArray[$i] ne ' '

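according to the Perl documentation for comparing strings and characters. Use numeric operators (==, !=, >=, …) for numbers and string operators (eq, ne, lt, …) for text. With !=, both operands are converted to numbers; a letter and a blank both become 0, so the test is always false, which is why "here" is never printed. The same applies to your == comparisons, which should be eq.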
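
To see the difference between the two operator families, here is a minimal sketch. The sample string "A b" and the variable names are made up for illustration, and the numeric warnings are switched off only so that the raw results are visible:

```perl
use strict;
use warnings;
no warnings 'numeric';   # silenced only so the raw results are easy to read

# Hypothetical sample: a letter, a blank, and another letter.
my @chars = split //, "A b";

my (@numeric, @string);
for my $c (@chars) {
    # != converts both sides to numbers; 'A', ' ' and 'b' all become 0.
    push @numeric, ($c != ' ') ? 1 : 0;
    # ne compares the characters as strings.
    push @string,  ($c ne ' ') ? 1 : 0;
}

print "!= ' ' : @numeric\n";
print "ne ' ' : @string\n";
```

The != row comes out as all zeros, while the ne row marks only the blank as equal to ' '.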

Also, you could do

@wordArray = split(" ",$wa);

instead of

@wordArray = split("",$wa);

Then you wouldn't need the character-by-character checking at all, and you would never have run into this problem. @wordArray will already contain the words, and you'll just have to count the occurrences.
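
A minimal sketch of that approach, assuming whitespace is the only separator you care about. The sample sentence and the name %word_count are my own choices, not part of your code:

```perl
use strict;
use warnings;

# Hypothetical sample text with a repeated word.
my $wa = "A word frequency counter. A counter of words.";

# split ' ' breaks the string on runs of whitespace and ignores leading blanks.
my %word_count;
$word_count{$_}++ for split ' ', $wa;

for my $word (sort keys %word_count) {
    print "word: $word - count: $word_count{$word}\n";
}
```

Note that punctuation stays attached to the words here, so "counter." and "counter" are counted separately.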

OTHER TIPS

This is a good example of a problem where Perl will help you work out what's wrong if you just ask it for help. Get used to always adding the lines:

use strict;
use warnings;

to the top of your Perl programs.
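
As a rough sketch of what these two lines would have told you about the code in the question: the handler that collects the warnings and the string eval are only there so the script can print the diagnostics itself. They are illustration devices, not something you need in your own program.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# 1. warnings: a numeric comparison on non-numeric strings is reported.
my $char = 'A';
my $differs = ($char != ' ') ? 1 : 0;
print "warning: $_" for @warnings;

# 2. strict: using an undeclared variable such as $length is a compile-time
#    error. The string eval lets this script survive and show the message.
my $compiled = eval q{ my $num = 5; print "$length"; 1 };
my $strict_error = $@;
print "strict:  $strict_error" unless $compiled;
```

The first part should report that the arguments are not numeric, and the second that $length was never declared.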

You seem to be writing C in Perl. The difference is not just one of style. By exploding a string into an array of individual characters, you cause the memory footprint of your script to explode as well.

Also, you need to think about what constitutes a word. Below, I am not suggesting that any \w+ is a word, rather pointing out the difference between \S+ and \w+.

#!/usr/bin/env perl

use strict; use warnings;
use YAML;

my $src = '$wa = "A word frequency counter.";';

print Dump count_words(\$src, 'w');
print Dump count_words(\$src, 'S');

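# Count every run of characters matching \w+ or \S+ in the string that
# $src refers to; the second argument ('w' or 'S') selects the class.
sub count_words {
    my $src = shift;
    my $class = sprintf '\%s+', shift;   # builds '\w+' or '\S+'
    my %counts;

    # With /g in scalar context, each match resumes where the last one ended.
    while ($$src =~ /(?<sequence> $class)/gx) {
        $counts{ $+{sequence} } += 1;
    }

    return \%counts;
}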
Output:

---
A: 1
counter: 1
frequency: 1
wa: 1
word: 1
---
'"A': 1
$wa: 1
=: 1
counter.";: 1
frequency: 1
word: 1
Licensed under: CC-BY-SA with attribution