Question

I am working on a program that takes user input for two file names. Unfortunately, the program can easily break if the user does not follow the specified format of the input. I want to write code that improves its resiliency against these types of errors. You'll understand when you see my code:

# Ask the user for the filename of the qseq file and barcode.txt file
print "Please enter the name of the qseq file and the barcode file separated by a comma:";
# user should enter filenames like this: sample1.qseq, barcode.txt

# remove the newline from the qseq filename
chomp ($filenames = <STDIN>);

# an empty array
my @filenames;

# remove the ',' and put the files into an array separated by spaces; indexes the files
push @filename, join(' ', split(',', $filenames))

# the qseq file
my $qseq_filename = shift @filenames;

# the barcode file.
my $barcode = shift @filenames;

Obviously this code can run into errors if the user enters the wrong type of filename (a .tab file instead of .txt, or .seq instead of .qseq). I want code that can do some sort of check to see that the user enters the appropriate file type.

Another error that could break the code is if the user enters too many spaces before the filenames. For example: sample1.qseq,(imagine 6 spaces here) barcode.txt (Notice the numerous spaces after the comma)

Another example: (imagine 6 spaces here) sample1.qseq,barcode.txt (This time notice the number of spaces before the first filename)

I also want lines of code that can remove extra spaces so that the program doesn't break. I think the user input has to be in the following kind of format: sample1.qseq, barcode.txt. The user input has to be in this format so that I can properly index the filenames into an array and shift them out later.

Thanks, any help or suggestions are greatly appreciated!


Solution

The standard way to deal with this kind of problem is utilising command-line options, not gathering input from STDIN. Getopt::Long comes with Perl and is serviceable:

use strict; use warnings FATAL => 'all';
use Getopt::Long qw(GetOptions);
my %opt;
GetOptions(\%opt, 'qseq=s', 'barcode=s') or die;
die <<"USAGE" unless exists $opt{qseq} and $opt{qseq} =~ /^sample\d[.]qseq$/ and exists $opt{barcode} and $opt{barcode} =~ /^barcode.*\.txt$/;
Usage: $0 --qseq sample1.qseq --barcode barcode.txt
       $0 -q sample1.qseq -b barcode.txt
USAGE
printf "q==<%s> b==<%s>\n", $opt{qseq}, $opt{barcode};

The shell will deal with any extraneous whitespace; try it and see. You still need to validate the file names yourself; the regex in the example is just something I made up. Employ Pod::Usage for a fancier way to output helpful documentation to your users, who are likely to get the invocation wrong.
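The regex above only accepts names of the exact form sampleN.qseq and barcode*.txt. If only the extension matters, a looser check might look like the following minimal sketch. The helper name check_file is hypothetical, and the optional existence test (-e) is an added assumption that the files should already be on disk.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an error message, or undef if $name looks OK.
# Assumes only the extension matters; the existence check is optional.
sub check_file {
    my ($name, $ext, $must_exist) = @_;
    return "no file name given" unless defined $name && length $name;
    # \Q...\E quotes the dot in the extension so it matches literally
    return "'$name' does not end in $ext" unless $name =~ /\Q$ext\E\z/;
    return "'$name' does not exist" if $must_exist && !-e $name;
    return;    # nothing wrong
}

# Example: after GetOptions(\%opt, 'qseq=s', 'barcode=s'), one might write
#   my @errors = grep { defined } (
#       check_file($opt{qseq},    '.qseq', 1),
#       check_file($opt{barcode}, '.txt',  1),
#   );
#   die join("\n", @errors), "\n" if @errors;
```

Collecting all the messages before dying lets the user fix both names in one go instead of one per run.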

There are dozens of more advanced Getopt modules on CPAN.

OTHER TIPS

First, put use strict; at the top of your code and declare your variables.

Second, this:

# remove the ',' and put the files into an array separated by spaces; indexes the files
push @filename, join(' ', split(',', $filenames))

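Is not going to do what you want. split() takes a string and returns a list of the pieces; join() does the opposite, taking a list and returning a single string. Your line splits the input and immediately glues the pieces back together, so only one string ends up in the array (and that array is @filename, not @filenames). You just want to split: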

my @filenames = split(',', $filenames);

That will create an array like you expect.
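To make the difference concrete, here is a small sketch that runs the original push/join line and the split-only version on the input format from the question (the names @buggy and @fixed are just for illustration). Note that the second element of the split still carries the space that followed the comma.

```perl
use strict;
use warnings;

my $input = 'sample1.qseq, barcode.txt';

# The original line: split, then immediately join the pieces back together.
my @buggy;
push @buggy, join(' ', split(',', $input));
# @buggy now holds ONE element: 'sample1.qseq  barcode.txt'
# (two spaces: one added by join, one left over from after the comma).

# Splitting alone gives two elements, but the second keeps its leading space.
my @fixed = split(',', $input);
# @fixed is ('sample1.qseq', ' barcode.txt')

printf "buggy: %d element(s), fixed: %d element(s)\n",
    scalar @buggy, scalar @fixed;
```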

This function will safely trim white space from the beginning and end of a string:

sub trim {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

Access it like this:

my $file = trim(shift @filenames);
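Putting split and trim together, a minimal sketch that handles each of the whitespace cases from the question might look like this (trim is repeated so the example runs on its own):

```perl
use strict;
use warnings;

# Same trim() as above: strip leading and trailing whitespace.
sub trim {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

# Input shapes from the question: a space after the comma, many spaces
# after the comma, many spaces before the first name. The first one
# also keeps the newline that reading from STDIN leaves behind.
my @inputs = (
    "sample1.qseq, barcode.txt\n",
    "sample1.qseq,      barcode.txt",
    "      sample1.qseq,barcode.txt",
);

my @parsed;
for my $line (@inputs) {
    # Split on the comma, then trim each piece.
    my ($qseq_filename, $barcode) = map { trim($_) } split(',', $line);
    push @parsed, [$qseq_filename, $barcode];
    print "<$qseq_filename> <$barcode>\n";
}
```

Every input line prints <sample1.qseq> <barcode.txt>, so the rest of the program no longer depends on exactly how the user spaced things.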

Depending on your script, it might be easier to pass the strings as command-line arguments. You can access them through the @ARGV array, but I prefer to use Getopt::Long:

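use strict;
use warnings;
use Getopt::Long;
Getopt::Long::Configure("bundling");

my ($qseq_filename, $barcode);

# -q/--qseq and -b/--bar each take a string value
GetOptions (
    'q|qseq=s' => \$qseq_filename,
    'b|bar=s'  => \$barcode,
) or die "Usage: $0 -q sample1.qseq -b barcode.txt\n";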

You can then call this as:

./script.pl -q sample1.qseq -b barcode.txt

And the variables will be properly populated without a need to worry about trimming white space.
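Because the shell splits the command line on whitespace before Perl sees it, the script only receives clean words in @ARGV. The sketch below tries that without a shell by feeding argument lists to GetOptionsFromArray, which Getopt::Long exports on request and which parses an array the same way GetOptions parses @ARGV. The wrapper name parse_args is hypothetical.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

Getopt::Long::Configure("bundling");

# Hypothetical wrapper: parses a list of arguments the way GetOptions
# parses @ARGV, so the behaviour can be tried without a shell.
sub parse_args {
    my @args = @_;
    my ($qseq_filename, $barcode);
    GetOptionsFromArray(
        \@args,
        'q|qseq=s' => \$qseq_filename,
        'b|bar=s'  => \$barcode,
    ) or die "Usage: $0 -q sample1.qseq -b barcode.txt\n";
    return ($qseq_filename, $barcode);
}

# A shell turns "-q    sample1.qseq   -b barcode.txt" into these four words.
my ($qseq, $bar) = parse_args('-q', 'sample1.qseq', '-b', 'barcode.txt');
print "q=<$qseq> b=<$bar>\n";
```

With bundling enabled, the long forms need two dashes (--qseq, --bar), while the single-letter forms use one.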

You'll need to trim spaces before handling the filename data in your routine. You could then check the file extension with yet another regular expression, as nicely described in the Stack Overflow question "Is there a regular expression in Perl to find a file's extension?". If it's the actual type of file that matters to you, it might be more worthwhile to check for that instead with a module such as File::LibMagic or File::Type.
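As a possible alternative to a hand-written regex (swapped in here, not the technique from the linked question), the core module File::Basename provides fileparse, whose suffix list accepts qr// patterns matched against the end of the name. A minimal sketch, with the helper name extension_of being hypothetical:

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: returns the extension of a path, including the dot,
# or '' if the name has none. The pattern matches the last ".xxx" only.
sub extension_of {
    my ($path) = @_;
    my (undef, undef, $suffix) = fileparse($path, qr/\.[^.]*/);
    return $suffix;
}

for my $name ('sample1.qseq', 'data/barcode.txt', 'barcode.tab', 'README') {
    printf "%-18s -> '%s'\n", $name, extension_of($name);
}
```

The result can then be compared against the expected extension, e.g. extension_of($qseq_filename) eq '.qseq'. This only looks at the name; it says nothing about the file's contents.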

While I think your design is a little iffy, the following should work:

my @fileNames = split(',', $filenames);
foreach my $fileName (@fileNames) {
  # reject any name containing whitespace (this includes a space after the comma)
  if ($fileName =~ /\s/) {
    print STDERR "Invalid filename: '$fileName'\n";
    exit 1;
  }
}
my ($qseq, $barcode) = @fileNames;
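The loop above does not notice when the user types only one name, or three. A minimal sketch of the same idea that also insists on exactly two non-empty names; the sub name split_two_names is hypothetical, and dying with a message instead of exiting is a design choice so the caller can catch it with eval if needed:

```perl
use strict;
use warnings;

# Hypothetical helper built on the answer's idea: split on the comma,
# reject empty names or names containing whitespace, and require exactly
# two names. Returns the two names, or dies saying what was wrong.
sub split_two_names {
    my ($line) = @_;
    chomp $line;
    # a limit of -1 keeps trailing empty fields, so "a.qseq," is caught
    my @names = split(',', $line, -1);
    die "expected exactly two file names, got " . scalar(@names) . "\n"
        unless @names == 2;
    for my $name (@names) {
        die "invalid filename '$name'\n" if $name eq '' or $name =~ /\s/;
    }
    return @names;
}

my ($qseq, $barcode) = split_two_names("sample1.qseq,barcode.txt\n");
print "<$qseq> <$barcode>\n";
```

Like the original loop, this rejects "sample1.qseq, barcode.txt"; trimming each piece first, as in the other answer, would accept it.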

And here is one more way you could do it with regex (if you are reading the input from STDIN):

# read a line from STDIN
my $filenames = <STDIN>;

# parse the line with a regex or die with an error message
my ($qseq_filename, $barcode) = $filenames =~ /^\s*(\S.*?)\s*,\s*(\S.*?)\s*$/
    or die "invalid input '$filenames'";
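The regex does the splitting and trimming in one step: each capture starts at a non-space character, and the lazy .*? stops before any whitespace around the comma or at the end of the line (including the newline left by <STDIN>). A sketch that wraps it in a sub (the name parse_filenames is hypothetical) and tries it on the question's examples:

```perl
use strict;
use warnings;

# The answer's regex, wrapped in a sub so it can be tried on fixed strings.
# In list context a successful match returns the two captures; a failed
# match returns an empty list.
sub parse_filenames {
    my ($line) = @_;
    return $line =~ /^\s*(\S.*?)\s*,\s*(\S.*?)\s*$/;
}

for my $line ("sample1.qseq, barcode.txt\n",
              "sample1.qseq,      barcode.txt",
              "      sample1.qseq,barcode.txt",
              "sample1.qseq") {
    my @names = parse_filenames($line);
    if (@names) {
        print "<$names[0]> <$names[1]>\n";
    } else {
        print "no match for '$line'\n";
    }
}
```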
Licensed under: CC-BY-SA with attribution