Question

I currently have an issue with reading files in one directory. I need to take all the fastq files in a directory, run the script on each file, and then put the new files in an ‘Edited_sequences’ folder. The one script I have is

perl -ne '$i++; if($i<80001){print}' BM2003_TCCCAGAACAAC_L001_R1_001.fastq > ./Edited_sequences/BM2003_TCCCAGAACAAC_L001_R1_001.fastq

It takes the first 80000 lines of one fastq file and outputs the result. Now if, for example, I have 2000 fastq files, I would need to copy and paste 2000 times. I know there is a glob command suited to this situation, but I just do not know how to use it. Please help me out.


Solution

You can use Perl to do the copy/paste for you: the leading arguments, expanded from *.fastq, are all the fastq files, and the last argument, ./Edited_sequences, is the target folder for the new files:

perl -e '$d=pop; `head -n 80000 "$_" > "$d/$_"` for @ARGV' *.fastq ./Edited_sequences

OTHER TIPS

glob gets you an array of filenames matching a particular expression. It's frequently used with <> brackets, a lot like reading input (you can think of it as reading files from a directory).

This is a simple example that will print the names of every ".fastq" file in the current directory:

print "$_\n" for <*.fastq>;

The important part is <*.fastq>, which gives us an array of filenames matching that expression (in this case, a file extension). If you need to change which directory your Perl script is working in, you can use chdir.

From there, we can process your files as needed:

while (my $filename = <*.fastq>) {
    open(my $in, '<', $filename) or die $!;
    open(my $out, '>', "./Edited_sequences/$filename") or die $!;

    for (1..80000) {
        my $line = <$in>;
        last unless defined $line;   # stop early if the file has fewer than 80000 lines
        print $out $line;
    }

    close $in;
    close $out;
}

You have two choices:

  • Use Perl to read in the 2000 files and process them as part of your program
  • Use the shell to pass each of those 2000 files to your command line

Here's the bash alternative:

for file in *.fastq
do
    perl -ne '$i++; if($i<80001){print}' "$file" > "./Edited_sequences/$file"
done

This is your same Perl script, but with the shell finding each file for you. It will not overload the command line: the for loop in bash expands the glob itself and iterates over the results one at a time.

However, I always recommend that you don't execute the commands right away, but instead echo them into a file:

for file in *.fastq
do
    echo "perl -ne '\$i++; if(\$i<80001){print}' \"$file\" > \"./Edited_sequences/$file\"" >> myoutput.txt
done

Then, you can look at myoutput.txt to make sure it looks good before you actually do any real harm. Once you've determined that myoutput.txt is a good file, you can execute that as a shell script:

$ bash myoutput.txt
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow