Question
I currently have an issue with reading files in one directory. I need to take all the fastq files in a folder, run the script on each file, and put the new files in an ‘Edited_sequences’ folder. The one script I had is
perl -ne '$i++; if($i<80001){print}' BM2003_TCCCAGAACAAC_L001_R1_001.fastq > ./Edited_sequences/BM2003_TCCCAGAACAAC_L001_R1_001.fastq
It takes the first 80000 lines of one fastq file and outputs the result. Now suppose I have 2000 fastq files; then I would need to copy and paste 2000 times. I know there is a glob command suited to this situation, but I just do not know how to deal with it. Please help me out.
Solution
You can use perl to do the copy/paste for you. The first argument, *.fastq, expands to all the fastq files, and the second, ./Edited_sequences, is the target folder for the new files (note head -80000 to keep the first 80000 lines, matching your original script):
perl -e '$d=pop; `head -80000 "$_" > "$d/$_"` for @ARGV' *.fastq ./Edited_sequences
OTHER TIPS
glob gets you an array of filenames matching a particular expression. It's frequently used with <> angle brackets, a lot like reading input (you can think of it as reading filenames from a directory).
This is a simple example that will print the names of every ".fastq" file in the current directory:
print "$_\n" for <*.fastq>;
The important part is <*.fastq>, which gives us an array of filenames matching that expression (in this case, a file extension). If you need to change which directory your Perl script is working in, you can use chdir.
From there, we can process your files as needed:
while (my $filename = <*.fastq>) {
    open(my $in,  '<', $filename) or die $!;
    open(my $out, '>', "./Edited_sequences/$filename") or die $!;
    for (1 .. 80000) {
        my $line = <$in>;
        last unless defined $line;    # stop early if the file is shorter
        print $out $line;
    }
    close $in;
    close $out;
}
Here's a bash alternative:
for file in *.fastq
do
perl -ne '$i++; if($i<80001){print}' "$file" > "./Edited_sequences/$file"
done
Your same Perl script, but with the shell finding each file. This works without overloading the command line, because the for loop in bash expands the glob itself rather than passing every filename to a single command.
However, I always recommend that you don't actually execute the command, but echo the resulting commands into a file:
for file in *.fastq
do
echo "perl -ne '\$i++; if(\$i<80001){print}' \
\"$file\" > \"./Edited_sequences/$file\"" >> myoutput.txt
done
Then, you can look at myoutput.txt
to make sure it looks good before you actually do any real harm. Once you've determined that myoutput.txt
is a good file, you can execute that as a shell script:
$ bash myoutput.txt