Question

I am trying to download files from a database using wget and a URL, e.g.

wget "http://www.rcsb.org/pdb/files/1BXS.pdb"

So the format of the URL is: "http://www.rcsb.org/pdb/files/($idnumber).pdb"

But I have many files to download, so I wrote a bash script that reads ID numbers from a text file, builds the URL string, and downloads each file with wget.

#!/bin/bash

while read line
do
url="http://www.rcsb.org/pdb/files/$line.pdb"
echo -e $url
wget $url
done < id_numbers.txt

However, the URL string is printed as

.pdb://www.rcsb.org/pdb/files/4H80

So the leading http is replaced with .pdb. I cannot figure out why. Does anyone have an idea? How can I format it so that the URL is

"http://www.rcsb.org/pdb/files/($idnumber).pdb"

? Thanks a lot.

Note: This question was marked as a duplicate of 'How to concatenate strings in bash?', but I was actually asking something else. I read that question before asking this one, and it turns out my problem was with preparing the text file in Windows, not with string concatenation. I edited the question title. I hope it is clearer now.

Solution

It sounds like your id_numbers.txt file has DOS/Windows-style line endings (carriage return followed by linefeed characters) instead of plain unix line endings (just linefeed). The result is that read thinks the line ends with a carriage return, $line actually has a carriage return at the end, and that gets embedded in the url, causing various confusion.
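To see why the output looks like .pdb://www.rcsb.org/pdb/files/4H80, it can help to make the hidden carriage return visible. The following is a minimal sketch, assuming bash; the value of line is hard-coded here to imitate what read would return for a CRLF-terminated file, using the ID from the question.

```shell
#!/bin/bash
# Imitate one line read from a file with Windows (CRLF) line endings:
# read strips the linefeed but leaves the carriage return (\r) in place.
line=$'4H80\r'
url="http://www.rcsb.org/pdb/files/$line.pdb"

# printf %q prints the value in shell-quoted form, so the embedded \r
# shows up as an escape sequence instead of moving the cursor.
printf '%q\n' "$url"
```

When the same string is printed with a plain echo, the terminal returns the cursor to the start of the line at the \r and then prints .pdb on top of http, which matches what the question shows.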

There are several ways to solve this. You could have bash trim the carriage return from the variable when you use it:

url="http://www.rcsb.org/pdb/files/${line%$'\r'}.pdb"
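As a small illustration of that expansion (a sketch assuming bash, with hard-coded sample values rather than real file input): ${line%$'\r'} removes one trailing carriage return if there is one and leaves other values untouched.

```shell
#!/bin/bash
# A value as read from a CRLF file, with a trailing carriage return.
line=$'4H80\r'
url="http://www.rcsb.org/pdb/files/${line%$'\r'}.pdb"
echo "$url"

# A value from a file with plain unix line endings is left unchanged,
# so the same script works for both kinds of input.
unix_line='1BXS'
unix_url="http://www.rcsb.org/pdb/files/${unix_line%$'\r'}.pdb"
echo "$unix_url"
```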

Or you could have read trim it by telling it that carriage return counts as whitespace (read will trim leading and trailing whitespace from what it reads):

while IFS=$'\r' read line
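Put into a complete loop, this variant might look like the sketch below. It assumes bash; the temporary file and the urls array are only there for illustration, and the wget call is commented out so the example can be run without downloading anything.

```shell
#!/bin/bash
# Hypothetical sample input with CRLF line endings (IDs from the question).
ids_file=$(mktemp)
printf '1BXS\r\n4H80\r\n' > "$ids_file"

urls=()
# IFS=$'\r' applies only to this read; -r keeps backslashes literal.
while IFS=$'\r' read -r line
do
    url="http://www.rcsb.org/pdb/files/$line.pdb"
    urls+=("$url")
    echo "$url"
    # wget "$url"    # uncomment to actually download
done < "$ids_file"

rm -f "$ids_file"
```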

Or you could use a command like dos2unix (or whatever the equivalent is on your OS) to convert the id_numbers.txt file.
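If dos2unix is not installed, one possible equivalent is tr, which can delete every carriage return in the file. This is a sketch with made-up temporary file names; note that it removes all carriage returns, not only those at line ends, which should be harmless for a plain list of IDs.

```shell
#!/bin/bash
# Hypothetical sample input with CRLF line endings (IDs from the question).
dos_file=$(mktemp)
unix_file=$(mktemp)
printf '1BXS\r\n4H80\r\n' > "$dos_file"

# Delete every carriage return and write the result to a new file.
tr -d '\r' < "$dos_file" > "$unix_file"

# Keep the converted content for inspection, then clean up.
converted=$(cat "$unix_file")
echo "$converted"
rm -f "$dos_file" "$unix_file"
```

For the real file this would be something like tr -d '\r' < id_numbers.txt > id_numbers_unix.txt, with the loop then reading from the new file.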

OTHER TIPS

The -e option tells echo to interpret backslash escapes (it is -n that suppresses the trailing newline); you do not need it here.

Also, I suspect the file containing your IDs is malformed. On which OS did you create it?

Anyway, you can simplify your script this way:

#!/bin/bash

while read line
do
    wget "http://www.rcsb.org/pdb/files/$line.pdb"
done < id_numbers.txt

I was able to successfully test it with an id_numbers.txt file generated like so:

for i in $(seq 0 9) ; do echo "$i" >> id_numbers.txt ; done

Try this:

url="http://www.rcsb.org/pdb/files/"$line
url=$url".pdb"

For more info, check How to concatenate string variables in Bash?

Licensed under: CC-BY-SA with attribution