Question

I need to extract the exchange rate of USD to another currency (say, EUR) for a long list of historical dates.

The www.xe.com website provides a historical lookup tool, and with a detailed URL one can get the rate table for a specific date without filling in the Date: and From: boxes. For example, the URL http://www.xe.com/currencytables/?from=USD&date=2012-10-15 gives the table of conversion rates from USD to other currencies for October 15, 2012.

Given a list of dates, I can loop through it and change the date part of that URL to get each required page. If I can extract the rates list, a simple grep EUR will give me the relevant exchange rate (and I can use awk to extract just the rate).
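Once the rate tables are saved locally, the grep/awk lookup described above could look roughly like the sketch below. It is a minimal sketch under hypothetical assumptions: one tab-separated file per date named rates-YYYY-MM-DD.tsv, with columns code, name, units-per-USD, USD-per-unit. The file naming, the column layout, and the function names rate_for and rates_for_dates are all made up for illustration.

```shell
# ASSUMPTION (hypothetical layout): each table is a tab-separated file named
# rates-YYYY-MM-DD.tsv with columns: code, name, units-per-USD, USD-per-unit.

# rate_for CURRENCY FILE
# Print column 3 (units per USD) of the first row whose code equals CURRENCY.
rate_for() {
    awk -F '\t' -v cur="$1" '$1 == cur { print $3; exit }' "$2"
}

# rates_for_dates CURRENCY DATES_FILE TABLE_DIR
# Read one date per line from DATES_FILE and print "DATE RATE" for CURRENCY,
# looking each date up in TABLE_DIR/rates-DATE.tsv. If a table is missing,
# awk reports an error on stderr and the rate column is left empty.
rates_for_dates() {
    while IFS= read -r d; do
        [ -n "$d" ] || continue   # skip blank lines in the dates file
        printf '%s %s\n' "$d" "$(rate_for "$1" "$3/rates-$d.tsv")"
    done < "$2"
}
```

Usage would be something like rates_for_dates EUR dates.txt ./tables. The tab separator matters here: with awk's default whitespace splitting, a multi-word name such as "British Pound" would shift the rate into a different column.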

The question is, how can I get the page(s) from the Linux command line? I tried wget, but it did not do the job.

If not from the CLI, is there an easy and straightforward way to do it programmatically (i.e., one that takes less time than copy-pasting the dates into the browser's address bar)?


UPDATE 1:

When running:

$ wget 'http://www.xe.com/currencytables/?from=USD&date=2012-10-15'

I get a file that contains:

<HTML>
<HEAD><TITLE>Autoextraction Prohibited</TITLE></HEAD>
<BODY>
Automated extraction of our content is prohibited.  See <A HREF="http://www.xe.com/errors/noautoextract.htm">http://www.xe.com/errors/noautoextract.htm</A>.
</BODY>
</HTML>

so it seems the server can identify the type of request and blocks wget. Is there any way around this?


UPDATE 2:

After reading the response to the wget command and the comments/answers, I checked the website's ToS and found this clause:

You agree that you shall not:
...
f. use any automatic or manual process to collect, harvest, gather, or extract
   information about other visitors to or users of the Services, or otherwise
   systematically extract data or data fields, including without limitation any
   financial and/or currency data or e-mail addresses;

which, I guess, concludes the efforts on this front.


Now, out of curiosity: if wget generates an ordinary HTTP request, how does the server know that it came from a command-line tool and not from a browser?


Solution 2

That's because wget sends certain headers that make it easy to detect.

# wget --debug cnet.com 2>&1 | less
[...]
---request begin---
GET / HTTP/1.1
User-Agent: Wget/1.13.4 (linux-gnu)
Accept: */*
Host: www.cnet.com
Connection: Keep-Alive
[...]

Notice the User-Agent line:

User-Agent: Wget/1.13.4 

I think that if you change it to

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/537.75.14

it would work.

# wget --header='User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/537.75.14' 'http://www.xe.com/currencytables/?from=USD&date=2012-10-15'

That seems to be working fine from here. :D
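As for how the server can tell: it only sees the request itself, so it has to decide based on what the request contains, and the User-Agent header is the most obvious clue. Below is a deliberately naive filter written as a shell function, purely for illustration. The name classify_ua and the matching rule are hypothetical assumptions, not xe.com's actual logic. Because it inspects nothing but that one header, changing the header is enough to change its verdict.

```shell
# classify_ua USER_AGENT
# Hypothetical, deliberately naive server-side check that looks ONLY at the
# User-Agent header value and prints "blocked" or "allowed".
classify_ua() {
    case "$1" in
        Wget/*|curl/*) echo blocked ;;   # command-line clients that identify themselves by default
        *)             echo allowed ;;   # anything else, e.g. a browser-like string
    esac
}
```

Real sites may combine the User-Agent with other signals (other headers, request rates, cookies), so a check like this is only the simplest case.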

OTHER TIPS

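Use -O- to write the page to standard output instead of a file, and quote the URL (otherwise the shell treats the & as a command separator):

wget -O- 'http://www.xe.com/currencytables/?from=USD&date=2012-10-15'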

But it looks like xe.com does not want you to do automated downloads, so I would suggest not doing them.

Did you visit the link in the response?

From http://www.xe.com/errors/noautoextract.htm:

We do offer a number of licensing options which allow you to incorporate XE.com currency functionality into your software, websites, and services. For more information, contact us at:

XE.com Licensing
+1 416 214-5606
licensing@xe.com

You will appreciate that the time, effort and expense we put into creating and maintaining our site is considerable. Our services and data is proprietary, and the result of many years of hard work. Unauthorized use of our services, even as a result of a simple mistake or failure to read the terms of use, is unacceptable.

This suggests there is a licensed API you could use, but you will have to pay for it. Needless to say, you should respect these terms rather than try to get around them.

Licensed under: CC-BY-SA with attribution