Question

Preface: I have a broad, college knowledge, of a handful of languages (C++, VB,C#,Java, many web languages), so go with which ever you like.

I want to make an android app that compares numbers, but in order to do that I need a database. I'm a one man team, and the numbers get updated biweekly so I want to grab those numbers off of a wiki that gets updated as well.

So my question is: how can I access information from a website using one of the languages above?

Was it helpful?

Solution

What I understand the problem to be: Some entity generates a data set (i.e. numbers) every other week and you have a need to download that data set for treatment (e.g. sorting).

Ideally, the web site maintaining the wiki would provide a Service, like a RESTful interface, to easily gather the data. If that were the case, I'd go with any language that provides easy manipulation of HTTP request & response, and makes your data manipulation easy. As a previous poster said, Java would work well.

If you are stuck with the wiki page, you have a couple of options. You can parse the HTML your browser receives (Perl comes to mind as a decent language for that). Or you can use tools built for that purpose such as the aforementioned Jsoup.

Your question also mentions some implementation details such as needing a database. Evidently, there isn't enough contextual information for me to know whether that's optimal, so I won't address this aspect of the problem.

OTHER TIPS

http://jsoup.org/ is a great Java tool for accessing content on html pages

Consider https://scraperwiki.com/ - it's a site where users can contribute scrapers. It's free as long as you let your scraper be public. The results of your scraper are exposed as csv and JSON.

If you don't know what a "scraper" is, google "screen scraping" - it's a long and frustrating tradition for coders, who have dealt with the same problem you have since the beginning of networked computing.

You could check out :http://web-harvest.sourceforge.net/

For Python, BeautifulSoup is one of the most tolerant HTML parsers out there. The documentation also lists similar libraries in Ruby and Java, so you'll probably find something relevant there.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top