Which language is best for this editorial and op-ed aggregator project?
12-09-2019
Question
I'm looking for an aggregator for the editorial and op-ed pages of a bunch of English-language newspapers I want to follow. The objective is to generate a single HTML page that collects the editorial pieces from the dozen or so international newspapers I follow, so that I can print them off in the morning. Since this is a very narrow requirement, I couldn't find anything already available, so I'm thinking of writing one on my own.
Now, I used to be a programmer for ~8 years in my previous life (and have since been swayed to the "Dark Side" that is Wall Street after my MBA). I no longer know enough about programming to make a good choice of scripting language, so I'm unsure which language would be best for this. Performance is not a key issue; libraries for parsing HTML, handling text, and getting data off live web pages are more important.
PS: I don't mind learning a new language. Previously I worked extensively with x86 ASM, C and Visual C++/MFC, almost exclusively in Win32 environments.
Solution
Use Python and the excellent lxml library for scraping HTML. It supports CSS selectors, which is a huge convenience, and it's rather fast. It handles broken HTML well too.
OTHER TIPS
Interpreted languages do well with code generation, so you should also consider Perl or Ruby.