Question

To begin with, I want to get all the followers of my Twitter account. I did a little research and found that web scraping can be done in Ruby on Rails with the Nokogiri or Mechanize gems. I also have a CSS selector to use for the scraping. However, when I look at the HTML page source, it does not show all of the account's followers.

Can I really use web scraping code to fetch all my Twitter followers, or should I go for the Twitter API?

Solution

In general terms, absolutely use APIs whenever possible.

As the name implies, with "scraping" you are merely dealing with the "surface" of the application: in MVC terms, its (HTML) views. Those views can change at any instant; think how often Twitter and similar services undergo site redesigns. If you are scraping, then each site redesign, even a minor one, will very likely break your existing code, forcing you (without warning) to make frantic updates based on guesswork.
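To make the fragility concrete, here is a minimal sketch. Both HTML snippets and the JSON payload are invented for illustration (they are not real Twitter markup or a real API response), and a plain regex stands in for a CSS selector such as ".follower .username" so that the example runs without any gems.

```ruby
require "json"

# Stand-in for a scraper built around a CSS selector like ".follower .username".
# It is tied to one specific piece of markup.
def scrape_usernames(html)
  html.scan(/<span class="username">([^<]+)<\/span>/).flatten
end

# Reads the same information from a structured payload.
# The "users" and "screen_name" keys are assumptions for this sketch.
def api_usernames(json)
  JSON.parse(json).fetch("users").map { |user| user.fetch("screen_name") }
end

# Hypothetical markup before and after a redesign.
old_html = '<div class="follower"><span class="username">alice</span></div>'
new_html = '<li class="user-card"><a class="handle">alice</a></li>'
api_json = '{"users":[{"screen_name":"alice"}]}'

puts scrape_usernames(old_html).inspect # ["alice"]
puts scrape_usernames(new_html).inspect # [] (the scraper fails silently)
puts api_usernames(api_json).inspect    # ["alice"]
```

Note that the scraper does not raise an error after the redesign; it just returns nothing, which is part of what makes this kind of breakage hard to notice.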

Nokogiri and Mechanize are powerful tools, but they will never compare with the functionality, stability and consistency of an API, which exposes the underlying data in a structured form and bypasses the ever-changing "surface" altogether. In the case of Twitter, you have the added benefit of API wrappers such as the Twitter gem, which add a user-friendly layer on top of the API and make it even easier to integrate into your application.

So to sum up: use the API, possibly via an API wrapper such as the Twitter gem.
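As for fetching all followers: follower lists are usually returned in pages, which is also why a single page source does not show everyone. A rough sketch of collecting every page follows, assuming a cursor-paginated endpoint. Here `fetch_page` is a hypothetical callable standing in for the real request, and the `:ids` / `:next_cursor` keys and the convention that a cursor of 0 means "no more pages" are assumptions modelled on cursor-style follower endpoints. A wrapper gem may well handle this loop for you.

```ruby
# Collects every follower ID from a cursor-paginated source.
# fetch_page: hypothetical callable that takes a cursor and returns a Hash
# with :ids (Array) and :next_cursor (Integer, 0 when there are no more pages).
def all_follower_ids(fetch_page, first_cursor: -1)
  ids = []
  cursor = first_cursor
  loop do
    page = fetch_page.call(cursor)
    ids.concat(page.fetch(:ids))
    cursor = page.fetch(:next_cursor)
    break if cursor.zero?
  end
  ids
end

# Fake three-page backend used in place of real HTTP calls.
pages = {
  -1 => { ids: [1, 2], next_cursor: 10 },
  10 => { ids: [3, 4], next_cursor: 20 },
  20 => { ids: [5], next_cursor: 0 }
}
fake_fetch = ->(cursor) { pages.fetch(cursor) }

puts all_follower_ids(fake_fetch).inspect # [1, 2, 3, 4, 5]
```

Passing the fetcher in as an argument keeps the paging logic separate from the HTTP and authentication details, so it can be tried out without credentials.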

Other tips

Web scraping is normally a last resort when a service doesn't provide an API or the API doesn't sufficiently provide all of the functionality that you require.

I would look into the API first. This is what it is designed for.

Web scraping can be problematic, as the structure of the website could change drastically and break your code.

Generally, a public API comes with some sort of contract that the data it provides will not change dramatically. If there are changes, the API will usually offer versioning (the ability to keep calling the old version) or documentation that explains what will change and when.

Also, web scraping has other costs, such as extra bandwidth: you download whole HTML pages just to extract a few values. The data you get from an API is normally more useful in an application.

There are also quite a few libraries out there (Ruby gems) that provide much of the basic functionality you need to access the API you require. They are also generally updated when the API is updated.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow