Question

I started working on a website for tracking and rating watched anime, manga, etc., and for giving recommendations. It should also have an API that provides info about series and other things.

On similar sites, I have noticed that, to use an API, one typically needs a token/auth of sorts, and there are certain usage limits, even if it's for reading info publicly available on the site.

But the problem is, you could circumvent all those limits by crawling the site directly. Even if the format is less convenient, once you have a parser in place there's no problem. In fact, if the site uses client-side rendering, the info is already sent in a convenient format. Crawling would also put more strain on the server than an API call would, because the info may be spread across multiple pages, which means more requests, and each page also carries info the client app doesn't need.

In the end, is there a point in restricting an API that only serves info that's publicly available on the site? Should there be an unrestricted, unauthenticated API for reading public info, in order to avoid a needless burden on both sides? Or should the site itself have request limits instead, like an API?

Solution

Having a public API for data access from your site is about making the data available in a convenient, supported, well-defined and always-up-to-date manner. It is a way for a site owner to say 'here is data I collect and own, but I want you to be able to use it so I'm making it available. Oh, and I promise not to change the structure or do anything that might break your applications without communicating about it clearly'.

Crawling has some technical limitations and some very important legal considerations, AND it is prone to breaking without any sort of notification from the owner of the data. Personally, I would not hesitate to consume a public JSON API if it has data I need, but I'd be very reluctant to start writing a crawler/parser to get it off a website...

Other tips

Token-authorized APIs are used for lots of reasons.

  • Controlling access to restricted information. Since a token usually maps to an authorized identity, the site can restrict programmatic access to data exactly as it restricts interactive access via the web UI. Tokens also keep passwords out of requests, and are usually revocable by both the user and the site.
  • Rate-limiting individual consumers. Since programs can easily generate far more load, far faster, than people can, being able to throttle an overly aggressive consumer can make the difference between a site being up or down.
  • Controlling errant consumers. Some sites issue tokens to the program accessing the site, in addition to, or instead of, the user. This allows the site to make decisions such as blocking programs that abuse the API, that produce too many invalid requests, etc.
  • Improving usage statistics. Interactive usage is usually discussed both in terms of page views and sessions. API requests are the equivalent of page views, but there is nothing like an API session (unless the site allows persistent idle HTTP sessions, but that's uncommon). Tokens allow the site to group requests for the same user, and with some time-boxing, that can be a reasonable proxy for a session.
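
To make the rate-limiting point concrete, here is a minimal sketch of per-token throttling using a token-bucket scheme. Everything in it is an assumption for illustration: the class name `PerTokenRateLimiter`, the `capacity` and `refill_per_second` parameters, and the in-memory dictionary are hypothetical, and a real site would more likely keep this state in shared storage behind its web framework.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Bucket:
    """Remaining request credits for one API token (hypothetical structure)."""
    credits: float
    updated: float


class PerTokenRateLimiter:
    """Token-bucket limiter keyed by API token.

    Each API token may burst up to `capacity` requests and regains
    `refill_per_second` credits per second afterwards.
    """

    def __init__(
        self,
        capacity: float = 5,
        refill_per_second: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.buckets: Dict[str, Bucket] = {}

    def allow(self, api_token: str) -> bool:
        """Return True if this request is within the token's allowance."""
        now = self.clock()
        bucket = self.buckets.get(api_token)
        if bucket is None:
            # First request from this token: start with a full bucket.
            bucket = Bucket(credits=self.capacity, updated=now)
            self.buckets[api_token] = bucket

        # Refill in proportion to the time elapsed, capped at capacity.
        elapsed = now - bucket.updated
        bucket.credits = min(
            self.capacity, bucket.credits + elapsed * self.refill_per_second
        )
        bucket.updated = now

        if bucket.credits >= 1:
            bucket.credits -= 1
            return True
        return False  # the caller would typically answer with HTTP 429
```

Because the limit is tracked per token, one aggressive consumer runs out of credits without affecting anyone else. The same keying is what would let a site block or revoke a single misbehaving program.
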
Licensed under: CC-BY-SA with attribution
Not affiliated with softwareengineering.stackexchange