I have an AppHarbor site where I need to do weekly updates to some of the data. I don't want to go the route of deploying an exe and adding an additional "webworker" to my site, because those cost money. My thought is to add a web service/REST API endpoint to the site so I can just call it; it will execute the "batch job stuff", then when it's complete, return a custom status code, like success or failure. Behind the scenes, it would update a "BatchLog" table or something like that, and then I could create a page/view where I could access the log details and see which batch processes did or did not run.

So, that's how I'm thinking of implementing it, but I'm a little skeptical about the security around this. First of all, I obviously don't want just ANYONE to be able to kick off these batch jobs by hitting my web service/REST API.

To fix that, I'm thinking there are two ways to do this, or a combination of both:

1) Require credentials, and maybe an additional secret code, before the job will actually kick off.

2) Configure in a table when each batch job is allowed to run, and at what frequency. That way, even if an attacker does call my service/REST API, it will only execute once per hour/day/week/month/etc. They could hammer the service, but each successive call would just return a failure, or something like that.
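The frequency-gating idea in (2) can be sketched roughly like this. This is a minimal illustration, not the asker's actual schema: the `BatchJob` table, the `frequency_hours` column, and the job name `weekly_update` are all hypothetical, and a real app would use its own database rather than an in-memory SQLite one.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical schema: one row per job with its allowed frequency and last run.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE BatchJob (
    name TEXT PRIMARY KEY,
    frequency_hours INTEGER,
    last_run TEXT)""")
conn.execute("INSERT INTO BatchJob VALUES ('weekly_update', 168, NULL)")

def try_run(job_name):
    """Run the job (record the run) only if it is due; otherwise refuse."""
    now = datetime.now(timezone.utc)
    row = conn.execute(
        "SELECT frequency_hours, last_run FROM BatchJob WHERE name = ?",
        (job_name,)).fetchone()
    if row is None:
        return False                      # unknown job: always refuse
    freq_hours, last_run = row
    if last_run is not None:
        last = datetime.fromisoformat(last_run)
        if now - last < timedelta(hours=freq_hours):
            return False                  # called again too soon: throttled
    conn.execute("UPDATE BatchJob SET last_run = ? WHERE name = ?",
                 (now.isoformat(), job_name))
    return True

print(try_run("weekly_update"))   # first call: True, job runs
print(try_run("weekly_update"))   # immediate retry: False, throttled
```

With this in place, an attacker hammering the endpoint gets nothing but failures between scheduled windows, which is exactly the behavior described above.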

One thing to note: I read somewhere a couple of months ago that there are cloud services out there that will "schedule" batch jobs like this for you. The free tier of the one I read about covers one service; any more than that and you have to start paying. So for my case, I'd just create one service, have it called multiple times per day/week, and let my BatchJob configuration table determine whether it actually needs to process or not.

So, how horrible an idea is this? And what are some other approaches to running batch jobs in a cloud environment that doesn't offer a batch service?


Solution

This is quite normal, and in many environments it's simply the easiest way to do it. I have various batch jobs that often do nothing more than call a REST API controller with curl (on the same machine as my web server).

For the protection part you have many options. Simple user authentication is easy enough and should be safe if coded properly (and you can use a very long, random password and a totally weird user name if you like; the same goes for the URL of the task). In addition, you could limit requests to certain IP addresses; if the batch runs on the same machine, limiting it to localhost is as secure as you can get. If you can change the web server's settings there are even more options; if not, your code can handle the most important checks anyway.
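Those two checks, an IP allowlist plus a shared secret, can be combined in a few lines. A minimal sketch, assuming a localhost-only allowlist and a made-up token; in a real app the secret would come from configuration, never from source code:

```python
import hashlib
import hmac

# Hypothetical values -- in a real app these live in config, not in source.
ALLOWED_IPS = {"127.0.0.1"}               # localhost-only is the strictest option
SECRET_TOKEN_HASH = hashlib.sha256(b"a-very-long-random-token").hexdigest()

def is_authorized(remote_ip, presented_token):
    """Allow the request only from an allowed IP *and* with the shared secret."""
    if remote_ip not in ALLOWED_IPS:
        return False
    presented_hash = hashlib.sha256(presented_token.encode()).hexdigest()
    # compare_digest avoids leaking timing information when checking secrets
    return hmac.compare_digest(presented_hash, SECRET_TOKEN_HASH)

print(is_authorized("127.0.0.1", "a-very-long-random-token"))  # True
print(is_authorized("10.0.0.5", "a-very-long-random-token"))   # False: wrong IP
```

The controller would run this check first and return an error status before touching any batch logic.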

Also, as you write, you can limit the batch processing by checking the time and date. I do this anyway, since some jobs should not run every hour. In my case there is only one controller, called hourly, and it decides what to do depending on the time. For example, some heavy image processing is done only once at night, one hour later some other heavy worker runs, and during the day a simple data import runs every hour.
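The single-controller-called-hourly pattern might look like the sketch below. The job names and hours are invented for illustration; the point is that the scheduler stays dumb and the dispatch logic lives in code:

```python
def jobs_for_hour(hour):
    """Decide which jobs are due for the hour (0-23) the controller was called."""
    jobs = ["data_import"]                 # a light job that runs every hour
    if hour == 2:
        jobs.append("image_processing")    # heavy work, once a night
    if hour == 3:
        jobs.append("other_heavy_worker")  # staggered one hour later
    return jobs

# The external scheduler just hits the controller every hour;
# the controller works out what actually needs to run.
print(jobs_for_hour(2))    # ['data_import', 'image_processing']
print(jobs_for_hour(14))   # ['data_import']
```

This also matches the asker's plan of paying for a single external scheduled call and letting a configuration table (or code like this) decide what really executes.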

Licensed under: CC-BY-SA with attribution