Question

What is the minimum server size we need to run OpenCPU if we expect 100,000 hits a month?

I think OpenCPU is an exciting project, but I need to know about memory usage when OpenCPU is deployed, since a cloud hosting service such as Rackspace charges about $40 per month for 1 GB of RAM.

I know that if I load R without doing anything, and without loading any data or packages, it uses almost 700 MB of virtual memory and about 50 MB of resident memory.
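
For reference, one way to check those numbers from inside an R session on Linux is to read /proc/self/status; this is a minimal sketch (the field names are Linux-specific), not anything OpenCPU-specific:

```r
# Minimal sketch (Linux only): report the current R process's virtual and
# resident memory by reading /proc/self/status.
mem_usage <- function() {
  status <- readLines("/proc/self/status")
  cat(grep("^VmSize:", status, value = TRUE), "\n")  # virtual memory
  cat(grep("^VmRSS:",  status, value = TRUE), "\n")  # resident set size
}
mem_usage()
```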

I know that OpenCPU uses rApache, and rApache uses preforking, but I want to know how this will scale as the number of concurrent users increases. Thanks.

Thanks for the responses

I talked with Jeroen Ooms when visiting LA, and I am partly convinced that OpenCPU will work in high-concurrency environments if used correctly, and that he is available to fix issues if they arise. OpenCPU is related to his dissertation, after all! In particular, what I find useful about OpenCPU is its integration with Ubuntu's AppArmor, which can restrict processes from using too much RAM and CPU. Apache might also be able to do this, but RAppArmor can do this and much more. Brilliant! If AppArmor were the only advantage, I would just use that and JSON as a backend, but it seems like OpenCPU also streamlines the installation of all this and provides a built-in API system.
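
For what it's worth, this is the kind of per-process limiting I have in mind; a rough sketch assuming the rlimit helpers described in the RAppArmor documentation (function names and arguments should be checked against your installed version):

```r
# Sketch only (Linux): cap memory and CPU for the current R process using
# RAppArmor's rlimit helpers. Exceeding the memory cap makes allocations
# fail; exceeding the CPU cap terminates the process.
library(RAppArmor)

rlimit_as(1e9)   # limit the address space (virtual memory) to ~1 GB
rlimit_cpu(60)   # limit CPU time to 60 seconds

big <- try(rnorm(1e10))  # an ~80 GB allocation, expected to fail under the cap
```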

Given the cost of web-hosting, I imagine a workable real-time analytics system is the following:

  • create R statistical models on demand, on a specialized analytical server, as often as needed (e.g. every day or hour using cron)
  • transfer the fitted models, as native R objects, to a directory on the OpenCPU server using FTP
  • on the OpenCPU server, read the R objects representing the statistical models from that directory, and then make predictions or run simulations with them. For example, use the 'predict' function to provide estimates based on user-supplied variables (see the sketch after this list).
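
A minimal sketch of steps 1 and 3, using an ordinary lm fit as a stand-in for the real model and a made-up file path:

```r
## On the analytical server (run from cron, e.g. daily or hourly):
fit <- lm(mpg ~ wt + hp, data = mtcars)     # stand-in for the real model
saveRDS(fit, "/srv/models/mpg_model.rds")   # native R object, ready to transfer via FTP

## On the OpenCPU server, inside a function exposed through the API:
predict_mpg <- function(wt, hp) {
  fit <- readRDS("/srv/models/mpg_model.rds")            # load the shipped model
  predict(fit, newdata = data.frame(wt = wt, hp = hp))   # score user-supplied values
}

# Example call (what a request to the API would ultimately execute):
predict_mpg(wt = 3.0, hp = 110)
```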

Does anybody else see this as a viable way to make R a backend for real time analytics?


Solution

Dirk is right: it all depends on the R functions that you are calling; the overhead of the OpenCPU architecture itself should be quite minimal. OpenCPU will run on a small server, but as we all know, some functionality in R requires much more memory/CPU than others.

If you really want to know how many resources are needed just to run OpenCPU, you could do some benchmarking. As you noted, prefork is used to branch sessions off the main process, so in most cases the copy-on-write behavior of forking should make it pretty cheap.
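
For example, a crude way to benchmark the specific functions you plan to expose, before worrying about the HTTP layer at all (target_call below is just a placeholder):

```r
# Rough benchmark sketch: time a candidate function and look at R's
# "max used" memory counters around the call.
target_call <- function() lm(mpg ~ ., data = mtcars)  # placeholder workload

gc(reset = TRUE)                        # reset the max-memory counters
elapsed <- system.time(target_call())   # wall-clock and CPU time
usage <- gc()                           # "max used" columns show the peak since the reset

print(elapsed)
print(usage)
```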

There is also some other stuff that you can tweak, e.g. preloading of frequently used packages.
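
The exact preloading mechanism depends on the OpenCPU version, so check its server documentation; a generic way to get a similar effect in any R-based backend is to attach the heavy packages once in a site-wide profile so they are not loaded on every request (the package names below are examples only):

```r
# Sketch: lines for a site-wide profile (e.g. Rprofile.site) that attach
# frequently used packages at startup rather than per request.
# Package names are examples only.
suppressMessages({
  library(MASS)
  library(Matrix)
})
```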

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow