Question

This is more of an efficiency question about calling R functions through rpy2 from multiple threads/processes.

The R functions basically load a model file from disk and use that model to classify time series. Collecting the input time series, however, is done in Python by polling a database (which is updated by some web services). Once the Python code detects a new time series, it creates a worker process in which rpy2 is used to call the R functions that do the classification.
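Roughly, the Python side looks like the sketch below; poll_new_series, classify.R and classify_series are just hypothetical placeholders for my actual code:

import multiprocessing
import time

import rpy2.robjects as robjects

def classify_worker(series):
    # each worker drives the embedded R session through rpy2; the R code
    # currently re-reads the model file on every call, which is the
    # overhead I would like to avoid
    robjects.r('source("classify.R")')                  # hypothetical R script
    classify_series = robjects.globalenv['classify_series']
    return classify_series(robjects.FloatVector(series))

while True:
    series = poll_new_series()                          # hypothetical DB polling helper
    if series is not None:
        multiprocessing.Process(target=classify_worker, args=(series,)).start()
    time.sleep(1)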

Having Python do the classification is NOT an option for us. My main concern is the overhead of loading the model file: clearly I do NOT want the file to be read from disk every time a new time series is classified. So the question is:

How can I load the model file just once and re-use the in-memory model object across calls to the same R function through rpy2?

My initial idea was to load the model file into Python and pass it as a parameter each time the R function is called, but that would introduce the extra cost of copying the model parameters (and their size is not negligible).

Any help will be much appreciated!


Solution

If I understand it correctly, you:

  1. have a function (a classifier) written in R that requires a relatively large body of data to work (k-nearest neighbors?)
  2. are loading that body of data using Python
  3. would like to load the parameters /once/ and after that make as many calls to the classifier as required
  4. plan on passing the body of data as a parameter to the classifier

If you follow 4., copying is not always necessary, but currently it can only be avoided if the data are numerical or boolean and the memory region was allocated by R.

However, I think a simpler alternative in that situation is to pass the body of data to R once and for all (copying it if necessary) and then re-use the converted object:

from rpy2.robjects.packages import importr
e1071 = importr('e1071')

# note: py2ri is the rpy2 2.x name; in rpy2 3.x the equivalent conversion
# is called py2rpy and is accessed through a converter object
from rpy2.robjects.conversion import py2ri

# your model's data are in 'm_data';
# the conversion (and copy, if needed) happens here, once
r_m_data = py2ri(m_data)

for test_data in many_test_data:
    # r_m_data is already a pointer to an R data structure
    # (it was converted above - no further copying is made);
    # e1071.knn stands in for whatever R classifier you actually use
    res = e1071.knn(r_m_data, test_data)

This corresponds to what you describe as:

How can I load the model file just once and re-use the in-memory model object across calls to the same R function through rpy2?
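Since your model ultimately comes from a file on disk, the same idea applies one step earlier: load the file into the embedded R process once and keep a handle to the resulting R object. The sketch below assumes a serialized .rds file and a model for which R's predict() dispatches correctly; the file name and many_test_data are only illustrative.

import rpy2.robjects as robjects

readRDS = robjects.r['readRDS']      # base R function
r_predict = robjects.r['predict']    # generic dispatching on the model's class

# done once per process: the model now lives inside the embedded R session
r_model = readRDS('model.rds')       # illustrative file name

for test_data in many_test_data:
    # r_model is only a Python-side handle to the R object;
    # nothing is reloaded from disk or copied on each call
    res = r_predict(r_model, test_data)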
