rpyc.Service takes 10 seconds to receive a 150kB object (on localhost, no LAN issue)

https://stackoverflow.com/questions/14463643

python
rpyc

17-01-2022
Question

I am building a big (150kB when pickled) dummy dictionary and running a dummy function on it that runs quickly and smoothly.

When the same function is exposed via an rpyc.Service, the time taken becomes 10 seconds (instead of 0.0009 seconds), even though my client and server run on the same host (so LAN latency is not the issue).

Any idea why it takes so long for my 150kB object to be communicated from the client to the server on the same host?

And why is dummy.dummy() called before the input object is fully "available" on the server? (If the object had already been transferred, the time spent inside the function would be the same in the two test cases.)

See my Python (3.2) code below. I measure the time spent in dummy.dummy(d).

Case 1: dummy.dummy is called by the client; exec time = 0.0009 seconds
Case 2: dummy.dummy is called by the rpyc service; exec time = 10 seconds

mini_service.py

import rpyc
from rpyc.utils.server import ThreadedServer
import dummy

class miniService(rpyc.Service):
    def exposed_myfunc(self,d):
        #Test case 2: call dummy.dummy from the service
        dummy.dummy(d)

if __name__=='__main__':
    t = ThreadedServer(miniService,protocol_config = {"allow_public_attrs" : True}, port = 19865)
    t.start()

mini_client.py

import rpyc
import sys
import pickle
import dummy

def makedict(n):
    d={x:x for x in range(n)}
    return d

if __name__ == "__main__":
    d = makedict(20000)
    print(sys.getsizeof(d))             # result = 393356

    # output = open("C:\\rd\\non_mc_test_files\\mini.pkl", 'wb')  # 117kB object for n=20k
    # pickle.dump(d, output)
    # output.close()

    # RUN1: dummy.dummy(d) outside rpyc takes 0.00099 seconds
    # dummy.dummy(d)

    # RUN2: dummy.dummy(d) via rpyc on localhost takes 9.346 seconds
    conn = rpyc.connect('localhost', 19865, config={"allow_pickle": True})
    conn.root.myfunc(d)

    print('Done.')

dummy.py

import time

def dummy(d):
    start_ = time.time()
    for key in d:
        d[key]=0
    print('Time spent in dummy in seconds: ' + str(time.time()-start_))

Solution

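It looks like the performance loss comes from the work rpyc does to keep the object, which is passed by reference, synchronized between the client and the server. The server receives only a proxy to the client's dictionary, so each key read and each assignment inside dummy.dummy() turns into a request back to the client. That would also explain why the function starts before the data has been transferred: the data is fetched piece by piece while the loop runs.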
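To make the cost of working through a reference concrete, here is a toy model in plain Python. It does not use rpyc and is not how rpyc is implemented; CountingProxy and zero_all are hypothetical names, and the assumption is simply that every iteration step and every assignment on a remote reference costs one request.

```python
class CountingProxy:
    """Toy stand-in for a remote reference (hypothetical, not rpyc code).

    Every operation that would need the remote side is counted as one
    simulated round trip.
    """

    def __init__(self, target):
        self._target = target
        self.round_trips = 0

    def __iter__(self):
        self.round_trips += 1          # one request to start iterating
        for key in list(self._target):
            self.round_trips += 1      # one request per key fetched
            yield key

    def __setitem__(self, key, value):
        self.round_trips += 1          # one request per assignment
        self._target[key] = value


def zero_all(d):
    # Same loop as dummy.dummy(), without the timing code.
    for key in d:
        d[key] = 0


data = {x: x for x in range(20000)}
proxy = CountingProxy(data)
zero_all(proxy)
print(proxy.round_trips)  # 40001 simulated round trips for 20000 keys
```

Under this model the 20000-key dictionary costs about 40000 requests, so even a small per-request overhead would add up to seconds, whereas copying the object once costs a single transfer.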

What I am now doing in my application is making a deep copy of the input object and then working on the copy, which emulates passing by value.

Note: the deep copy requires allow_pickle=True to be set in the protocol configuration parameters.
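A minimal sketch of that workaround, assuming the copy is made inside the service before the loop runs. The function name process_by_value is hypothetical, and the example is exercised here with a plain dict so that it runs without rpyc; in the service, d would be the proxy received by exposed_myfunc.

```python
import copy


def process_by_value(d):
    """Copy the input once, then work only on the local copy.

    Assumption: when d is an rpyc proxy and allow_pickle is enabled,
    copy.deepcopy(d) fetches the whole object in one go instead of
    one request per key.
    """
    local = copy.deepcopy(d)
    for key in local:
        local[key] = 0
    return local


# Stand-in for the client's dictionary (plain dict, no rpyc involved).
original = {x: x for x in range(20000)}
result = process_by_value(original)
print(len(result), original[5], result[5])
```

Because the loop now touches only the copy, the caller's dictionary is left unchanged; if the client needs the modified values, they have to be sent back explicitly.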

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow