DLR & Performance

https://stackoverflow.com/questions/4864244

27-10-2019
|

سؤال

I'm intending to create a web service which performs a large number of manually-specified calculations as fast as possible, and have been exploring the use of DLR.

Sorry if this is long but feel free to skim over and get the general gist.

I've been using the IronPython library as it makes the calculations very easy to specify. My works laptop gives a performance of about 400,000 calculations per second doing the following:

ScriptEngine py = Python.CreateEngine();
ScriptScope pys = py.CreateScope();

ScriptSource src = py.CreateScriptSourceFromString(@"
def result():
    res = [None]*1000000
    for i in range(0, 1000000):
        res[i] = b.GetValue() + 1
    return res
result()
");

CompiledCode compiled = src.Compile();
pys.SetVariable("b", new DynamicValue());

long start = DateTime.Now.Ticks;
var res = compiled.Execute(pys);
long end = DateTime.Now.Ticks;

Console.WriteLine("...Finished. Sample data:");

for (int i = 0; i < 10; i++)
{
    Console.WriteLine(res[i]);
}

Console.WriteLine("Took " + (end - start) / 10000 + "ms to run 1000000 times.");

Where DynamicValue is a class that returns random numbers from a pre-built array (seeded and built at run time).

When I create a DLR class to do the same thing, I get much higher performance (~10,000,000 calculations per second). The class is as follows:

class DynamicCalc : IDynamicMetaObjectProvider
{
    DynamicMetaObject IDynamicMetaObjectProvider.GetMetaObject(Expression parameter)
    {
        return new DynamicCalcMetaObject(parameter, this);
    }

    private class DynamicCalcMetaObject : DynamicMetaObject
    {
        internal DynamicCalcMetaObject(Expression parameter, DynamicCalc value) : base(parameter, BindingRestrictions.Empty, value) { }

        public override DynamicMetaObject BindInvokeMember(InvokeMemberBinder binder, DynamicMetaObject[] args)
        {
            Expression Add = Expression.Convert(Expression.Add(args[0].Expression, args[1].Expression), typeof(System.Object));
            DynamicMetaObject methodInfo = new DynamicMetaObject(Expression.Block(Add), BindingRestrictions.GetTypeRestriction(Expression, LimitType));
            return methodInfo;
        }
    }
}

and is called/tested in the same way by doing the following:

dynamic obj = new DynamicCalc();
long t1 = DateTime.Now.Ticks;
for (int i = 0; i < 10000000; i++)
{
    results[i] = obj.Add(ar1[i], ar2[i]);
}
long t2 = DateTime.Now.Ticks;

Where ar1 and ar2 are pre-built, runtime seeded arrays of random numbers.

The speed is great this way, but it's not easy to specify the calculation. I'd basically be looking at creating my own lexer & parser, whereas IronPython has everything I need already there.

I'd have thought I could get much better performance from IronPython since it is implemented on top of the DLR, and I could do with better than what I'm getting.

Is my example making best use of the IronPython engine? Is it possible to get significantly better performance out of it?

(Edit) Same as first example but with the loop in C#, setting variables and calling the python function:

ScriptSource src = py.CreateScriptSourceFromString(@"b + 1");

CompiledCode compiled = src.Compile();

double[] res = new double[1000000];

for(int i=0; i<1000000; i++)
{
    pys.SetVariable("b", args1[i]);
    res[i] = compiled.Execute(pys);
}

where pys is a ScriptScope from py, and args1 is a pre-built array of random doubles. This example executes slower than running the loop in the Python code and passing in the entire arrays.

المحلول

delnan's comment leads you to some of the problems here. But I'll just get specific about what the differences are here. In the C# version you've cut out a significant amount of the dynamic calls that you have in the Python version. For starters your loop is typed to int and it sounds like ar1 and ar2 are strongly typed arrays. So in the C# version the only dynamic operations you have are the call to obj.Add (which is 1 operation in C#) and potentially the assignment to results if it's not typed to object which seems unlikely. Also note all of this code is lock free.

In the Python version you first have the allocation of the list - this also appears to be during your timer where as in C# it doesn't look like it is. Then you have the dynamic call to range, luckily that only happens once. But that again creates a gigantic list in memory - delnan's suggestion of xrange is an improvement here. Then you have the loop counter i which is getting boxed to an object for every iteration through the loop. Then you have the call to b.GetValue() which is actually 2 dynamic invocatiosn - first a get member to get the "GetValue" method and then an invoke on that bound method object. This is again creating one new object for every iteration of the loop. Then you have the result of b.GetValue() which may be yet another value that's boxed on every iteration. Then you add 1 to that result and you have another boxing operation on every iteration. Finally you store this into your list which is yet another dynamic operation - I think this final operation needs to lock to ensure the list remains consistent (again, delnan's suggestion of using a list comprehension improves this).

So in summary during the loop we have:

                            C#       IronPython
Dynamic Operations           1           4
Allocations                  1           4
Locks Acquired               0           1

So basically Python's dynamic behavior does come at a cost vs C#. If you want the best of both worlds you can try and balance what you do in C# vs what you do in Python. For example you could write the loop in C# and have it call a delegate which is a Python function (you can do scope.GetVariable> to get a function out of the scope as a delegate). You could also consider allocating a .NET array for the results if you really need to get every last bit of performance as it may reduce working set and GC copying by not keeping around a bunch of boxed values.

To do the delegate you could have the user write:

def computeValue(value):
    return value + 1

Then in the C# code you'd do:

CompiledCode compiled = src.Compile();
compiled.Execute(pys);
var computer = pys.GetVariable<Func<object,object>>("computeValue");

Now you can do:

for (int i = 0; i < 10000000; i++)
{
    results[i] = computer(i);
}

نصائح أخرى

If you concerned about computation speed, is it better to look at lowlevel computation specification? Python and C# are high-level languages, and its implementation runtime can spend a lot of time for undercover work.

Look on this LLVM wrapper library: http://www.llvmpy.org

Install it using: pip install llvmpy ply
or on Debian Linux: apt install python-llvmpy python-ply

You still need to write some tiny compiler (you can use PLY library), and bind it with LLVM JIT calls (see LLVM Execution Engine), but this approach can be more effective (generated code much closer to real CPU code), and multiplatform comparing to .NET jail.

LLVM has ready to use optimizing compiler infrastructure, including a lot of optimizer stage modules, and big user and developer community.

Also look here: http://gmarkall.github.io/tutorials/llvm-cauldron-2016

PS: If you interested in it, I can help you with a compiler, contributing to my project's manual in parallel. But it will not be jumpstart, this theme is new to me too.

مرخصة بموجب: CC-BY-SA مع الإسناد

لا تنتمي إلى StackOverflow