Question

I am currently serializing SQL table rows into a binary format for efficient storage. I serialize/deserialize the binary data into a List<object> per row. I'm trying to upgrade this to use POCOs that will be dynamically generated (emitted) at runtime, with one field per column.

I've been searching online for hours and have stumbled upon ORMs/frameworks like EF, T4, ExpandoObject, but all of these either use a dynamic object (properties can be added/removed on the fly) or simply generate a POCO before compiling. I cannot use templating because the schema of the tables is unknown at compile time, and using dynamic objects would be overkill (and slow) since I know the exact set of properties and their types. I need to generate one POCO per table, with Fields corresponding to columns, and with the data types set accordingly (INT -> int, TEXT -> string).

After generating the POCO, I'll proceed to get/set its fields using emitted CIL, much like what PetaPoco does for statically compiled POCOs. I'm hoping all of this rigmarole will be faster than using untyped Lists, and will give me high-fidelity POCOs that are strongly typed and can be accelerated by the CLR. Am I correct to assume this? Can you start me off on generating POCOs at runtime? And will using POCOs be much faster or much more memory-efficient than using a List<object>? Basically, will it be worth the trouble? I already know how to accelerate getting/setting fields using emitted CIL.


Solution

From comments and chat, it seems that a key part of this is still creating a dynamic type; ok, here's a full example that shows a fully serializable (by any common serializer) type. You could of course add more to the type - maybe indexers to get properties by number or by name, INotifyPropertyChanged, etc.

Also - critical point: you must cache and re-use the generated Type instances. Do not keep regenerating this stuff... you will hemorrhage memory.

using Newtonsoft.Json;
using ProtoBuf;
using System;
using System.IO;
using System.Reflection;
using System.Reflection.Emit;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;
using System.Text;
using System.Xml.Serialization;

public interface IBasicRecord
{
    object this[int field] { get; set; }
}
class Program
{
    static void Main()
    {
        // Column names and CLR types; in practice these would come from the table schema.
        string[] names = { "Id", "Name", "Size", "When" };
        Type[] types = { typeof(int), typeof(string), typeof(float), typeof(DateTime?) };

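        // Note: AppDomain.DefineDynamicAssembly is the .NET Framework API; on .NET Core / .NET 5+
        // the equivalent is the static AssemblyBuilder.DefineDynamicAssembly(name, access).
        var asm = AppDomain.CurrentDomain.DefineDynamicAssembly(
            new AssemblyName("DynamicStuff"),
            AssemblyBuilderAccess.Run);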
        var module = asm.DefineDynamicModule("DynamicStuff");
        var tb = module.DefineType("MyType", TypeAttributes.Public | TypeAttributes.Serializable);
        tb.SetCustomAttribute(new CustomAttributeBuilder(
            typeof(DataContractAttribute).GetConstructor(Type.EmptyTypes), new object[0]));
        tb.AddInterfaceImplementation(typeof(IBasicRecord));

        FieldBuilder[] fields = new FieldBuilder[names.Length];
        var dataMemberCtor = typeof(DataMemberAttribute).GetConstructor(Type.EmptyTypes);
        var dataMemberProps = new[] { typeof(DataMemberAttribute).GetProperty("Order") };
        for (int i = 0; i < fields.Length; i++)
        {
            var field = fields[i] = tb.DefineField("_" + names[i],
                types[i], FieldAttributes.Private);

            var prop = tb.DefineProperty(names[i], PropertyAttributes.None,
                types[i], Type.EmptyTypes);
            var getter = tb.DefineMethod("get_" + names[i],
                MethodAttributes.Public | MethodAttributes.HideBySig, types[i], Type.EmptyTypes);
            prop.SetGetMethod(getter);
            var il = getter.GetILGenerator();
            il.Emit(OpCodes.Ldarg_0); // this
            il.Emit(OpCodes.Ldfld, field); // .Foo
            il.Emit(OpCodes.Ret); // return
            var setter = tb.DefineMethod("set_" + names[i],
                MethodAttributes.Public | MethodAttributes.HideBySig, typeof(void), new Type[] { types[i] });
            prop.SetSetMethod(setter);
            il = setter.GetILGenerator();
            il.Emit(OpCodes.Ldarg_0); // this
            il.Emit(OpCodes.Ldarg_1); // value
            il.Emit(OpCodes.Stfld, field); // .Foo =
            il.Emit(OpCodes.Ret);

            prop.SetCustomAttribute(new CustomAttributeBuilder(
                dataMemberCtor, new object[0],
                dataMemberProps, new object[1] { i + 1 }));
        }

        foreach (var prop in typeof(IBasicRecord).GetProperties())
        {
            var accessor = prop.GetGetMethod();
            if (accessor != null)
            {
                var args = accessor.GetParameters();
                var argTypes = Array.ConvertAll(args, a => a.ParameterType);
                var method = tb.DefineMethod(accessor.Name,
                    accessor.Attributes & ~MethodAttributes.Abstract,
                    accessor.CallingConvention, accessor.ReturnType, argTypes);
                tb.DefineMethodOverride(method, accessor);
                var il = method.GetILGenerator();
                if (args.Length == 1 && argTypes[0] == typeof(int))
                {
                    var branches = new Label[fields.Length];
                    for (int i = 0; i < fields.Length; i++)
                    {
                        branches[i] = il.DefineLabel();
                    }
                    il.Emit(OpCodes.Ldarg_1); // key
                    il.Emit(OpCodes.Switch, branches); // switch
                    // default:
                    il.ThrowException(typeof(ArgumentOutOfRangeException));
                    for (int i = 0; i < fields.Length; i++)
                    {
                        il.MarkLabel(branches[i]);
                        il.Emit(OpCodes.Ldarg_0); // this
                        il.Emit(OpCodes.Ldfld, fields[i]); // .Foo
                        if (types[i].IsValueType)
                        {
                            il.Emit(OpCodes.Box, types[i]); // (object)
                        }
                        il.Emit(OpCodes.Ret); // return
                    }
                }
                else
                {
                    il.ThrowException(typeof(NotImplementedException));
                }
            }
            accessor = prop.GetSetMethod();
            if (accessor != null)
            {
                var args = accessor.GetParameters();
                var argTypes = Array.ConvertAll(args, a => a.ParameterType);
                var method = tb.DefineMethod(accessor.Name,
                    accessor.Attributes & ~MethodAttributes.Abstract,
                    accessor.CallingConvention, accessor.ReturnType, argTypes);
                tb.DefineMethodOverride(method, accessor);
                var il = method.GetILGenerator();
                if (args.Length == 2 && argTypes[0] == typeof(int) && argTypes[1] == typeof(object))
                {
                    var branches = new Label[fields.Length];
                    for (int i = 0; i < fields.Length; i++)
                    {
                        branches[i] = il.DefineLabel();
                    }
                    il.Emit(OpCodes.Ldarg_1); // key
                    il.Emit(OpCodes.Switch, branches); // switch
                    // default:
                    il.ThrowException(typeof(ArgumentOutOfRangeException));
                    for (int i = 0; i < fields.Length; i++)
                    {
                        il.MarkLabel(branches[i]);
                        il.Emit(OpCodes.Ldarg_0); // this
                        il.Emit(OpCodes.Ldarg_2); // value
                        il.Emit(types[i].IsValueType ? OpCodes.Unbox_Any : OpCodes.Castclass, types[i]); // (SomeType)
                        il.Emit(OpCodes.Stfld, fields[i]); // .Foo =
                        il.Emit(OpCodes.Ret); // return
                    }
                }
                else
                {
                    il.ThrowException(typeof(NotImplementedException));
                }
            }
        }

        var type = tb.CreateType();
        var obj = Activator.CreateInstance(type);
        // we'll use the index (via a known interface) to set the values
        IBasicRecord rec = (IBasicRecord)obj;
        rec[0] = 123;
        rec[1] = "abc";
        rec[2] = 12F;
        rec[3] = DateTime.Now;
        for (int i = 0; i < 4; i++)
        {
            Console.WriteLine("{0} = {1}", i, rec[i]);
        }
        using (var ms = new MemoryStream())
        {
            var ser = new XmlSerializer(type);
            ser.Serialize(ms, obj);
            Console.WriteLine("XmlSerializer: {0} bytes", ms.Length);
        }
        using (var ms = new MemoryStream())
        {
            using (var writer = new StreamWriter(ms, Encoding.UTF8, 1024, true))
            {
                var ser = new JsonSerializer();
                ser.Serialize(writer, obj);
            }
            Console.WriteLine("Json.NET: {0} bytes", ms.Length);
        }
        using (var ms = new MemoryStream())
        {
            var ser = new DataContractSerializer(type);
            ser.WriteObject(ms, obj);
            Console.WriteLine("DataContractSerializer: {0} bytes", ms.Length);
        }
        using (var ms = new MemoryStream())
        {
            Serializer.NonGeneric.Serialize(ms, obj);
            Console.WriteLine("protobuf-net: {0} bytes", ms.Length);
        }
        using (var ms = new MemoryStream())
        {
            // note: NEVER do this unless you have a custom Binder; your
            // assembly WILL NOT deserialize in the next AppDomain (i.e.
            // the next time you load your app, you won't be able to load)
            // - shown only for illustration
            var bf = new BinaryFormatter();
            bf.Serialize(ms, obj);
            Console.WriteLine("BinaryFormatter: {0} bytes", ms.Length);
        }
    }
}

Output:

XmlSerializer: 246 bytes
Json.NET: 81 bytes
DataContractSerializer: 207 bytes
protobuf-net: 25 bytes
BinaryFormatter: 182 bytes
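
The caching advice above could be sketched roughly as follows. This is only an illustration: `GeneratedTypeCache`, `MakeKey` and the `build` delegate are hypothetical names, and `build` is assumed to wrap the Reflection.Emit code from the example (ending in `tb.CreateType()`).

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

// Hypothetical helper (not part of the original answer): keeps one generated
// Type per distinct schema so each type is emitted at most once per process.
public static class GeneratedTypeCache
{
    private static readonly ConcurrentDictionary<string, Lazy<Type>> Cache =
        new ConcurrentDictionary<string, Lazy<Type>>();

    // Builds a cache key from column names and CLR types,
    // e.g. "Id:System.Int32|Name:System.String".
    public static string MakeKey(string[] names, Type[] types)
    {
        if (names.Length != types.Length)
            throw new ArgumentException("names and types must have the same length");
        return string.Join("|", names.Select((n, i) => n + ":" + types[i].FullName));
    }

    // 'build' is assumed to emit and return the type for the given schema.
    public static Type GetOrAdd(string[] names, Type[] types,
        Func<string[], Type[], Type> build)
    {
        var key = MakeKey(names, types);
        // Only one Lazy<Type> is ever stored per key, and Lazy<T> runs its
        // factory at most once, so 'build' is not repeated under concurrency.
        return Cache.GetOrAdd(key, _ => new Lazy<Type>(() => build(names, types))).Value;
    }
}
```

If several schemas share one dynamic module, each emitted type would also need a distinct name (the example hard-codes "MyType"); deriving the name from the cache key is one option.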

Other tips

This is actually quite a complex question. Unfortunately, to answer it fully you would basically have to write it and test it. However, I strongly suggest not looking at any on-the-fly POCO generation until you have your answer! Basically, you should ignore that step for now.

The other essential question in performance is: how fast does it need to be? The absolute first thing I would do is the absolutely simplest thing that works, and measure that. And the simplest thing that works is: load it into a DataTable and serialize that DataTable (using RemotingFormat = SerializationFormat.Binary). In 10 lines of code that will give you a line in the sand:

var dt = new DataTable();
dt.Load(yourDataReader);
//... any access tests
dt.RemotingFormat = SerializationFormat.Binary;
using (var file = File.Create(path))
{
    var bf = new BinaryFormatter();
    bf.Serialize(file, dt);
}
// ... also check deserialize, if that is perf-critical

Normally I wouldn't recommend either DataTable or BinaryFormatter, but... it doesn't seem far-fetched in this case.

Personally, I suspect you'll find that DataTable in binary-remoting-mode isn't actually terrible.
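
To put numbers on a baseline like this, a small timing helper is usually enough. A minimal sketch, where `Bench` and `AverageMilliseconds` are hypothetical names rather than anything from a library:

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: a crude timing harness for comparing approaches.
public static class Bench
{
    // Runs 'action' once as a warm-up (so JIT cost is not measured), then
    // 'iterations' more times, and returns the average elapsed milliseconds.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        action(); // warm-up run, not timed

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }
}
```

You would pass it a lambda that wraps the step under test (load, access, serialize or deserialize) and compare the averages across approaches on representative data.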

The next step is to see what else works without any huge effort. For example:

  • loading a data-source into objects is a solved problem, with tools like dapper
  • serializing a set of objects in a very efficient way is a solved problem, with tools like protobuf-net

So I would be tempted to create an illustrative class (purely to see if it is any better) along the lines of:

[DataContract]
public class Foo {
    [DataMember(Order=1)] public int Id {get;set;}
    [DataMember(Order=2)] public string Name {get;set;}
    // ... more props
    // IMPORTANT: make this representative - basically, the same data
    // that you had in the data-table

    // note also include any supporting info - any indexers and interface
    // support that your core code needs
}
[DataContract]
public class FooWrapper { // just to help in the test
     [DataMember(Order=1)] public List<Foo> Items {get;set;}
}

and do the same test (your main code would only use the indexer access, but let dapper use the .Query<Foo>(...) API for now):

var data = conn.Query<Foo>(...).ToList(); // dapper
//... any access tests, just using the indexer API
using (var file = File.Create(path))
{
    var wrapper = new FooWrapper { Items = data };
    Serializer.Serialize(file, wrapper); // protobuf-net
}
// note that you deserialize via Serializer.Deserialize<FooWrapper>(file)

The point of this is that it gives you some bounds on what is reasonable to expect. Feel free to use your own materializer/serializer in place of dapper/protobuf-net, but I humbly submit that these two have been heavily optimized for scenarios largely like this.

When you have a lower and upper bound, you have sensible data to answer the "is it worth it" question. Generating objects at run-time isn't massively hard, but it is more work than most people would need to do. You also want to be really careful to re-use the generated types as far as possible. Note that if you go that route, protobuf-net has a fully non-generic API, via Serializer.NonGeneric or RuntimeTypeModel.Default (these and the generic Serializer API all end up at the same core). Dapper doesn't, but I would be more than happy to add one (accepting a Type instance). In the interim, you could also use MakeGenericMethod / Invoke for that one step.
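
The MakeGenericMethod / Invoke step might look roughly like this. It is a sketch under assumptions: `GenericBridge`, `CreateList<T>` and `CreateListOf` are hypothetical, with `CreateList<T>` standing in for a generic-only API such as a Query<T>(...) method.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;

public static class GenericBridge
{
    // Hypothetical stand-in for an API that is only available in generic form.
    public static List<T> CreateList<T>(int count) where T : new()
    {
        var list = new List<T>(count);
        for (int i = 0; i < count; i++)
        {
            list.Add(new T());
        }
        return list;
    }

    // Calls CreateList<T> when T is only known as a runtime Type
    // (for example, a type produced by TypeBuilder.CreateType()).
    public static IList CreateListOf(Type rowType, int count)
    {
        // Open generic definition: CreateList<T>
        MethodInfo open = typeof(GenericBridge).GetMethod(nameof(CreateList));
        // Closed over the runtime type: CreateList<rowType>
        MethodInfo closed = open.MakeGenericMethod(rowType);
        // Static method, so the target instance is null.
        return (IList)closed.Invoke(null, new object[] { count });
    }
}
```

Reflection-based Invoke is comparatively slow, so if this sits on a hot path it is probably worth caching the closed MethodInfo (or a delegate created from it) per generated type.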

I realize I haven't directly answered "is it worth it", but that is deliberate: that cannot be answered without direct application to your scenario. Hopefully, I have instead provided some hints at how you can answer it for your scenario. I would be very interested in hearing your findings.

Only when you know that it is worth it (and with the above I would expect that to take about an hour's effort) would I go to the trouble of generating types. If you do, I recommend the use of Sigil - it will make your IL generation far less frustrating.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow