Code Generators or T4 Templates, are they really evil?

https://stackoverflow.com/questions/524295

22-08-2019
|

Question

I have heard people state that Code Generators and T4 templates should not be used. The logic behind that is that if you are generating code with a generator then there is a better more efficient way to build the code through generics and templating.

While I slightly agree with this statement above, I have not really found effective ways to build templates that can say for instance instantiate themselves. In otherwords I can never do :

return new T();

Additionally, if I want to generate code based on database values I have found that using Microsoft.SqlServer.Management.SMO in conjunction with T4 templates have been wonderful at generating mass amounts of code without having to copy / paste or use resharper.

Many of the problems I have found with Generics too is that to my shock there are a lot of developers who do not understand them. When I do examine generics for a solution, there are times where it gets complicated because C# states that you cannot do something that may seem logical in my mind.

What are your thoughts? Do you prefer to build a generator, or do you prefer to use generics? Also, how far can generics go? I know a decent amount about generics, but there are traps and pitfalls that I always run into that cause me to resort to a T4 template.

What is the more proper way to handle scenarios where you need a large amount of flexibility? Oh and as a bonus to this question, what are good resources on C# and Generics?

Solution

You can do new T(); if you do this

public class Meh<T>
  where T : new()
{
  public static T CreateOne()
  {
    return new T();
  }
}

As for code-generators. I use one every day without any problems. I'm using one right now in fact :-)

Generics solve one problem, code-generators solve another. For example, creating a business model using a UML editor and then generating your classes with persistence code as I do all of the time using this tool couldn't be achieved with generics, because each persistent class is completely different.

As for a good source on generics. The best has got to be Jon Skeet's book of course! :-)

OTHER TIPS

As the originator of T4, I've had to defend this question quite a few times as you can imagine :-)

My belief is that at its best code generation is a step on the way to producing equivalent value using reusable libraries.

As many others have said, the key concept to maintain DRY is never, ever changing generated code manually, but rather preserving your ability to regenerate when the source metadata changes or you find a bug in the code generator. At that point the generated code has many of the characteristics of object code and you don't run into copy/paste type problems.

In general, it's much less effort to produce a parameterized code generator (especially with template-based systems) than it is to correctly engineer a high quality base library that gets the usage cost down to the same level, so it's a quick way to get value from consistency and remove repetition errors.

However, I still believe that the finished system would most often be improved by having less total code. If nothing else, its memory footprint would almost always be significantly smaller (although folks tend to think of generics as cost free in this regard, which they most certainly are not).

If you've realised some value using a code generator, then this often buys you some time or money or goodwill to invest in harvesting a library from the generated codebase. You can then incrementally reengineer the code generator to target the new library and hopefully generate much less code. Rinse and repeat.

One interesting counterpoint that has been made to me and that comes up in this thread is that rich, complex, parametric libraries are not the easiest thing in terms of learning curve, especially for those not deeply immersed in the platform. Sticking with code generation onto simpler basic frameworks can produce verbose code, but it can often be quite simple and easy to read.

Of course, where you have a lot of variance and extremely rich parameterization in your generator, you might just be trading off complexity an your product for complexity in your templates. This is an easy path to slide into and can make maintenance just as much of a headache - watch out for that.

Generating code isn't evil and it doesn't smell! The key is to generate the right code at the right time. I think T4 is great--I only use it occasionally, but when I do it is very helpful. To say, unconditionally, that generating code is bad is unconditionally crazy!

It seems to me code generators are fine as long as the code generation is part of your normal build process, rather than something you run once and then keep its output. I add this caveat because if just use the code generator once and discard the data that created it, you're just automatically creating a massive DRY violation and maintenance headache; whereas generating the code every time effectively means that whatever you are using to do the generating is the real source code, and the generated files are just intermediate compile stages that you should mostly ignore.

Lex and yacc are classic examples of tools of allow you to specify functionality in an efficient manner and generate efficient code from it. Trying to do their jobs by hand will lengthen your development time and probably produce less efficient and less readable code. And while you could certainly incorporate something like lex and yacc directly into your code and do their jobs at run time instead of at compile time, that would certainly add considerable complexity to your code and slow it down. If you actually need to change your specification at run time it might be worth it, but in most normal cases using lex/yacc to generate code for you at compile time is a big win.

A good percentage of what is in Visual Studio 2010 would not be possible without code generation. Entity Framework would not be possible. The simple act of dragging and dropping a control onto a form would not be possible, nor would Linq. To say that code generation should not be used is strange as so many use it without even thinking about it.

Maybe it is a bit harsh, but for me code generation smells.

That code generation is used means that there are numerous underlying common principles which may be expressed in a "Don't repeat yourself" fashion. It may take a bit longer, but it is satisfying when you end up with classes that only contain the bits that really change, based on an infrastructure that contains the mechanics.

As to Generics...no I don't have too many issues with it. The only thing that currently doesn't work is saying that

List<Animal> a = new List<Animal>();
List<object> o = a;

But even that will be possible in the next version of C#.

More code means more complexity. More complexity means more places for bugs to hide, which means longer fix cycles, which in turn means higher costs throughout the project.

Whenever possible, I prefer to minimize the amount of code to provide equivalent functionality; ideally using dynamic (programmatic) approaches rather than code generation. Reflection, attributes, aspects and generics provide lots of options for a DRY strategy, leaving generation as a last resort.

Code generation is for me a workaround for many problems found in language, frameworks, etc. They are not evil by themselves, I would say it is very very bad (i.e. evil) to release a language (C#) and framework which forces you to copy&paste (swap on properties, events triggering, lack of macros) or use magical numbers (wpf binding).

So, I cry, but I use them, because I have to.

I've used T4 for code generation and also Generics. Both are good, have their pros and cons, and are suited for different purposes.

In my case, I use T4 to generate Entities, DAL and BLL based on a database schema. However, DAL and BLL reference a mini-ORM I built, based on Generics and Reflection. So I think you can use them side by side, as long as you keep in control and keep it small and simple.

T4 generates static code, while Generics is dynamic. If you use Generics, you use Reflection which is said to be less performant than "hard-coded" solution. Of course you can cache reflection results.

Regarding "return new T();", I use Dynamic Methods like this:

public class ObjectCreateMethod
    {
    delegate object MethodInvoker();
    MethodInvoker methodHandler = null;

    public ObjectCreateMethod(Type type)
    {
        CreateMethod(type.GetConstructor(Type.EmptyTypes));
    }

    public ObjectCreateMethod(ConstructorInfo target)
    {
        CreateMethod(target);
    }

    void CreateMethod(ConstructorInfo target)
    {
        DynamicMethod dynamic = new DynamicMethod(string.Empty,
                    typeof(object),
                    new Type[0],
                    target.DeclaringType);
        ILGenerator il = dynamic.GetILGenerator();
        il.DeclareLocal(target.DeclaringType);
        il.Emit(OpCodes.Newobj, target);
        il.Emit(OpCodes.Stloc_0);
        il.Emit(OpCodes.Ldloc_0);
        il.Emit(OpCodes.Ret);

        methodHandler = (MethodInvoker)dynamic.CreateDelegate(typeof(MethodInvoker));
    }

    public object CreateInstance()
    {
        return methodHandler();
    }
}

Then, I call it like this:

ObjectCreateMethod _MetodoDinamico = new ObjectCreateMethod(info.PropertyType);
object _nuevaEntidad = _MetodoDinamico.CreateInstance();

Generics and code generation are two different things. In some cases you could use generics instead of code generation and for those I believe you should. For the other cases code generation is a powerful tool.

For all the cases where you simply need to generate code based on some data input, code generation is the way to go. The most obvious, but by no means the only example is the forms editor in Visual Studio. Here the input is the designer data and the output is the code. In this case generics is really no help at all, but it is very nice that VS simply generates the code based on the GUI layout.

Code generators could be considered a code smell that indicate a flaw or lack of functionality in the target langauge.

For example, while it has been said here that "Objects that persist can not be generalized", it would be better to think of it as "Objects in C# that automatically persist their data can not be generalized in C#", because I surely can in Python through the use of various methods.

The Python approach could, however, be emulated in static languages through the use of operator[ ](method_name as string), which either returns a functor or a string, depending on requirements. Unfortunately that solution is not always applicable, and returning a functor can be inconvenient.

The point I am making is that code generators indicate a flaw in a chosen language that are addressed by providing a more convenient specialised syntax for the specific problem at hand.

The copy/paste type of generated code (like ORMs make) can also be very useful...

You can create your database, and then having the ORM generate a copy of that database definition expressed in your favorite language.

The advantage comes when you change your original definition (the database), press compile and the ORM (if you have a good one) can re-generates your copy of the definition. Now all references to your database can be checked by the compilers type checker and your code will fail to compile when you're using tables or columns that do not exist anymore.

Think about this: If I call a method a few times in my code, am I not referring to the name I gave to this method originally? I keep repeating that name over and over... Language designers recognized this problem and came up with "Type-safety" as the solution. Not removing the copies (as DRY suggests we should do), but checking them for correctness instead.

The ORM generated code brings the same solution when referring to table and column names. Not removing the copies/references, but bringing the database definition into your (type-safe) language where you can refer to classes and properties instead. Together with the compilers type checking, this solves a similar problem in a similar way: Guarantee compile-time errors instead of runtime ones when you refer to outdated or misspelled tables (classes) or columns (properties).

quote: I have not really found effective ways to build templates that can say for instance instantiate themselves. In otherwords I can never do :

return new T();

public abstract class MehBase<TSelf, TParam1, TParam2>
    where TSelf : MehBase<TSelf, TParam1, TParam2>, new()
{
    public static TSelf CreateOne()
    {
        return new TSelf();
    }
}

public class Meh<TParam1, TParam2> : MehBase<Meh<TParam1, TParam2>, TParam1, TParam2>
{
    public void Proof()
    {
        Meh<TParam1, TParam2> instanceOfSelf1 = Meh<TParam1, TParam2>.CreateOne();
        Meh<int, string> instanceOfSelf2 = Meh<int, string>.CreateOne();
    }
}

Code generation, like generics, templates, and other such shortcuts, is a powerful tool. And as with most powerful tools, it amplifies the capaility of its user for good and for evil - they can't be separated.

So if you understand your code generator thoroughly, anticipate everything it will produce, and why, and intend it to do so for valid reasons, then have at it. But don't use it (or any of the other technique) to get you past a place where you're not to sure where you're headed, or how to get there.

Some people think that, if you get your current problem solved and some behavior implemented, you're golden. It's not always obvious how much cruft and opaqueness you leave in your trail for the next developer (which might be yourself.)

Why does being able to copy/paste really, really fast, make it any more acceptable?

That's the only justification for code generation that I can see.

Even if the generator provides all the flexibility you need, you still have to learn how to use that flexibility - which is yet another layer of learning and testing required.

And even if it runs in zero time, it still bloats the code.

I rolled my own data access class. It knows everything about connections, transactions, stored procedure parms, etc, etc, and I only had to write all the ADO.NET stuff once.

It's now been so long since I had to write (or even look at) anything with a connection object in it, that I'd be hard pressed to remember the syntax offhand.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow