Question

I am used to doing some of my refactoring by introducing compilation errors. For example, if I want to remove a field from my class and make it a parameter to some methods, I usually remove the field first, which causes a compilation error for the class. Then I introduce the parameter to my methods, which breaks the callers. And so on. This usually gives me a sense of security. I haven't actually read any books about refactoring (yet), but I used to think this is a relatively safe way of doing it. But I wonder: is it really safe? Or is it a bad way of doing things?

Solution

This is a common and useful technique for statically compiled languages. The general version of what you are doing could be stated as follows:

When you are making a change to a module that could invalidate some uses in clients of that module, make the initial change in such a way as to cause a compile-time error.

There are a variety of corollaries:

  • If the meaning of a method, function, or procedure changes, and its type does not also change, then change its name. (Once you have scrutinized and fixed all the uses, you can change the name back.)

  • If you add a new case to a datatype or a new literal to an enumeration, change the names of all the existing datatype constructors or enumeration literals. (Or, if you are lucky enough to have a compiler that checks to see whether case analysis is exhaustive, there are easier ways.)

  • If you're working in a language with overloading, don't just change one variant or add a new variant. You risk having overloading resolve silently in a different way. If you use overloading, it's quite hard to get the compiler to work for you in the way you hope. The only way I know of to handle overloading is to reason globally about all uses. If your IDE won't help you, you must change the names of all overloaded variants. Unpleasant.

What you are really doing is using the compiler to help you examine all the places in the code that you might need to change.
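
For illustration, here is a rough C# sketch of the manoeuvre from the question (the class and member names are made up): deleting the field turns every remaining use of it into a compile error, and each error is a place the compiler is telling you to visit.

    // Hypothetical example: ReportPrinter used to keep 'header' in a field.
    public class ReportPrinter
    {
        // private string header;                // step 1: delete the field -> every use below stops compiling

        public void PrintSummary(string header)  // step 2: add the parameter instead
        {
            System.Console.WriteLine(header);    // step 3: fix each use the compiler flagged
        }

        public void PrintDetails(string header)
        {
            System.Console.WriteLine("Details: " + header);
        }
    }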

OTHER TIPS

I never rely on compilation alone when refactoring; the code can compile, but bugs might still have been introduced.

I think the best approach is to write some unit tests for the methods or classes you want to refactor; then, by running the tests after the refactoring, you can be confident that no bugs were introduced.

I'm not saying go all the way to Test-Driven Development; just write the unit tests to get the confidence you need to refactor.
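
As a rough sketch of what I mean (NUnit assumed here; PriceCalculator is just a made-up stand-in for whatever class you are about to refactor), a couple of tests like these pin down the current behaviour so they can be re-run after the refactoring:

    using NUnit.Framework;

    public class PriceCalculator
    {
        // Current behaviour to preserve: 10% discount on totals over 100.
        public decimal Total(decimal amount) => amount > 100m ? amount * 0.9m : amount;
    }

    [TestFixture]
    public class PriceCalculatorTests
    {
        [Test]
        public void Total_AppliesTenPercentDiscountOverOneHundred()
        {
            var calculator = new PriceCalculator();
            Assert.That(calculator.Total(200m), Is.EqualTo(180m));
        }

        [Test]
        public void Total_LeavesSmallAmountsAlone()
        {
            var calculator = new PriceCalculator();
            Assert.That(calculator.Total(50m), Is.EqualTo(50m));
        }
    }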

I don't see any problem with it. It is safe, and as long as you're not committing changes before you compile, it has no long term effect. Also, Resharper and VS have tools that make the process a bit easier for you.

You use a similar process in the other direction in TDD - you write code that may not have the methods defined, which causes it to not compile, then you write enough code to compile (then pass tests, and so on...)

When you are ready to read books on the subject, I recommend Michael Feathers' "Working Effectively with Legacy Code". (Added by non-author: also Fowler's classic book "Refactoring", and the Refactoring web site may be useful.)

He talks about identifying the characteristics of the code you are working on before you make a change, and doing what he calls scratch refactoring. That is, refactoring to discover the characteristics of the code and then throwing the results away.

What you are doing is using the compiler as an auto-test. It will test that your code compiles, but not whether the behaviour has changed due to your refactoring or whether there were any side effects.

Consider this:

class myClass {
     void megaMethod() 
     {
         int a, b, c, x, y, z;
         // lots of lines of code
         z = mySideEffect(x) + y;   // mySideEffect(int) is defined elsewhere and has a side effect
         // lots more lines of code 
         a = b + c;
     }
}

You could refactor out the addition:

class myClass {
     void megaMethod() 
     {
         int a, b, c, x, y, z;
         // lots of lines of code
         z = addition(x, y);
         // lots more lines of code
         a = addition(b, c);   // oops: this now triggers the side effect too
     }

     int addition(int a, int b)
     {
          return mySideEffect(a) + b;
     }
}

and this would compile, but the second addition would be wrong because it now invokes the side-effecting method. Further tests, beyond compilation alone, would be needed.

It's easy enough to think of examples where refactoring by compiler errors fails silently and produces unintended results.
A few cases that come to mind: (I'll assume we're talking about C++)

  • Changing arguments to a function where other overloads exist with default parameters. After the refactoring the best match for the arguments might not be what you expect.
  • Classes that have cast operators or non-explicit single-argument constructors. Adding, removing, or changing arguments to any of these can change which best match gets called, depending on the combination involved.
  • Changing a virtual function without making the matching change in the base class (or changing the base class differently) will result in calls being directed to the base class implementation.

Relying on compiler errors should be done only if you are absolutely certain the compiler will catch every single change that needs to be made. I almost always tend to be suspicious about this.
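
The same kind of silent re-resolution can bite outside C++ as well. A small C# sketch of the first case (made-up names, using an optional parameter):

    using System;

    static class Logger
    {
        // Before the refactoring only this overload existed, and every call site
        // wrote Log("...") relying on the default for 'verbose'.
        public static void Log(string message, bool verbose = false)
            => Console.WriteLine(verbose ? "VERBOSE: " + message : message);

        // Added during a refactoring: overload resolution now silently prefers
        // this method for every existing Log("...") call, with no error or warning.
        public static void Log(string message)
            => Console.WriteLine("[new path] " + message);
    }

    static class Program
    {
        static void Main()
        {
            Logger.Log("hello");   // compiled before and after, but now calls the new overload
        }
    }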

I'd just like to add to all the wisdom around here that there is one more case where this might not be safe: reflection. This affects environments like .NET and Java (and others too, of course). Your code would compile, but there would still be runtime errors when reflection tries to access members that no longer exist. For example, this is pretty common if you use an ORM like Hibernate and forget to update your mapping XML files.
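
A small C# sketch of the problem (made-up names): the property has been renamed, the code still compiles, but the reflection lookup fails only at runtime.

    using System;

    class User
    {
        public string UserName { get; set; }   // renamed during refactoring; was "Username"
    }

    class Program
    {
        static void Main()
        {
            // Compiles fine, but GetProperty returns null because "Username" no longer exists.
            var property = typeof(User).GetProperty("Username");
            Console.WriteLine(property == null ? "runtime surprise: property not found" : property.Name);
        }
    }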

It might be a bit safer to also search the whole codebase for the specific variable/method name. Of course, that might bring up a lot of false positives, so it's not a universal solution; and your reflection code might build names by string concatenation, which would defeat the search as well. But it's at least a small step closer to safety.

I don't think that there is a 100% foolproof method though, apart from going through all the code manually.

It's a pretty common approach, I think, as it finds all references to that specific thing. However, modern IDEs such as Visual Studio have a Find All References feature which makes this unnecessary.

However, there are some disadvantages to this approach. For large projects, it might take a long time to compile the application. Also, don't stay in the broken state for long (that is, get things back into working order as soon as possible), and don't do it for more than one thing at a time, as you might forget how you meant to patch things up in the first place.

It's one way of doing it, and there can be no definite statement about whether it is safe or unsafe without knowing what the code you are refactoring looks like and what choices you make.

If it is working for you, then there's no reason to change just for change's sake; but when you have the time, reading through the resources here might give you new ideas that you may want to explore over time.

http://www.refactoring.com/

Another option you could consider, if you're using one of the .NET languages, is to flag the "old" method with the Obsolete attribute, which will raise compiler warnings but still leave the code callable should there be code beyond your control (if you're writing an API, or if you don't use Option Strict in VB.Net, for example). You can then merrily refactor, having the obsolete version call the new one; as an example:

    public string Username
    {
        get
        {
            return this.userField;
        }
        set
        {
            this.userField = value;
        }
    }

    public int Login()
    {
        /* do stuff */
    }

Becomes:

    [ObsoleteAttribute()]
    public string Username
    {
        get
        {
            return this.userField;
        }
        set
        {
            this.userField = value;
        }
    }

    [ObsoleteAttribute("Replaced by Login(username, password)")]
    public int Login()
    {
        return Login(Username, Password);
    }

    public int Login(string username, string password)
    {
        /* do stuff */
    }

That's how I tend to do it, anyway...

This is a common approach, but results may vary depending on whether your language is static or dynamic.

In a statically typed language this approach makes some sense, since any discrepancies you introduce will be caught at compile time. However, dynamic languages will often only encounter these issues at runtime. These issues wouldn't be caught by the compiler, but rather by your test suite, assuming you wrote one.
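
C# can even show the contrast within a single program, which may help make the point (made-up names; this is only an illustration, not a substitute for a genuinely dynamic language):

    using System;

    class Report
    {
        public void PrintSummary() => Console.WriteLine("summary");   // renamed; used to be Print()
    }

    class Program
    {
        static void Main()
        {
            var typed = new Report();
            // typed.Print();              // statically typed: the rename is caught at compile time

            dynamic untyped = new Report();
            untyped.Print();               // compiles, but throws RuntimeBinderException at runtime
        }
    }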

I get the impression that you're working with a static language like C# or Java, so continue with this approach until you encounter some kind of major issue that says you should do otherwise.

I do the usual refactorings, but I still also refactor by introducing compiler errors. I usually do it when the changes are not so simple, or when the refactoring is not a true refactoring because I am actually changing functionality. The compiler errors point me to the spots I need to look at, where a more complicated change than a rename or parameter change is required.

This sounds similar to an absolutely standard method used in Test-Driven Development: write the test referring to a nonexistent class, so that the first step to make the test pass is to add the class, then the methods, and so on. See Beck's book for exhaustive Java examples.

Your method for refactoring sounds dangerous because you don't have any tests for safety (or at least you don't mention that you have any). You might create compiling code that doesn't actually do what you want, or breaks other parts of your application.

I'd suggest you add a simple rule to your practice: make noncompiling changes only in unit test code. That way you are sure to have at least a local test for each modification, and you are recording the intent of the modification in your test before making it.

By the way, Eclipse makes this "fail, stub, write" method absurdly easy in Java: each nonexistent object is marked for you, and Ctrl-1 plus a menu choice tells Eclipse to write a (compilable) stub for you! I'd be interested to know if other languages and IDEs provide similar support.
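
For C# readers, a rough sketch of the same "fail, stub, write" flow (NUnit assumed; Account is a made-up example): the test is written first, the compile error on the missing class is the prompt to generate a stub, and the stub is then filled in until the test passes.

    using NUnit.Framework;

    [TestFixture]
    public class AccountTests
    {
        [Test]
        public void Deposit_IncreasesBalance()
        {
            var account = new Account();   // a compile error until Account is stubbed out
            account.Deposit(10);
            Assert.That(account.Balance, Is.EqualTo(10));
        }
    }

    // The stub an IDE might generate for you, later filled in to make the test pass.
    public class Account
    {
        public int Balance { get; private set; }
        public void Deposit(int amount) => Balance += amount;
    }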

It's "safe" in the sense that in a sufficiently compile-time-checked language, it forces you to update all live references to the thing you've changed.

It can still go wrong if you have conditionally-compiled code, for instance if you've used the C/C++ preprocessor. So make sure that you rebuild in all possible configurations and on all platforms, if applicable.
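
For example (a C# sketch with made-up symbol and method names): a call sitting inside a conditional-compilation block is only checked in builds that define the symbol, so a rename elsewhere can leave a stale call behind without any error.

    #define LEGACY_LOGGING   // without this symbol, the call below is never compiled or checked

    using System;

    static class Logger
    {
        public static void WriteLog(string message) => Console.WriteLine(message);
    }

    static class Program
    {
        static void Main()
        {
    #if LEGACY_LOGGING
            Logger.WriteLog("only checked in builds that define LEGACY_LOGGING");
    #endif
        }
    }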

It doesn't remove the need to test your changes. If you've added a parameter to a function, the compiler can't tell you whether the value you supplied when you updated each call site was the right one. If you've removed a parameter, you can still make mistakes; for example, change:

void foo(int a, int b);

to

void foo(int a);

Then change the call from:

foo(1,2);

to:

foo(2);

That compiles fine, but it's wrong.

Personally, I do use compilation (and link) failures as a way of searching code for live references to the function I'm changing. But you have to bear in mind that this is just a labour-saving device. It doesn't guarantee that the resulting code is correct.

Licensed under: CC-BY-SA with attribution