Question

How would one go about proving to management that a batch reformat of all .java files in a large code base (to bring the code into compliance with the company's coding standards) is safe and will not affect functionality?

The answers would have to appease the non-technical and the technical alike.

Edit: 2010-03-12 Clarification for the technical among you: reformat = white-space-only changes; no "organizing imports" or "reordering of member variables, methods, etc."

Edit: 2010-03-12 Thank you for the numerous responses. I am surprised that so many of the readers have voted up mrjoltcola's response, since it is simply a statement about being paranoid and in no way proposes an answer to my question. Moreover, there is even a comment by the same contributor reiterating the question. WizzardOfOdds seconded this viewpoint (but you may not have read all the comments to see it). -jtsampson

Edit: 2010-03-12 I will post my own answer soon, though Jon Skeet's answer was right on the money with the MD5 suggestion (note -g:none to turn debugging off), even if it only covered the technical aspects. -jtsampson

Edit: 2010-03-15 I added my own answer below. In response to what "safe" means: I meant that the functionality of the Java code would not be affected. A simple study of the Java compiler shows this to be the case (with a few caveats). Those caveats were "white space only" and were pointed out by several posters. However, this is not something you want to try to explain to BizOps. My aim was to elicit "how to justify doing this" type of answers, and I got several great responses.

Several people mentioned source control and the "fun" that goes along with it. I specifically did not mention that as that situation is already well understood (within my context). Beware of the "gas station" effect. See my answer below.

Solution

If it's just reformatting, then that shouldn't change the compiler output. Take a hash (MD5 should be good enough) of the build before and after the reformatting - if it's the same for every file, that clearly means it can't have altered behaviour. There's no need to run tests, etc. - if the output is byte for byte the same, it's hard to see how the tests would start failing. (Of course it might help to run the tests just for the show of it, but they're not going to prove anything that the identical binaries won't.)

EDIT: As pointed out in comments, the binaries contain line numbers. Make sure you compile with -g:none to omit debug information. That should then be okay with line numbering changes - but if you're changing names that's a more serious change, and one which could indeed be a breaking change.

I'm assuming you can reformat and rebuild without anyone caring - only checking the reformatted code back into source control should give any case for concern. I don't think Java class files have anything in them which gives a build date, etc. However, if your "formatting" changes the order of fields etc., that can have a significant effect.
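
As a rough sketch of the hash comparison described above (not the answerer's own code): the class name `ClassHashCompare` and the two directory arguments are hypothetical, and it assumes you compiled the code before and after the reformat with `-g:none` into two separate output directories.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

/** Compares MD5 hashes of the .class files in two build output directories. */
final class ClassHashCompare {

    /** Returns the MD5 digest of the given bytes as lowercase hex. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Maps every .class file under root (by relative path) to its MD5 hash. */
    static Map<String, String> hashTree(Path root) {
        Map<String, String> hashes = new TreeMap<>();
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(Files::isRegularFile)
                 .filter(p -> p.toString().endsWith(".class"))
                 .forEach(p -> {
                     try {
                         hashes.put(root.relativize(p).toString(),
                                    md5Hex(Files.readAllBytes(p)));
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hashes;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: ClassHashCompare <classes-before> <classes-after>");
            return;
        }
        Map<String, String> before = hashTree(Paths.get(args[0]));
        Map<String, String> after = hashTree(Paths.get(args[1]));
        // Equal maps mean the same set of files with the same hash for each one.
        System.out.println(before.equals(after) ? "IDENTICAL" : "DIFFERENT");
    }
}
```

Comparing per-file maps rather than one hash over the whole build also tells you which class differs if something does change.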

OTHER TIPS

In a business environment, you have two challenges.

  1. Technical
  2. Political

From the technical perspective, reformatters are a mature technology. Combined with hashing/checksums, as long as the language isn't whitespace sensitive, you are technically safe to do this. You also want to make sure you do it during a downtime where no major forks are waiting to be merged. Real changes will be impossible to separate from reformatting, so do them separately. Merging may be very difficult for anyone working on a fork. Lastly, I would only do it after I've implemented complete test case coverage. Because of reason 2...

Politically, if you don't know how to convince management, how do you know it is safe? More specifically is it safe for you. For a senior, well-trusted developer, who is in control of the processes in a shop, it's an easier job, but for a developer working in a large, political, red-taped organization, you need to make sure you cover all your bases.

The argument I made in 2010 was a bit too clever perhaps, but parsers, reformatters, pretty printers are just software; they may have bugs triggered by your codebase, ESPECIALLY if this is C++. Without unit tests everywhere, with a large codebase, you may not be able to verify 100% that the end result is identical.

As a developer, I'm paranoid, and the idea makes me uneasy, but as long as you are using:

  1. Source control
  2. Proper test coverage

then you are OK.

However, ponder this: Management is now aware that you are mucking around in a million-line project with a "mass change". A previously undiscovered bug gets reported after your reformat. You are now chief suspect for causing this bug. Whether it is "safe" has multiple meanings. It might not be safe for you and your job.

This sounds trite, but I remember something like this happening a couple of years ago. We had a bug report come in a day after a nighttime maintenance window where I'd only done a reconfiguration and reboot of an IIS server. For several days, the story was that I must have screwed up, or deployed new code. Nobody said it directly, but I got the look from a VP that said so. We finally tracked it down to a bug that was already in the code and had been pushed previously, but did not show up until a QA person changed a test case. Honestly, some people don't even remember that part; they just remember coming in the next day to a new bug.

EDIT: In response to jtsampson's edits. Your question wasn't about how to do it; it was "How to convince management that it is safe". Perhaps you should have asked, instead, "Is it safe? If so, how to do it, safely." My statement was pointing out the irony of your question, in that you assumed it was safe, without knowing how. I appreciate the technical side of reformatting, but I am pointing out that there is risk involved in anything non-trivial and unless you put the right person on it, it might get mucked up. Will this task detract from programmers' other tasks, sidetracking them for a couple of days? Will it conflict with some other coder's uncommitted revisions? Is the source under revision at all? Is there any embedded script that is whitespace sensitive, such as Python? Anything can have an unexpected side-effect; for our environment, it would be difficult to get a time window where there isn't someone working on a branch, and mass reformatting is going to make their merge pretty ugly. Hence my distaste for mass-reformatting, by hand or automated.

Use a pragmatic approach:

  1. Build the application.
  2. Save the application.
  3. Reformat the code.
  4. Build the application.
  5. Diff the binaries.

I would use four words.

Source control. Unit Tests.

Well, it's not at all safe and you are unlikely ever to convince them. Speaking as someone who has managed a lot of development, I would never consider it in any commercial codebase on which any revenue depended. I'm not saying there aren't advantages to code formatted how you like, but the chance that your formatting will not involve some code changes is nil. That means there's a huge risk for very little gain. If you have to do it, do it piecemeal as you bug fix the code; don't do it in one big hit. It may be a good decision for you as programmers but it would be a terrible decision for them as management.

What management are we talking about here? Are they tech-savvy enough to understand what code formatting is and how Java treats whitespace? Because if they are not, I don't think they are qualified to make such a technical decision (i.e., such questions should be delegated to someone who is responsible for the code).

But if they are, or if you are trying to convince your "architect" or someone similar, well, then it's about trusting a third-party tool. Suggest a formatter that has a good reputation; other than that, there's not much you can do, since you didn't code the formatter.

As a side track, let me share an anecdote. Our architect decided at a time to reformat all files. Out of thousands of Java files, not a single error has yet been found (and this was over half a year ago). This makes me trust Eclipse's formatter for Java source code. The benefits of this formatting were:

  • Some badly formatted classes are now easier to read.
  • Same formatting everywhere.

But it also had some negative sides:

  • A code formatter is not perfect. Sometimes manually formatted code reads better. The formatter in particular struggles with really bad code (too long lines, too many nested ifs, etc).
  • Do you have other branches of code, like an old version that occasionally needs to be patched? Because you can forget about merging between branches with different code styles (at least when using SVN).
  • You are touching all files (and sometimes almost every line) and ruining the history of all files at once. It hurts traceability.
  • There is actually a small benefit in each developer having their own code formatting: you start learning that formatting, and you can immediately identify the author of a piece of code.

I personally think the negative outweighs the positive. It sounds like a great idea, but in reality you don't gain as much as you think. When you come across some terribly formatted code, reformat just that class or just that method and see it as a small step toward the big goal.

Do your unit tests pass after reformatting? If so, then you've sold the idea to management!

If you're mucking around with untested code, then you'll have a much harder case to make.

You want the "code in compliance with the company's coding standards" [sic] and want to convince management?

Trivial: install CheckStyle, make it part of your process, feed it your coding guidelines, and show them that the whole codebase miserably FAILS on CheckStyle.

This is a good example of the technical-business mismatch.

The technical people want to do it because inconsistent formatting can make the code hard to read but, unless it's exceptionally bad, the real reason is that it offends the typically delicate sensibilities and aesthetics of the average programmer.

The business people want to manage risk. Risk can be undertaken if there is some benefit and there is no business benefit here unless you argue it'll be cheaper, faster and/or less risky to do future development with reformatted source code, which in all honesty is a tough sell.

Almost by definition any change has risk attached. The risk here is remote but isn't nonexistent either (from management's perspective) with almost no upside.

There is another issue to consider too: this kind of change can play havoc with source control. It becomes harder to track who changed what because the most recent change to any line will be the reformatting so you'll need to go comparing revisions, which is somewhat more tedious than a simple "blame" or "annotate" command.

Also, if you have several active branches a reformat of your code will cause havoc with your merges.

It is safe in the sense that pure formatting changes will make no difference to what's compiled, and thus no difference to the behaviour of the code at runtime.

It is worth remembering that bulk reformatting of code can lead to "fun" when dealing with source control later - if multiple colleagues have the code checked out, and one team member comes along and reformats it, then all those copies are out of date. Worse, when they update their working copies, all manner of conflicts are going to appear, because those formatting changes will affect huge portions of the code, and resolving that can be a nightmare.

Reformatting code is the same as reformatting a document in Word; it changes the layout and thus the readability, but not the contents.

If all files are formatted the same the code becomes more readable, which makes maintenance a bit easier and thus cheaper. Also code reviews can be faster and more effective.

Further, given a good formatting style, bugs can be found more easily as they cannot hide; think of if statements without curly braces and 2 statements within those imaginary braces.
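
To illustrate the brace-less `if` case mentioned above, here is a small made-up example (the class and method names are hypothetical). Both methods compile to the same behaviour; only the indentation differs, and the second is roughly what a formatter would produce.

```java
/** Shows how hand indentation can hide a missing-braces bug. */
final class MisleadingIndent {

    /** Hand-formatted: both statements LOOK conditional, but only the first is. */
    static int asWritten(boolean enabled) {
        int count = 0;
        if (enabled)
            count += 1;
            count += 10; // indented as if guarded by the if, yet it always runs
        return count;
    }

    /** The same statements after re-indenting: the second is visibly unconditional. */
    static int asFormatted(boolean enabled) {
        int count = 0;
        if (enabled)
            count += 1;
        count += 10; // clearly outside the if
        return count;
    }

    public static void main(String[] args) {
        System.out.println(asWritten(false));   // prints 10
        System.out.println(asFormatted(false)); // prints 10
    }
}
```

The reformat does not fix the bug (behaviour is unchanged by design); it only stops the layout from hiding it.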

Do be smart and check the code in and tag it before reformatting, so you have a state to go back to (and tell people how easy that would be), reformat and check in and tag again, without any other changes.

Answer these questions for management, and you will have gone a long way toward convincing them it's a safe change:

  1. Why does good formatting matter?
  2. What changes will be made? (if you can't answer this, you don't know enough about the re-formatting to know it will be safe)
  3. Will our unit test suites prove the changes had no ill effects? (hint the answer needs to be yes)
  4. Will the existing code be tagged in the source repository so we have a quick roll back option? (hint the answer better be yes)

That about covers it.

Actually, I'd probably be on their side. Reformat units as you open them for fixes or enhancement when they will be thoroughly tested before going back into production. They should have been formatted correctly the first time but if they're in production it seems needless and reckless to reformat them only for style's sake.

Consistency is good, but "a foolish consistency is the hobgoblin of little minds".

I'm donning my manager hat...

To do it as one grand project, I wouldn't let you do it no matter the argument. I would, however, be open to longer estimates on changes because you are modifying existing files to include these formatting changes. I would require you make the formatting changes its own check-in though.

Thanks for all your responses.

My final argument to convince management is below; it includes bits of all your responses.

Technical:

  • Reformat consists of white space changes only (no import reordering, no member/method reordering)
  • Reformat will use [specify tool and process]
  • Reformat will occur on [specify time within coding cycle to minimize merge impact]

Both before and after a reformat:

  • All Unit tests will pass
  • All Integration tests will pass
  • All Functional tests will pass
  • All SOAP-UI tests will pass
  • The byte code is the same (An MD5 of the .class files following javac (-g:none))

Business:

Purpose: to comply with company standards, which prescribe that our source files accurately represent the logical structure of our code.

  • Reformat change vs. Code change (Word document example as above)
  • Reformat will use [general process]
  • Reformat will occur on [specify time within business cycle to minimize impact]

Pilot Test:

  • Confirmed "Format Batch" resulted in fewer merge conflicts than "Format as you Code".
  • Confirmed that the executable code (4k+ .class files) remains the same. (MD5 test)
  • Confirmed functionality will not be affected (automated tests/smoke tests)
  • Confirmed formatter settings contain only white space changes.

Note: In my case a pilot test was run over 6 months by a subset of the developers using an automated tool to "Format as you code" (as prescribed by some of the answers above). While some perceived that the reformatting caused more merge conflicts, this was actually not the case.

This perception was based on the temporal coincidence of the reformat. For instance, consider a person who knows nothing about cars. One day their brakes fail. To what do they attribute the cause? The gas, of course. It was the last thing they put into the car (the "gas station" effect?). Clearly, however, brakes and a fuel system are disparate systems, as are formatting and code changes. We found that improper check-ins within the context of our build process were at fault.

Lastly, I was hoping that someone would provide a good link to a study showing productivity gains related to common code formatting, as it is difficult to show ROI to the business. In my case, though, since it was a company standard, I had "compliance" on my side. I only had to show that it was more time consuming to "Format as you Code" than to "Batch Format".

If you are using Eclipse as your development platform, you can load all the code into the workspace locally. Demonstrate to management there are no problems by showing them the Problems tab.

Then, right click and Format each of the projects one by one - again demonstrating no problems are introduced.

You can do this on your local workstation without any harm at all to your repository.

Honestly if your management is so non-technical as to be afraid of formatting source code, then demonstrating that no problems appear on the problems tab after a format should be sufficient to show that the code is still fine.

Not to mention you will presumably have the old version tagged in source control right?

A school of thought could be to do it without asking and then be able to go "See!"

Of course if you wreck it all up then you'll get fired. You makes your choices...

Alternatively, use source control (or simple backups) so you can always roll it back.

If your code has near enough 100% code coverage then I think the risk can be lowered a little bit.

However even if the management agreed that the code base is safe, I think they'd have their eyes on having to justify paying an employee to spend hours reformatting code just to adhere to a standard that (I presume) was introduced long into the development lifecycle.

We use Jalopy here at my current job. It is a pretty solid product and it produces really neat output. The most senior developer here reformatted all the code base when he migrated it from CVS to SVN, and he had to perform some tests to make sure it would work all the way from start to end, and now we have hooks to ensure that checked-in code is properly formatted.

That being said, I don't think you can convince anyone that any tool is fool (or fault) proof, because there is no such tool. If you think the benefit is worth the time and the (very small) risk, try to convince your management of the biggest advantage you see in doing this. For me, the largest advantage will come if:

  • All the developers have the same formatting settings;
  • The formatting of the source code is checked at check-in by a hook in your SCM.

Because if you do the above, if your code is already formatted, when you compare revisions in your SCM you will see actual changes in the logic of the program, and not just formatting changes.

If you have good unit test coverage, the test results before and after will be enough.

Just one specific heads up: if your company policy includes alphabetic member sorting, be aware that the order of static fields does matter. So if you include an on-save or cleanup rule which does this, you might break your code.
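
A minimal illustration of that static-field caveat, using made-up names (`StaticOrder`, `start`, `limit`): static initializers run in textual order, so alphabetizing the fields changes the result.

```java
/** Shows that the textual order of static fields can change behaviour. */
final class StaticOrder {

    /** Original declaration order: start is initialized before limit reads it. */
    static final class Original {
        static int start = 10;
        static int limit = next(); // start is already 10, so limit becomes 11

        static int next() {
            return start + 1;
        }
    }

    /** Same fields after alphabetical sorting: limit now comes first. */
    static final class Sorted {
        static int limit = next(); // start still has its default value 0 here
        static int start = 10;

        static int next() {
            return start + 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(Original.limit); // prints 11
        System.out.println(Sorted.limit);   // prints 1
    }
}
```

The fields are deliberately not `final`: a `static final int` initialized with a literal is a compile-time constant and would not show the effect.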

Technically, during the first phase of compilation, the lexer discards all comments and whitespace from the source (whitespace only matters for separating tokens and inside string and character literals). This happens long before the compiler recognizes any semantics of the code. Therefore whitespace or comments between tokens cannot change anything in the program logic. Besides, of what use would the language be, and who would want to use it, if adding a couple of spaces or newlines changed its semantics?
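
If you want to show this rather than assert it, here is a small sketch that compiles two layouts of the same class with the standard `javax.tools` API and compares the resulting bytes. The class `Demo`, both source strings, and the helper names are made up for illustration, and it assumes it runs on a full JDK (a bare JRE has no system compiler).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/** Compiles two layouts of the same source and compares the class files. */
final class WhitespaceCompileCheck {

    /** Everything on one line. */
    static final String COMPACT =
        "class Demo{int twice(int x){return x*2;}}";

    /** Same tokens, with line breaks, indentation and a comment added. */
    static final String SPACED =
        "class Demo {\n\n    int twice(int x) {\n        // doubled\n"
        + "        return x * 2;\n    }\n}\n";

    /** Compiles the source with the given -g flag and returns Demo.class as bytes. */
    static byte[] compile(String source, String debugFlag) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system compiler: run this on a JDK");
        }
        try {
            Path dir = Files.createTempDirectory("fmtcheck");
            Path src = dir.resolve("Demo.java");
            Files.write(src, source.getBytes(StandardCharsets.UTF_8));
            // null streams mean: use System.in / System.out / System.err
            int status = javac.run(null, null, null,
                    debugFlag, "-d", dir.toString(), src.toString());
            if (status != 0) {
                throw new IllegalStateException("Compilation failed");
            }
            return Files.readAllBytes(dir.resolve("Demo.class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if both sources produce byte-identical class files under the flag. */
    static boolean sameBytecode(String a, String b, String debugFlag) {
        return Arrays.equals(compile(a, debugFlag), compile(b, debugFlag));
    }

    public static void main(String[] args) {
        System.out.println("-g:none  identical: " + sameBytecode(COMPACT, SPACED, "-g:none"));
        System.out.println("-g:lines identical: " + sameBytecode(COMPACT, SPACED, "-g:lines"));
    }
}
```

With `-g:none` the two class files should match byte for byte. With line-number debug info they should differ, because the `return` statement moves to a different source line, which is why the hash comparison needs `-g:none`.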

On the business side, you are probably going to use some specialized tools for that. I am sure they advertise on their websites that they work great.

Final note: if you have to convince your management of that, maybe you should look to find a way to work with smarter people?

I would ask Management what is their current basis for believing the code works - then demonstrate that the same tool (tests, documentation, little voices...) works exactly as well for the reformatted code. I would want their answer to be "tests"...

I know the previous answers are all fine, but here is another possible one: Do a CRC on the compiled version before and after a reformat. Since compiling would ignore the spaces, tabs, linefeeds, etc., then the compiled version should be identical to the original, and that would prove to those semi-technical managers that all is well.
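
For the CRC variant, the JDK already ships `java.util.zip.CRC32`. A minimal sketch follows; the class name `CrcCheck` and the command-line arguments are hypothetical. A CRC is weaker than a cryptographic hash but is enough to catch accidental differences between two builds.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.CRC32;

/** Computes CRC-32 checksums so two compiled artifacts can be compared. */
final class CrcCheck {

    /** Returns the CRC-32 of the given bytes. */
    static long crc(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: CrcCheck <file-before> <file-after>");
            return;
        }
        long before = crc(Files.readAllBytes(Paths.get(args[0])));
        long after = crc(Files.readAllBytes(Paths.get(args[1])));
        System.out.printf("before=%08x after=%08x %s%n",
                before, after, before == after ? "MATCH" : "MISMATCH");
    }
}
```

As with the MD5 approach, compile both versions with `-g:none` first, otherwise line-number changes alone will produce a mismatch.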

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow