Is it OK to copy code from one application to another in the same repository to keep them independent?

https://softwareengineering.stackexchange.com/questions/416574

16-03-2021

Question

Given a repository that contains two different applications A and B (e.g. a bootloader and an RTOS), is it OK to copy source code from A to B in order to avoid dependencies between them (includes, adding A's source files to B's compilation), so that they stay completely independent at both build time and runtime?

Note: In addition, let's suppose that the logic to be copied from A is private (that is, it's only meant to be used by certain internal functions in A).

Solution

It is acceptable if the copied code can change independently from the original code.

If you are copying code and every future change has to be made in two different code bases, you would be better off creating a shared library. Then both applications have a dependency on the library, but not on each other.
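
The arrangement described above can be checked mechanically. What follows is only a minimal sketch: the dependency map and the target names `bootloader`, `rtos`, and `common` are hypothetical and do not come from any real build system. It reports every application that depends directly on another application instead of on the shared library.

```python
# Hypothetical set of application targets; the names are illustrative only.
APPLICATIONS = {"bootloader", "rtos"}


def cross_app_dependencies(deps, applications):
    """Return sorted (app, other_app) pairs where one application depends directly on another."""
    violations = []
    for app in sorted(applications):
        for target in sorted(deps.get(app, set())):
            if target in applications and target != app:
                violations.append((app, target))
    return violations


# Both applications depend on the shared library, not on each other.
shared_library_layout = {
    "bootloader": {"common"},
    "rtos": {"common"},
    "common": set(),
}

# The RTOS pulls in bootloader sources directly: the coupling the question wants to avoid.
direct_layout = {
    "bootloader": set(),
    "rtos": {"bootloader"},
}
```

With the first layout the function returns an empty list; with the second it flags the `rtos` to `bootloader` edge.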

Other tips

In theory it's the best practice to put any significant common piece of code in a separate library that both applications use, rather than duplicating the code across both applications.

In reality I would say the choice is a trade-off between:

Avoid code duplication

Having duplicated code means there's more code that needs to be understood to understand what's going on in the applications. You also need to maintain both pieces of code, which means duplicating changes to the code. Even copying a one-line method might require both versions to be changed, whereas you can copy entire packages without ever having to change them. If some bit of code has been stable and unchanged for years, that might be a decent sign that it's not going to need changes any time soon (although that's still not a great reason to duplicate it, and extending the scope of what it's used for is a good way to find things that require changes).

If you copy the code, but then end up significantly changing it so it doesn't resemble the original all that closely, this might be a sign that you simply have two applications that do similar things and there may not be anything you can really separate out. It may also be a sign that you need to reconsider what your classes and applications do and how you structure them.

A library that makes sense out of context

If you have, for example, a general-purpose Array class, that can make sense in a library (assuming you're using a language that doesn't provide that built-in, obviously). If, on the other hand, you have some class that only makes sense given the specifics of your applications, that's not a great candidate for the library.

Generally you want a library to have a well-defined purpose or set of functionality (say, providing common data structures). If the class just performs some intermediate step that depends on something each of your applications would do first, it probably shouldn't be in a separate library either.

You also don't really want every change you make in either the application or the library to also require a change to the other because the two are too closely linked (but of course changing the public interface of classes in the library is going to require changes to applications using it).

The effort of maintaining a separate library

This shouldn't matter much if the library contains a significant piece of code that is distinct from your applications.

But if it would contain just one small file, it probably doesn't make much sense as a separate library.

I would try to avoid having one application import from the other, unless you have a particularly compelling argument in favour of that.

A one-time copy is reasonable, but in my experience, if you don't set up a pattern for sharing code between builds, you will end up copying a lot more.

I used to work in a code base that regularly used copying for common code. One time I made some changes to the code, but they didn't take effect. I discovered I was working in the wrong copy, so I made my changes in another place. Oops, wrong copy again. That got me curious, and I found seven exact copies of that same code. Later, I did an analysis and found that a solid majority of our source files were exact duplicates of other files.

That amount of duplication didn't happen overnight, but it also took several years to fix. With common libraries you always have to think about how changes affect other builds, but having to constantly verify that you've fixed a bug in all the copies is much worse, trust me. It feels like more work up front to set up a common library, but it will save you time and hassle in the long run.
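
An analysis like the one described above can be approximated by grouping files by a hash of their contents. This is a minimal sketch, not the tooling the author used; the function name `find_exact_duplicates` and its parameters are made up for illustration, and it only catches byte-for-byte copies.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(root, pattern="*"):
    """Group files under `root` by the SHA-256 of their bytes.

    Returns a list of groups (lists of paths); only groups with
    more than one file, i.e. exact duplicates, are returned.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_exact_duplicates("src", "*.c")` would list groups of identical C files under a hypothetical `src` directory. Copies that have drifted apart by even a single whitespace change are not reported, so the result is a lower bound on the real duplication.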

Duplication is better than premature abstraction.

I have wasted countless hours in the early part of my career/hobby of programming pulling duplicate code out into a separate class, function, or module (it's DRY! it's good!) only to have to add on more and more special handling of slightly different behavior followed by Dark Places in My Code I Dare Not Tread followed by pulling the **** thing back apart again to save my sanity.

You can definitely be too DRY.
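
To make the failure mode concrete, here is a small sketch. The checksum functions are hypothetical, invented only for illustration: the first is what a "shared" helper tends to look like after a few rounds of special handling, and the two below it are the same behaviour kept as separate functions.

```python
# Hypothetical shared helper after a few rounds of "just one more special case".
def checksum_shared(data, *, for_bootloader=False, header_len=0):
    if for_bootloader:
        # The bootloader skips an image header and keeps only the low byte.
        return sum(data[header_len:]) & 0xFF
    # The RTOS sums the whole message and keeps 16 bits.
    return sum(data) & 0xFFFF


# The same behaviour as two small functions that can change independently.
def bootloader_checksum(image, header_len):
    return sum(image[header_len:]) & 0xFF


def rtos_checksum(message):
    return sum(message) & 0xFFFF
```

The two versions compute the same results, but in the shared one every new requirement from either caller adds another flag and another branch that the other caller has to be tested against.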

The heuristic I mostly follow now (and it is a heuristic, not a hard-and-fast rule) is the rule of 3: if something is similar/duplicated in three places in a codebase I will think about factoring it out. This is again meant to be a guide and not a substitute for thought: you still have to exercise good judgement (same as with being DRY) but you will be less likely to shoot yourself in the foot.

This warning might seem overly dire, but I think the idea that if you have to change the same thing in more than one place you will inevitably forget (i.e. DRY is good) is already in the water supply. I don't think anyone has to make an argument in its favor, so I'm cautioning against the opposite extreme.

The question to ask is whether the two pieces of code really represent the same thing or whether they just happen to look identical.

Can you imagine needing a change in that code for one client (client as in calling code) but not for the other?

Is the same person responsible for both clients?

If we were neighbors, technically speaking we could share a wife and children. It could be most convenient. It could also become most complicated, depending on your point of view.

So, as is often the case, there is no straightforward answer.

As I'm sure you know, code duplication is generally considered a code-smell, i.e. something to be avoided.

If possible you ideally want to break out the common code into a separate class library within your repository and have both of your applications reference that class library.

However, the shared library approach can then make things more difficult, because (amongst other things) you will need to consider how changes needed to the shared library by one of your applications may then impact on all the other applications that use the shared library.

To get round that you ideally want a programming environment where you can create versioned packages from that shared code (e.g. NPM in JavaScript or NuGet in .NET) so that each application can reference a specific version of your shared code. You can then make changes to that shared code safely and introduce those changes to one application at a time by changing which version of the shared package each application references.

(Those versioned packages would typically be published only within your organisation, not on public NPM/NuGet/etc.)

Let us assume the code is identical because it does the same task for the same reason, not due to happenstance. Otherwise, there is nothing to talk about anyway.

A dependency can be a heavy burden.
It increases the need for coordinating any changes, hinders tailoring to the specific use-case where appropriate, and much of it will be or become useless for any one of the projects. This is exacerbated for non-compiled code, where unused code is an especially heavy dead weight.

Managing independent duplicates is also a heavy burden.
How will you track down (or even remember you should) all of them if you fix or improve any?

In all things, balance:

Is the functionality sufficiently complex and / or commonly needed?
Put it into the appropriate common library, which need not be a SO / DLL. The overhead is worth it and the extra scrutiny is welcome.
Is it small and easy enough, or should be tailored to the use-case?
Duplication might be a code smell, but that doesn't mean it isn't the smart choice.

Take the time to get it right.
Remember YAGNI and refactoring: the fewer things depend on an interface, the easier it is to change, move, remove, or replace.
Retracting an interface is much more costly than promoting one, and having to keep it around is a drain.

It has been my experience that the ideal solution to this specific problem is to create a static library in the same repository as your other two apps.

This resolves MOST OF the awkwardness of maintaining library versions, and ensures the code does not diverge.

This works in scenarios in which the systems in question are tightly coupled by nature and the relative likelihood that they will encounter a different build of their counterpart in the wild is quite low.

If the systems are loosely coupled and/or can interact with different builds of their counterparts with moderate frequency, this could be a bad plan, as one becomes less incentivized to think about (and test) backward-compatibility scenarios. This, too, can be managed, but it requires vigilance and may be better supported by a typical 'versioned software package' approach.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange