Question

I made the following diagram to show a typical separation of concerns as typically taught -

[Diagram: ClassA → ISomeInterface ← ClassB, where ClassA depends only on the interface and ClassB implements it]

Here ClassA indirectly uses ClassB via ISomeInterface, ensuring it doesn't know ClassB exists, only the methods within the interface, which ClassB implements. All the information I can find on this separation of concerns ends here, and nowhere can I find out how ClassA can actually use the interface without coupling itself to ClassB.

You can't, of course, instantiate an interface on its own; an interface has no implementation or functionality itself. So how does ClassA use the interface?

There are only two ways that currently come to mind -

1) ClassA does the following:

ISomeInterface obj = new ClassB();

Here we can make sure we're not calling any members of ClassB directly, only interface members. The problem though is that ClassB has leaked through to ClassA via the instantiation.

2) ClassA relies upon the interface only, delegating the responsibility of passing the ClassB object elsewhere, by having the following constructor:

class ClassA {
    ISomeInterface obj;

    ClassA(ISomeInterface obj) {
        this.obj = obj;
    }
}

This of course completely decouples ClassA from ClassB; however, all it does is "pass the buck" elsewhere, since someone, somewhere, must instantiate an implementation of ISomeInterface (such as ClassB) and pass it as an object to ClassA.

All the tutorials and explanations I can find leave out this last crucial detail. Who exactly is responsible for doing this last crucial thing? It has to happen somewhere. And is the thing that does this now coupling itself to both ClassA & ClassB?


Solution


Coupling

ClassA relies upon the interface only, delegating the responsibility of passing the ClassB object elsewhere

This is the idea. If you are separating ClassA from ClassB by the use of an interface ISomeInterface...

ensuring (ClassA) doesn't know ClassB

Then you do not want ClassA to instantiate ClassB. Instead it must receive an object typed by the interface (for example, it can be received in the constructor).

How does this help encapsulation? You do not get encapsulation for free. However, given that ClassA will only use ClassB via its interface, you should define interfaces which are sufficient for that use. Ideally you would also remove any part of the interface that is not used. Then you will know exactly what ClassB needs to expose. You still need to design a good interface.
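For example, if ClassA only ever needs a single operation from its collaborator, the interface can be exactly that big. A minimal sketch (the member names here are invented, since the question only uses placeholder names):

interface ISomeInterface {
    void DoWork();
}

class ClassB : ISomeInterface {
    // The one member ClassA actually uses.
    public void DoWork() {
        // real work goes here
    }

    // ClassB may expose other members, but since ClassA only sees
    // ISomeInterface, it cannot couple itself to them.
    public void SomethingOnlyOtherCodeNeeds() { }
}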

however all it does is "pass the buck" elsewhere since someone, somewhere, must instantiate an implementation of ISomeInterface (such as ClassB) and pass it as an object to ClassA.

Yes.

It has to happen somewhere. And is the thing that does this now coupling itself to both ClassA & ClassB?

Yes.

Coupling is unavoidable. You are not trying to make it zero. You are trying to keep it low. Think of it as a cost you have to pay... too much coupling means that maintenance is hard. However, maintenance will never be a zero effort task.


Factories

I will continue on the assumption that we do dependency injection via the constructor.

You want to have a single source of truth for how to get a ClassA object. That is, you do not want to call the constructor passing an object of ClassB every time. You want to encapsulate that and reuse it.

If in the future you need to replace ClassB with ClassC, there will be a single place to change that. If there is some logic that picks ClassB or ClassC, there is a single place where that logic will be. If the instances need to be pooled, you can do that there too.

Thus, you will have a piece of code whose responsibility is to give you an object of ClassA bound to the correct implementation of ISomeInterface. And that is the only responsibility of that code. We could refer to that piece of code as a factory.

Note: Creating a factory for each class should not be a goal. You only need the ones you need.
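For example, a minimal factory for the classes in the question could look like this (a sketch only; the exact shape of the factory is not prescribed, and it assumes ClassA's constructor is accessible to it):

// The only responsibility of this class is to know how a ClassA is assembled.
static class ClassAFactory {
    public static ClassA Create() {
        // If ClassB is ever replaced by ClassC, or if selection or pooling
        // logic becomes necessary, this is the single place that changes.
        return new ClassA(new ClassB());
    }
}

Callers ask the factory for a ClassA and never mention ClassB.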


Depending on what you are doing, it could make sense that in some places you want ClassA created with ClassB and in others you want ClassA created with ClassC. In fact, this could depend on external input.

For example, let us say we have a video game where there are some characters. Sometimes they are controlled by the local user, sometimes they are controlled by a player connected through the network, and sometimes they are controlled by AI. In this case, we would have three valid implementations, and they could all co-exist.

For another example, it could be that we have a video game that can save progress to local storage, or to a cloud service that allows you to continue playing on a different machine. That is, we have two implementations to save progress, and which one we use depends on user input.
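A sketch of the save-progress case (all names invented for illustration), where external input decides which implementation the rest of the game receives:

interface ISaveProgress {
    void Save(string data);
}

class LocalStorageSave : ISaveProgress {
    public void Save(string data) { /* write to local disk */ }
}

class CloudSave : ISaveProgress {
    public void Save(string data) { /* call the cloud service */ }
}

static class SaveProgressFactory {
    // The user's choice decides which implementation gets injected.
    public static ISaveProgress Create(bool useCloud) {
        if (useCloud) {
            return new CloudSave();
        }
        return new LocalStorageSave();
    }
}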


Where to put factories?

Yes, you can apply complex patterns to handle your factories. In fact, I'm calling the factory "a piece of code" to keep it vague. However, let me point this out: if you are making a library... unless your goal is to make a library dedicated to factory patterns, let the developer of the application decide what factory patterns to use, by not providing any.

Exactly where you want to put your factory code will depend on what kind of application you have. For simple console and desktop projects, you can do all the dependency injecting in main. If you are doing something more complex (a network server, for example), you might want some factory solution. Usually, you want to use it after deciding what code will handle the input, and once the dependency injection is done, the code does not need to know what factory solution you used.

By factory solution I mean service locators, IoC containers… factory methods, abstract factories… all that stuff. Whatever patterns you pick. If you pick a library that does it… guess what? It is a dependency! Muahahaha. So, yeah, after the dependency injection is done, you do not want any code to be aware of it.
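For a simple console program, all of that wiring can live in main, as mentioned above. A minimal sketch (hypothetical names):

static class Program {
    static void Main() {
        // All the composing happens here, once. Nothing past this point
        // needs to know which implementations were picked, or how.
        ISomeInterface implementation = new ClassB();
        ClassA app = new ClassA(implementation);

        app.Run(); // hypothetical entry method on ClassA
    }
}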


Architecture

Abstracting things away is good™. In fact, it happens naturally by the single responsibility principle. By the way, I think you mean “single responsibility principle” instead of separation of concerns.

I believe that when they tell you that a layer can only interact with lower layers via an interface, they do not mean abstract interface types, but an interface as in an API.


However, we could be mixing all the layer talk with the concept of abstracting external systems…

The software engineer needs to make an executive decision about the scope of the project, which decides what is part of the system and what is external.

Draw a line. An architectural line. On the left of the line goes code that deals with external systems, which includes writing adapters that abstract the external system. That code must be interchangeable, allowing you to easily swap it in the future if need be.

Note: I am not suggesting to wrap every dependency in adapters. Only external systems. So, no, I'm not saying that you need to make an adapter for every library you use, nor am I suggesting the ridiculous idea of isolating yourself from the runtime. While it is true that those would have their own reasons to change, you follow the single responsibility principle and that is enough. This is about external systems as in IO. Also, I'm saying that you decide what is external.

Therefore, you do not want the code on the right to depend on code on the left (which could have to change because of external reasons). However, the code on the left can depend on code on the right (which is under your control).

Thus, you will have the adapters (which are on the left) implement an interface (that exists on the right), and the rest of the code on the right of the architectural line will depend on the interface only…

However, somebody has to instantiate the adapter. Where does that code go? It goes to the left of the architectural line. Because, remember, the code to the right cannot depend on the code on the left. Thus the code that instantiates the adapter (which is to the left), must be to the left.

Now, surprise: the operating system is an external system. main is a method that handles an event from the operating system (namely the start of the program). And since the operating system is external, main goes on the left of the line, and thus it can instantiate the adapters, and it can call code on the right. Well, actually it would call the factory.

Similarly, if you have a form and event handlers, the event handlers are on the left. Or if you have a web server, the router is on the left. You want that code (main, event handlers, router) to call the factory, inject the dependencies, and let the execution flow to the right.
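A sketch of that arrangement with invented names: the interface lives on the right, the adapter on the left, and main (also on the left) does the wiring.

// Right of the line: the interface the core code depends on.
interface IProgressStore {
    void Save(string data);
}

// Right of the line: core logic, unaware of any external system.
class Game {
    private readonly IProgressStore store;

    public Game(IProgressStore store) {
        this.store = store;
    }

    public void SaveProgress(string data) {
        store.Save(data);
    }
}

// Left of the line: an adapter wrapping the external system (the file system here).
class FileProgressStore : IProgressStore {
    public void Save(string data) {
        System.IO.File.WriteAllText("save.dat", data);
    }
}

// Left of the line: main (or a factory called from main) instantiates the adapter
// and lets execution flow rightwards.
static class Shell {
    static void Main() {
        var game = new Game(new FileProgressStore());
        game.SaveProgress("level 3");
    }
}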

On the flip side, your code on the right might want to call the external systems. Remember that this is possible via the interfaces. I want to point out that there is another way to do this: since the flow of execution starts on the left, it is often possible to let the right return, and then the left can execute more code.
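A small sketch of that alternative (invented names): the code on the right returns a value, and the code on the left does the external work with it.

// Right of the line: decides what to persist, performs no IO itself.
class ProgressTracker {
    public string BuildSaveData(int level) {
        return "level " + level;
    }
}

// Left of the line: the event handler calls into the core, takes the result,
// and then talks to the external system itself.
class SaveButtonHandler {
    private readonly ProgressTracker tracker = new ProgressTracker();

    public void OnClick() {
        string data = tracker.BuildSaveData(3);
        System.IO.File.WriteAllText("save.dat", data); // external system
    }
}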


What am I describing here? The idea of the architectural line comes from Clean Architecture.

I am taking inspiration from the idea of “functional core, imperative shell”: pure functions can only call pure functions, while impure functions can call both pure and impure functions (pure to the right, impure to the left).

Bonus chatter: You might be aware that async code has a tendency to propagate, because async methods can call synchronous methods and await other async methods, while synchronous methods cannot await async ones. You can use the architectural line to keep async code on the left (synchronous to the right, async to the left).
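A sketch of that (invented names): the async code stays on the left with the external call, and the core it hands the result to stays synchronous.

// Right of the line: synchronous core logic.
class Parser {
    public int CountLines(string text) {
        return text.Split('\n').Length;
    }
}

// Left of the line: async code that talks to the external system.
class DownloadHandler {
    private static readonly System.Net.Http.HttpClient client = new System.Net.Http.HttpClient();
    private readonly Parser parser = new Parser();

    public async System.Threading.Tasks.Task<int> HandleAsync(string url) {
        string body = await client.GetStringAsync(url); // async stays on the left
        return parser.CountLines(body);                 // core stays synchronous
    }
}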


Why do I say left and right instead of talking of higher and lower layers? Because I would treat both the database and the user interface as external systems. However, classic tier architecture will say that the database is a lower layer and the UI is a higher layer.

Also, the communication between internal layers does not have to be held to the same standards. Providing a concrete facade or allowing a “higher” layer to instantiate types of the lower one is acceptable inside the system. Although, sometimes you want to draw more lines.

OTHER TIPS

Your confusion is probably caused by focusing on meaningless class and interface names, with no real usage scenario behind it. So better let us make a concrete example (I prefer C#, but it is not really different in other languages like Java).

The IComparable interface in the .NET framework looks (simplified) like this:

interface IComparable
{
  int CompareTo (object obj); 
}

The interface belongs to the System namespace.

There are classes like ArrayList (also part of the .NET Framework, inside System.Collections) which provide some functionality for objects implementing this interface, foremost Sort. ArrayList corresponds to your ClassA.

So obviously, since these classes were written years ago, they don't know anything about classes which are newly created today.

Now you write a new program with a class Balloon which implements IComparable (for example, by comparing balloons by their volume). This is ClassB in your question. So there is your program which

  • creates one instance of ArrayList and several instances of Balloon objects

  • puts the Balloon objects into ArrayList

  • and calls the Sort method, which just works, though it has never seen your Balloon class before.

So yes, the "buck" of deriving and instantiating a class B is passed to someone else - to the code utilizing the interface and the existing class A - and this makes perfect sense.
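A sketch of such a program (the Balloon members are invented for illustration; only CompareTo is required by the interface):

using System;
using System.Collections;

class Balloon : IComparable {
    public double Volume { get; set; }

    // Balloons compare by volume; ArrayList.Sort only ever sees IComparable.
    public int CompareTo(object obj) {
        return Volume.CompareTo(((Balloon)obj).Volume);
    }
}

class Program {
    static void Main() {
        var balloons = new ArrayList();
        balloons.Add(new Balloon { Volume = 3.0 });
        balloons.Add(new Balloon { Volume = 1.5 });

        balloons.Sort(); // works, although ArrayList has never heard of Balloon
    }
}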

A final note: many questions about interfaces, the strategy pattern, the template method pattern or the Open-Closed principle can be more easily understood when one thinks in terms of separate libraries, where one lib is provided by a 3rd party vendor and cannot be changed easily by a user of that lib.

This is severely overthought. Look into how Go does it, as an example. That approach removes all the unnecessary cruft. Too much stuff has stacked up around this idea over the years, and most of these ideas can be torched.

  • An interface is nothing more than a promise about which methods are guaranteed to exist, so that the compiler can reject a call to a method that does not exist. In some languages you must declare on the class that you implement the interface. In other languages, such as Go, if you pass in a struct that HAS the interface methods, then it's valid to assign the struct to that interface type. This choice means that you can declare one- or two-method interfaces and pass in legacy structs that you don't want to modify.

  • In Go, you just have structures. They can have methods attached to them. You can make some members private and others public; but that's kind of unnecessary, and sometimes counter-productive when you need to get in and fix something. If you just use structs with all public members, you can hide which struct it is (and therefore all struct members) by just assigning the struct to an interface with the minimum number of methods you actually need.

Trying to encapsulate data structures themselves is extremely counter-productive when these structures need to match an on-disk or on-network data structure.

If you want to keep code "encapsulated", then recognize that you are passing in a struct that might define a dozen methods, but you only need one or two of them for the task at hand. So just create an interface with those two methods. Now you are only coupled to those two methods bound to a reference.
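A minimal sketch of that idea in C# (names invented; unlike Go, C# still requires the class to declare the interface explicitly):

// The task at hand only needs these two operations.
interface IReadableStore {
    string Read(string key);
    bool Exists(string key);
}

// A large class with many more methods; only two of them matter here.
class BigStore : IReadableStore {
    public string Read(string key) { return "value"; }
    public bool Exists(string key) { return true; }
    public void Write(string key, string value) { }
    public void Delete(string key) { }
    // ...and a dozen other methods...
}

static class ReportBuilder {
    // This code is coupled to exactly two methods, not to BigStore.
    public static string ReadIfPresent(IReadableStore store, string key) {
        return store.Exists(key) ? store.Read(key) : null;
    }
}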

If you think about something like the Java IO classes and interfaces, they often have interfaces that just have way too many methods on them; and it's inconvenient because you need to modify the class to declare the interface. It's much better for a third-party library to come along and just make its own interface and assign your struct instance to it.

Licensed under: CC-BY-SA with attribution